October 29, 2010

Weekend Project: Analyze Your Network with Wireshark

Wireshark is an open source network packet analyzer. Without any special hardware or reconfiguration, it can capture live data going in and out over any of your box's network interfaces: Ethernet, WiFi, PPP, loopback, even USB. Typically it's used as a forensics tool for troubleshooting network problems like congestion, high latency, or protocol errors — but you don't want to wait until your network is in trouble to learn how to use it. This weekend, why not take a look at your network traffic, and learn how to use Wireshark to your advantage?

Wireshark is a GTK+ application, although the project also includes a console-based front end named TShark that features most of the functionality found in the GUI version. Considering its reputation as a useful administration tool, you will probably find it in your distribution's package repositories. If not, you can download packages for several distributions on wireshark.org, along with the source. The current release is 1.4.1. Mac OS X and Windows binaries are available as well, which you may need to analyze machines running those operating systems (more on that later).

Because it needs to switch the network interface into "promiscuous mode" in order to capture all network traffic, Wireshark typically needs root privileges to capture live data. The libpcap library performs the actual packet capture, and supports a large-but-not-infinite range of network devices. Check the compatibility matrix on the project wiki if you are using an unusual network type, although almost all Ethernet and WiFi cards in common usage will work without incident.

Traffic Capture

You can start a new network capture session from the "Capture" menu; Capture -> Interfaces brings up a dialog box showing all of the interfaces Wireshark has detected, plus the pseudo-device "any" that captures from all of the above. The Capture -> Options choice allows you to specify several options before you begin, including limiting your capture with filtering rules (such as by particular protocols or IP address only), automatically stopping the capture after a specified amount of time, or splitting the file automatically into separate time- or size-dictated files.
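The same options can also be expressed on the command line through TShark. As a rough sketch, the Python helper below assembles a TShark argument list with a capture filter, an automatic stop time, and size-based file splitting. The function name build_tshark_command, the interface eth0, and the file name web.pcap are illustrative assumptions; the flags themselves (-i, -f, -a, -b, -w) are standard TShark options.

```python
def build_tshark_command(interface, capture_filter=None, stop_after_s=None,
                         rotate_kb=None, outfile=None):
    """Assemble a tshark argument list mirroring the Capture -> Options dialog.

    All parameter names here are hypothetical; only the tshark flags are real.
    """
    if rotate_kb is not None and not outfile:
        # tshark only splits into multiple files when writing to a file (-w).
        raise ValueError("file splitting requires an output file")
    cmd = ["tshark", "-i", interface]
    if capture_filter:
        cmd += ["-f", capture_filter]             # libpcap capture filter
    if stop_after_s is not None:
        cmd += ["-a", f"duration:{stop_after_s}"]  # stop after N seconds
    if rotate_kb is not None:
        cmd += ["-b", f"filesize:{rotate_kb}"]     # start a new file every N kilobytes
    if outfile:
        cmd += ["-w", outfile]                     # write raw packets to this file
    return cmd


# Example: capture web traffic on eth0 for five minutes, splitting files by size.
example = build_tshark_command("eth0", "tcp port 80", 300, 10240, "web.pcap")
```

The resulting list can be handed to subprocess.run() on a machine where TShark is installed and the user has capture privileges.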

Whenever you start your capture, the packets are logged on-screen in a table showing basic information (source and destination, protocol, time, etc.) in column headers. Wireshark color-codes the entries for your convenience by flagging "interesting" packets, such as TCP retransmissions, with different text and background colors.

How long you let your capture run depends on what you need to study. Several hours might be required to catch a hard-to-reproduce problem with an Internet service, but a few minutes' worth will suffice just to familiarize yourself with the tools. After you stop the capture, you can select any packet for further inspection by clicking on it in the log window. The details are displayed in a tree-like sub-window that breaks the packet down by network layer. If you are having Ethernet trouble, you can look into the Ethernet frames; if it is an HTTP problem, you can dig down at that level instead.

You should always save captured data that you need for forensic or profiling purposes. Wireshark uses the .pcap file extension. Do be aware, however, that capture files can get quite large; if you are only interested in a portion of your overall network traffic, you can use Wireshark's filter mechanism, located directly above the main capture table, to winnow down the data set before you save it to disk.
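To get a feel for how large a saved capture really is, it helps to know that a classic .pcap file is simply a 24-byte global header followed by a series of packet records, each with a 16-byte header. The sketch below walks such a file and reports the packet count and captured byte total. It assumes the classic libpcap format with the 0xa1b2c3d4 magic number (not the newer pcapng format), and the function name summarize_pcap is made up for illustration.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic number (microsecond timestamps)


def summarize_pcap(stream):
    """Return (packet_count, captured_bytes) for a classic .pcap byte stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("too short to contain a pcap global header")
    # The magic number reveals the byte order the file was written in.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    packets = 0
    captured = 0
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of file (or a truncated final record)
        # Record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        stream.seek(incl_len, io.SEEK_CUR)  # skip over the packet bytes
        packets += 1
        captured += incl_len
    return packets, captured
```

To use it on a real file, open the capture in binary mode, for example with open("capture.pcap", "rb"), and pass the file object in; the file name is only a placeholder.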

Examining the Data

The filter tool is the most basic way to hone your captured data into a useful format. Click the "Filter" button itself to bring up a selection box with several common options: TCP only, UDP only, everything not to the local IP address, everything-except-DNS-and-ARP, and so forth. Clicking on any of the options in the list will show you the exact filter string in Wireshark's filter syntax, which is a useful way to learn to write your own filter expressions. For example, the "Non-HTTP and non-SMTP to/from" filter is designed to filter out uninteresting traffic; its syntax is not (tcp.port == 80) and not (tcp.port == 25) and ip.addr == followed by the IP address of the host in question. Hit "Apply" and Wireshark will filter your captured data on-screen. You can write your own filter expressions by clicking on the Expression button; Wireshark provides a handy selector widget listing the known fields and logical operators you can use.
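Because filter expressions are just strings joined by logical operators, they are easy to generate. The following is a minimal sketch of a helper that builds a filter in the same shape as the one quoted above; the function name exclude_ports is hypothetical, and 192.0.2.10 is a placeholder address from the documentation range, not a value from any real capture.

```python
def exclude_ports(ports, host=None):
    """Build a display filter hiding the given TCP ports, optionally for one host."""
    clauses = [f"not (tcp.port == {port})" for port in ports]
    if host:
        # Restrict the result to traffic to or from a single address.
        clauses.append(f"ip.addr == {host}")
    return " and ".join(clauses)


# Hide HTTP and SMTP traffic for a placeholder host.
example_filter = exclude_ports([80, 25], "192.0.2.10")
```

The resulting string can be pasted directly into the filter field above the capture table.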

The Analyze menu contains some more complex pre-defined filtering options. "Enabled Protocols" gives you a way to selectively enable or disable higher-level application protocols, so that you can filter out instant messaging traffic, examine only certain types of protocol messages, and apply other restrictions that would be prohibitively long to express in the filter syntax. "User Specified Decodes" allows you to disable decoding of particular protocols, which could be helpful in diagnosing a particular application. "Follow TCP Stream" allows you to select one particular TCP connection, and trace its progress from start to finish; similar options are available for UDP and SSL conversations. Finally, the "Expert Infos" option extracts error messages and warning flags (such as lost or out-of-order segments) to quickly identify problems.

The Statistics menu can give you an even quicker overview of the entire data set. It contains pre-set functions to analyze common network metrics and presents them in useful tables. If you are examining your network traffic for the first time, these can be helpful tools to understand its normal behavior. You can examine the distribution of packet sizes, traffic by link- and application-layer protocol used, and response time.

Wireshark can also produce several graphs on the fly that may help you visualize your traffic better. With the IO Graphs tool in the Statistics menu, for example, you can select up to five filters to compare head-to-head in different colors.
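Conceptually, each line on such a graph is a count of matching packets per time interval. As a rough illustration of that idea (not a description of how Wireshark implements it), the sketch below buckets packet timestamps into fixed-width intervals. It assumes timestamps are non-negative seconds relative to the start of the capture, and the function name packets_per_interval is invented for this example.

```python
from collections import Counter


def packets_per_interval(timestamps, interval_s=1.0):
    """Count packets in consecutive fixed-width time buckets.

    Assumes timestamps are non-negative seconds since the capture started.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    counts = Counter(int(ts // interval_s) for ts in timestamps)
    if not counts:
        return []
    # Include empty buckets so quiet periods show up as zeros.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]
```

Running it once per filter, over the timestamps of the packets each filter matches, gives the series you would compare head-to-head.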

Beginning Forensics

As mentioned in the introduction, profiling normal network traffic is not an end in itself; it is a way to help you recognize aberrant behavior when you are trying to track down the source of a problem. Unfortunately, there is no quick-and-easy path to tracking down the root cause of high latency or slow throughput.

Sure, if there is a zombie machine on your network infected by a trojan, you may easily flag it as an infected spam bot when you see it initiate thousands of SMTP connections per hour — and detecting viruses and malware is an important forensic task. But tracking down why one of your database servers is always a little bit slower than the other could involve quite a bit more digging and analysis.

That's why taking some time to profile your network traffic under what we'll assume are normal operating conditions is valuable. You can get a feel for how often WiFi clients see TCP retransmissions; if the rate doubles or triples, and you have not added any more machines, then you may need to look to see whether any one client is behaving differently than the others, or if you are simply seeing signal degradation. When diagnosing bandwidth woes, if one of your local servers is consistently timing out and dropping connections, it could be an application-level problem. But if Wireshark's logs reveal that it is the remote server resetting connections, then you need to make a phone call.
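The baseline comparison described here is simple arithmetic once you have the counts. The sketch below assumes you have already counted retransmitted and total TCP packets in a baseline capture and in a current one (for instance with the display filter tcp.analysis.retransmission); the function names, the sample counts, and the default factor of 2 are illustrative assumptions, not recommended thresholds.

```python
def retransmission_rate(retransmitted, total):
    """Fraction of packets that were retransmissions (0.0 for an empty capture)."""
    return retransmitted / total if total else 0.0


def exceeds_baseline(current_rate, baseline_rate, factor=2.0):
    """True when the current rate is at least `factor` times the baseline."""
    if baseline_rate == 0:
        # Any retransmissions at all are a change from a clean baseline.
        return current_rate > 0
    return current_rate >= factor * baseline_rate


# Hypothetical counts: 50 of 10,000 packets normally, 160 of 10,000 today.
baseline = retransmission_rate(50, 10000)
current = retransmission_rate(160, 10000)
needs_attention = exceeds_baseline(current, baseline)
```

A True result does not identify the cause; it only tells you that the capture is worth a closer look.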

Here, the tutorials at the Wireshark site are an invaluable aid. The wiki has some basic network troubleshooting pages, as well as links to resources hosted elsewhere. So, too, do several of the other open source network analysis and security projects, like Nagios, EtherApe, Nmap, and tcpdump. Much of forensics depends on understanding the TCP/IP protocol stack and common problems, so a good book or two on the subject goes a long way.

Wireshark includes a lot of features that will help you analyze your network when you are tracking down the source of your problem. For example, you can run statistical comparisons between two saved packet captures; this allows you to perform a capture when you are experiencing the problem, and compare it against a data set you collect as a control group when things are running smoothly. Likewise, you can collect and compare captures from two different machines (say, on different network segments or with different configurations). This is also why it is so helpful that there are builds of Wireshark available for the proprietary operating systems: when chasing down a performance problem, you may need to collect data from every source.

Extra Credit: Visualization, Alternative Captures

Although Wireshark's filtering and analysis tools can expose numerous facets of your traffic capture in the GUI, the interface does have its limitations. There are always times when a graph makes the answer leap right off of the screen in a way that a table cannot. There are expensive proprietary tools sold to add custom visualization and data mining features to the Wireshark workflow, but you don't need those.

Wireshark can export captures in generic CSV form, which you can then pull into other applications to work with. That could mean a simple spreadsheet like Gnumeric or OpenOffice.org, or a statistical package like R or gnuplot. A good place to start is the list of open source analysis packages at forensicswiki.org. The landscape is always changing, though. Just this fall, the popular data analysis engine Freebase Gridworks was turned into an open source project called Google Refine, which could make visualizing network traffic considerably easier.
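Even without a spreadsheet, a few lines of script can summarize an exported capture. The sketch below tallies packets by protocol from a CSV export. It assumes the export begins with a header row that includes a "Protocol" column, matching the packet list columns shown in the GUI; check your own export before relying on that. The function name packets_by_protocol is hypothetical.

```python
import csv
import io
from collections import Counter


def packets_by_protocol(csv_file):
    """Tally rows of a CSV export by the value in the "Protocol" column.

    Assumes the first row is a header that contains a "Protocol" field.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["Protocol"] for row in reader)


# A tiny made-up export used only to demonstrate the call.
sample = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Info"\n'
    '"1","0.000000","192.0.2.1","192.0.2.2","TCP","example"\n'
    '"2","0.100000","192.0.2.2","192.0.2.1","DNS","example"\n'
    '"3","0.200000","192.0.2.1","192.0.2.2","TCP","example"\n'
)
protocol_counts = packets_by_protocol(sample)
```

For a real export, pass an open file object instead of the in-memory sample, opened with newline="" as the csv module recommends.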

Last but not least, although Wireshark is always referred to as a network analyzer, the truth is it can analyze other things as well, including USB traffic and even Unix socket connections between applications. So even if you master your TCP/IP traffic this weekend, you may still have room to explore.
