When examining network traffic, you can inspect packets individually with Tcpdump, or reconstruct sessions with sophisticated and sometimes expensive tools. It is extremely useful to be able to examine traffic quickly in different ways, get a feel for the traffic’s makeup, and formulate an approach for in-depth analysis. No single tool can really accommodate this wish list, but the power and flexibility of the Bash shell and the GNU core utilities, together with Tcpdump, can provide this functionality. This combination of tools allows quick creation of statistics from the captured traffic as a whole, or for particular hosts or protocols. You can search the payload of packets for strings, or call out particular protocol fields that appear in plain text, such as the “User-Agent:” header in HTTP or the logging facility in syslog traffic. You can also exclude expected traffic from a capture file and then examine what is left over, sifting through it to find things that may be unexpected. Finally, basic methods of managing a large pcap archive will be examined.
The elements brought together here are pcap data, a tool to read that data (Tcpdump), and the GNU coreutils, with the Bash shell tying them all together. The following sections cover the necessary ingredients in detail.
Security-focused distributions such as Knoppix Security Tools Distribution (STD) and BackTrack have the required tools built in. You may need to add Tcpdump to other distributions; the easiest way is to use your Linux distribution’s package manager. Examples here use Ubuntu 12.04, which includes Tcpdump:
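If Tcpdump is not already present, a single package-manager command adds it. The sketch below assumes an apt-based distribution such as Ubuntu or Debian; other package managers differ.

```shell
# Install Tcpdump from the distribution's repositories
# (apt-based systems; use yum/dnf/pacman etc. elsewhere):
sudo apt-get install tcpdump
```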
The feature that brings these different tools together is called I/O redirection. All Bash shells support I/O redirection: a way of taking output from one program and using it as input to another program, which can in turn feed its output to yet another program, and so on. For the analysis method described here, Tcpdump reads a binary capture of network traffic, a pcap file, and generates a stream of plain text from it. That plain-text output is then redirected into a combination of other programs to obtain the desired result. The other programs in this method manipulating the stream of text are the GNU coreutils. The Bash shell and GNU coreutils are packaged together as part of GNU/Linux, so you already have them if you are using any popular Linux distribution. You may recognize many of these programs, because they make up the vast majority of everyday shell commands such as mkdir, chown, cut, wc, uniq, cat and ls. See the gnu.org manual page for coreutils for a complete listing (Free Software Foundation Inc., 2008).
At the core of this analysis method is pcap data. The pcap format was created by the developers of Tcpdump at Lawrence Berkeley National Laboratory in 1987 (Ali, 2010), and because of this, many different network-based programs can read and write this format. You can use any of these programs to acquire pcap data, but the easiest way for this example is to use Tcpdump on the local computer to write directly to pcap files:
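As a sketch, a capture could be started as shown below. The interface name eth0 and the file name capture.pcap are assumptions for the example; `tcpdump -D` lists the interfaces actually present on a system.

```shell
# Read packets from interface eth0, skip name resolution (-n),
# and write them in pcap format to capture.pcap until interrupted:
sudo tcpdump -i eth0 -n -w capture.pcap
```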
Another simple way is to use Wireshark: start a capture, then save the capture file. More advanced options include using the pcap log files that accompany Snort alerts, or performing a packet capture from a router or switch (Cisco, 2007). For testing purposes, pcap files are also available for download from several online repositories.
I/O redirection is accomplished on the command line using the pipe character (“|”, Shift+\ on most US keyboards). A few basic examples of combining programs with pipes to generate relevant statistics are demonstrated first, followed by more advanced scenarios. Readers are encouraged to try their own variations. Indulging your curiosity and playing with the way these commands fit together is an excellent way to find new and interesting views of the traffic; you never know what useful things you will discover.
To start, count the number of packets in a particular pcap file with the following command:
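A sketch of that count follows. The file name capture.pcap is a placeholder, and so that the pipeline can be run anywhere, tcpdump’s one-line-per-packet output is simulated here with printf using invented addresses.

```shell
# Real usage: tcpdump -n -r capture.pcap | wc -l
# Each line of tcpdump's default output is one packet, so wc -l
# counts packets. printf stands in for the tcpdump stage:
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 93.184.216.34.80 > 10.0.0.5.51234: Flags [S.], length 0' \
  '12:00:00.000003 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [.], length 0' |
wc -l
```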
The program named wc takes the output of Tcpdump as input and counts the lines; there are 12365688 of them, one per packet. Another program, cut, can separate out a particular part of the output before piping it into another program. First, look at a sample of Tcpdump output without modification:
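The lines below are an illustrative stand-in for `tcpdump -n -r capture.pcap | head` output; the timestamps, addresses, and ports are invented for the example.

```shell
# Emit two example lines in tcpdump's default one-line-per-packet format:
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 93.184.216.34.80 > 10.0.0.5.51234: Flags [S.], length 0'
```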
The basic structure of Tcpdump output is: [timestamp] [network protocol] [source IP].[source port] > [dest IP].[dest port]. From this output sample, find the field you want to focus on, and then count how many columns from the left that field is. It is important to note what sort of character separates the columns; in this example it is a space, and the cut program is told which character to use with the -d [delimiter] flag. Also of note is the command head, which shows only the first 10 lines of output. Using head not only keeps the output from flooding your screen, it also cuts execution short: once head has printed its lines, the rest of the pipeline stops, so commands that would otherwise take quite a while to complete finish almost immediately. This allows an investigator to try each step quickly without processing the entire file; in a real investigation, you would drop head and process the entire pcap file. As an example, select only the source IP address with port, which is the third column, with the following command:
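A sketch with simulated input follows (real usage would be `tcpdump -n -r capture.pcap | cut -d ' ' -f 3 | head`; the sample lines are invented).

```shell
# Column 3 of the space-delimited output holds the source IP.port pair:
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 10.0.0.9.40000 > 192.0.2.7.443: Flags [S], length 0' |
cut -d ' ' -f 3
```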
To filter to just TCP/IP traffic and exclude layer 2 traffic, add the Tcpdump filter of ‘tcp or udp’:
One more step is to keep only the IP address, removing the source port, by adding another cut that selects the first four fields delimited by the “.” character:
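Sketched with simulated input (real usage: `tcpdump -n -r capture.pcap 'tcp or udp' | cut -d ' ' -f 3 | cut -d '.' -f 1-4`):

```shell
# The second cut treats "." as the delimiter, keeping only the four
# octets of the IP address and dropping the trailing port number:
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 10.0.0.9.40000 > 192.0.2.7.443: Flags [S], length 0' |
cut -d ' ' -f 3 | cut -d '.' -f 1-4
```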
Uniq, also part of the GNU coreutils family, eliminates adjacent duplicate lines of text. Note the requirement of “adjacent”: any input going into uniq must be sorted first, otherwise duplicates that appear in different parts of the pcap file will not be recognized as duplicates. Continuing from the last example of source IP addresses:
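A sketch with simulated input (real usage: `tcpdump -n -r capture.pcap 'tcp or udp' | cut -d ' ' -f 3 | cut -d '.' -f 1-4 | sort | uniq`):

```shell
# sort groups identical addresses together so uniq can collapse them:
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 10.0.0.9.40000 > 192.0.2.7.443: Flags [S], length 0' \
  '12:00:00.000003 IP 10.0.0.5.51235 > 93.184.216.34.80: Flags [S], length 0' |
cut -d ' ' -f 3 | cut -d '.' -f 1-4 | sort | uniq
```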
This example command line finds all source IP addresses in the pcap file and lists each one once. You can change the cut statements to pull out other information. For example, to see destination IP addresses rather than sources, change the first cut statement to cut -f 5, so that it selects the 5th column instead.
A very useful feature of uniq is that it can also count the number of instances it found, so you can use it to build “top N” lists. For example, the top 10 destination IP addresses:
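A sketch with simulated input (real usage: `tcpdump -n -r capture.pcap 'tcp or udp' | cut -d ' ' -f 5 | cut -d '.' -f 1-4 | sort | uniq -c | sort -nr | head`):

```shell
# uniq -c prefixes each line with its count; sort -nr then orders
# the counts from highest to lowest:
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 10.0.0.5.51235 > 93.184.216.34.443: Flags [S], length 0' \
  '12:00:00.000003 IP 10.0.0.5.51236 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000004 IP 10.0.0.9.40000 > 192.0.2.7.443: Flags [S], length 0' |
cut -d ' ' -f 5 | cut -d '.' -f 1-4 | sort | uniq -c | sort -nr | head
```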
Note that the list is sorted a second time, with the -nr flags, to show the output in descending numerical order. To examine destination ports, start by selecting only destination IPs and ports for new TCP sessions, using the Tcpdump filter ‘tcp[13] = 2’, which selects only packets with just the SYN flag set. That way you don’t accidentally give undue weight to commonly used ports like 443 and 80, where there may be a large number of packets over very few sessions, as in the case of an HTTP or HTTPS download:
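A sketch of that pipeline, with simulated SYN-only output standing in for the tcpdump stage (addresses invented):

```shell
# Real usage (the BPF filter tcp[13] = 2 keeps packets whose TCP flags
# byte equals SYN, i.e. new session attempts):
#   tcpdump -n -r capture.pcap 'tcp[13] = 2' | cut -d ' ' -f 5 | \
#     cut -d '.' -f 1-4 | sort | uniq -c | sort -nr | head
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 10.0.0.5.51236 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000003 IP 10.0.0.9.40000 > 192.0.2.7.443: Flags [S], length 0' |
cut -d ' ' -f 5 | cut -d '.' -f 1-4 | sort | uniq -c | sort -nr | head
```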
Now that you have the top destinations, use cut to select only the port:
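Sketched again with simulated SYN-only input; note that tcpdump appends a colon to the destination column, which tr removes before counting.

```shell
# Real usage: tcpdump -n -r capture.pcap 'tcp[13] = 2' | cut -d ' ' -f 5 | \
#   cut -d '.' -f 5 | tr -d ':' | sort | uniq -c | sort -nr | head
# Field 5 of the dot-delimited destination column is the port:
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.51234 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000002 IP 10.0.0.5.51236 > 93.184.216.34.80: Flags [S], length 0' \
  '12:00:00.000003 IP 10.0.0.9.40000 > 192.0.2.7.443: Flags [S], length 0' |
cut -d ' ' -f 5 | cut -d '.' -f 5 | tr -d ':' | sort | uniq -c | sort -nr | head
```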
This example shows port 8118 as a top destination, from which it can be inferred that a user is utilizing a Privoxy server. The appearance of port 6881 shows that the BitTorrent protocol is in use. The SANS Internet Storm Center Port Details site (SANS ISC, 2012) is a great resource for determining common port numbers and their corresponding protocols and programs; it lists well-known port information, CVEs by port, and the number of attacks for each port by source and destination. The same can be done with source IP addresses to see who the top talkers are:
You can use the same procedure to see the top source ports, but they are often random or sequential, and of marginal utility.
Many network protocols store their data as plain text in the payload portion of a packet (SMTP, syslog, POP3, FTP ASCII mode, HTTP, DNS, etc.), and Tcpdump can display this text by using the -A switch:
Pipe the ASCII output of Tcpdump into grep and use grep to look for a particular text string; with a regular expression, you can even build a rudimentary IDS. If you have found a host making an HTTP request for a malicious file, you can use “grep -i” with the file name to see whether other hosts have requested the same file. This is covered more in depth in the “Examining HTTP” section.
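The technique can be sketched as follows. The file name evil.exe is hypothetical, and simulated payload lines stand in for tcpdump’s -A output so the grep stage can be run anywhere.

```shell
# Real usage: tcpdump -n -A -r capture.pcap | grep -i 'evil.exe'
# grep searches the ASCII payload for the (hypothetical) file name:
printf '%s\n' \
  'GET /files/evil.exe HTTP/1.1' \
  'Host: bad.example' \
  'GET /index.html HTTP/1.1' |
grep -i 'evil.exe'
```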
Another useful technique is analyzing abnormal traffic. Normal traffic is defined here as traffic created by widely available software being used for its intended purpose. For every network protocol there is a Request for Comments (RFC) produced by the IETF that defines how network applications should work (IETF, 2012), and most commercial software honors these definitions. Abnormal traffic, then, is network traffic that does not comply with the RFC, or complies in a way designed to gain some other outcome. Removing normal and expected network traffic, then examining what remains, is a powerful technique for finding things that cannot be anticipated.
Country-code top-level domain names (ccTLDs) are more commonly abused inside the USA (Kadam, 2012), and an investigator can use knowledge of this abuse to advantage. Search the captured traffic for DNS, then cut, sort, and uniq the results to find the most frequently resolved names:
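One way to sketch this: tcpdump prints DNS queries on lines such as the samples below, where the field after “A?” is the queried name. The field positions assumed here match these samples; real output can vary with the query type, and the names and servers are invented.

```shell
# Real usage: tcpdump -n -r capture.pcap 'udp port 53' | \
#   awk '$7 == "A?" {print $8}' | sort | uniq -c | sort -nr | head
printf '%s\n' \
  '12:00:00.000001 IP 10.0.0.5.53444 > 8.8.8.8.53: 4512+ A? example.com. (29)' \
  '12:00:00.000002 IP 10.0.0.5.53445 > 8.8.8.8.53: 4513+ A? example.com. (29)' \
  '12:00:00.000003 IP 10.0.0.5.53446 > 8.8.8.8.53: 4514+ A? evil.cn. (25)' |
awk '$7 == "A?" {print $8}' | sort | uniq -c | sort -nr
```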
Then add a grep statement to exclude your more commonplace TLDs:
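A sketch of the exclusion step; the TLD list is an illustrative choice, not exhaustive, and the sample names are invented. Tcpdump prints resolved names with a trailing dot, which the pattern accounts for.

```shell
# grep -v inverts the match, dropping names in commonplace TLDs:
printf '%s\n' 'example.com.' 'example.net.' 'evil.cn.' 'odd.ru.' |
grep -v -E '\.(com|net|org)\.$'
```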
Finally, filter for names only, not IP addresses:
After you have found some suspect names you’d like to investigate, simply go back through the capture file using Tcpdump piped to grep, which you’ll use to search for your suspect domain name as outlined previously.
This formula also applies to the HTTP protocol. First, remove the most common and “safe” methods as defined in the HTTP/1.1 specification (Fielding, 1999). Start by searching for “normal” HTTP methods:
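A sketch with simulated -A payload lines (real usage: `tcpdump -n -A -r capture.pcap 'tcp port 80' | grep -E '^(GET|POST|HEAD|PUT|DELETE|OPTIONS|TRACE|CONNECT) '`); the request lines are invented examples.

```shell
# Keep only lines that begin with a standard HTTP/1.1 method:
printf '%s\n' \
  'GET /index.html HTTP/1.1' \
  'Host: example.com' \
  'POST /upload HTTP/1.1' \
  'HEAD / HTTP/1.1' |
grep -E '^(GET|POST|HEAD|PUT|DELETE|OPTIONS|TRACE|CONNECT) '
```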
Removing GET and HEAD methods:
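Continuing the sketch, a second, inverted grep drops the common “safe” methods, leaving the rarer ones for inspection (simulated input as before):

```shell
# First grep keeps HTTP request lines; grep -v then removes GET and HEAD:
printf '%s\n' \
  'GET /index.html HTTP/1.1' \
  'POST /upload HTTP/1.1' \
  'HEAD / HTTP/1.1' |
grep -E '^(GET|POST|HEAD|PUT|DELETE|OPTIONS|TRACE|CONNECT) ' |
grep -v -E '^(GET|HEAD) '
```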
Another HTTP field of interest is “Referer”. Note that “referrer” is misspelled in the HTTP protocol itself (Fielding, 1999):
If you find a malicious URL, you can examine the Referer field to see what other request caused the malicious request.
Malware authors may use unique, unusual, or malformed “User-Agent” strings when making HTTP GET requests (Manners, 2011). To see all the User-Agent fields in the capture file, use the following command:
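A sketch of that command with simulated header lines (real usage: `tcpdump -n -A -r capture.pcap | grep -i 'user-agent:' | sort | uniq -c | sort -nr`); the agent string ExampleBrowser/1.0 is invented for the example.

```shell
# Pull User-Agent headers out of the ASCII payload and count them:
printf '%s\n' \
  'User-Agent: ExampleBrowser/1.0' \
  'Host: example.com' \
  'User-Agent: ExampleBrowser/1.0' \
  'User-Agent: GBXHTTP' |
grep -i 'user-agent:' | sort | uniq -c | sort -nr
```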
There are many ways to analyze “User-Agent:” strings, and time spent doing so serves the investigator well. In the previous example, a variety of User-Agents are at work on the network: torrent software, Skype, Windows Media Player and others. Several websites catalog user-agent strings and can be found via web search. In this output, one User-Agent stands out as somewhat irregular: “GBXHTTP”. At the time of writing, it was not listed in any User-Agent string database, but further analysis reveals it is the Gearbox Software downloader:
The “Host:” field shows we are probably talking with a software company’s content distribution network. Using the “--context=5” flag for grep, we can look at a bit more of the HTTP GET:
Now we know the IP address, and a reverse lookup of the IP confirms the theory that this is a content distribution network: CloudFront is the name of Amazon.com’s CDN service:
When you want to capture network traffic over a long period of time, this author highly recommends creating a pcap repository or archive. Before Tcpdump 3.7.1 there was no automatic way to rotate and manage the capture files. Since 3.7.1, the authors of Tcpdump have made things very easy by providing the -C [MBytes] and -W [number of files] switches.
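A sketch of a rotating archive; the interface name, file name, and sizes are illustrative choices:

```shell
# Start a new file once the current one reaches 100 million bytes
# (-C 100) and keep a ring of at most 48 files (-W 48), overwriting
# the oldest when the ring is full:
sudo tcpdump -i eth0 -n -C 100 -W 48 -w archive.pcap
```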
Published with the express permission of the author.