As your company's network manager, you have become concerned about how slow your Internet connection seems to be in the afternoons. You have a fixed high-speed connection, so you suspect the problem is that too many employees are downloading music and video files, overloading your link. To test your theory, you have already hooked up a Linux box to your Internet connection, running the "tcpdump" program to monitor all network traffic. That program produced several huge text files, each with a detailed log of every network packet that passed between your company and the rest of the Internet during each tcpdump run.
Now your problem is to make sense of the logs, because looking at files with tens of thousands of records each makes your head hurt. Your task for this problem is to write a program that will read the information in the log files, and then print out a short, useful summary report. That report will be a list of the top ten Internet addresses, ranked by total network traffic, and a list of the top ten internal network users, again ranked by network traffic.
The input file contains several tcpdump logs, each headed by a one-line
description-line describing the tcpdump log, followed by several
tcpdump-records. The file ends with a single line containing just the word
END. Every line begins in column 1.
Each description-line is in the format shown on the following line:
TCP log from start-date-time to end-date-time.
The start-date-time and end-date-time are each in the format
YYYY-MM-DD:HH:MM:SS, where
YYYY is a four digit year,
MM is a two digit month number,
DD is a two digit day,
HH is a two digit hour (in 24-hour clock format),
MM is a two digit minute, and
SS is a two digit second. For
example, 10:30:15 AM on January 5, 2001 would be represented as
2001-01-05:10:30:15, and 10:30:15 PM on that day would be
2001-01-05:22:30:15.
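The date-time format above maps directly onto a strptime pattern. The following is a minimal sketch, assuming Python; the names STAMP_FORMAT and parse_description are hypothetical and not part of the problem statement.

```python
from datetime import datetime

# YYYY-MM-DD:HH:MM:SS with a 24-hour clock (assumed pattern, see lead-in).
STAMP_FORMAT = "%Y-%m-%d:%H:%M:%S"


def parse_description(line):
    """Return (start, end) datetimes from one description-line."""
    prefix, suffix = "TCP log from ", "."
    text = line.strip()
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError(f"not a description-line: {line!r}")
    # Strip the fixed prefix and the trailing period, then split the two stamps.
    start_text, end_text = text[len(prefix):-len(suffix)].split(" to ")
    return (datetime.strptime(start_text, STAMP_FORMAT),
            datetime.strptime(end_text, STAMP_FORMAT))
```

The report only needs to echo the description-line verbatim, so parsing the stamps is optional; it is mainly useful for validating input.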
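Each tcpdump-record is in a format shown in either of the following lines:
timestamp; protocol; interface; size; between
timestamp; protocol; interface; size; between;optional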
Each tcpdump-record is a single line, no longer than 255 characters, and terminated by an end-of-line marker. The fields timestamp, protocol, interface, size, and between are all sequences of characters that do not contain end-of-line markers or semicolons. If there is an optional field, it may contain any character other than an end-of-line marker. The only fields you need to use are size and between.
The size field contains an integer in the range of 1 to 32767,
followed by a single space, followed by the string "bytes". (The string
does not include the quotation marks.) For example, if the packet is 412 bytes
long, the size field will be "412 bytes". Even if the
packet is only 1 byte long, it will look like "1 bytes".
The between field is in the form "from source:port
to dest:port". The source and dest fields are
each distinct and valid IP addresses. An IP address consists of four integers in
the range from 0 to 255 separated by periods (e.g., 127.12.255.0). The
port field is an integer in the range from 0 to 65535.
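The field rules above suggest one way to pull the size and the two addresses out of a record. This is a minimal sketch, assuming Python and the semicolon-separated layout seen in the sample input; parse_record is a hypothetical helper name.

```python
def parse_record(line):
    """Return (size, source_ip, dest_ip) from one tcpdump-record."""
    # The first five fields never contain semicolons, so splitting at most
    # five times keeps any optional trailing field (which may) in one piece.
    fields = line.rstrip("\r\n").split(";", 5)
    if len(fields) < 5:
        raise ValueError(f"too few fields: {line!r}")
    count, unit = fields[3].strip().split(" ")
    if unit != "bytes":
        raise ValueError(f"bad size field: {fields[3]!r}")
    # The between field reads: from source:port to dest:port
    words = fields[4].split()
    if len(words) != 4 or words[0] != "from" or words[2] != "to":
        raise ValueError(f"bad between field: {fields[4]!r}")
    source_ip = words[1].rsplit(":", 1)[0]
    dest_ip = words[3].rsplit(":", 1)[0]
    return int(count), source_ip, dest_ip
```

For instance, a record whose size field is "412 bytes" and whose between field is "from 127.12.255.0:80 to 10.0.0.1:1024" would yield 412 together with the two addresses, with the port numbers discarded.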
Each time a new description-line is read, a new, independent log is started.
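Splitting the input into independent logs could be sketched as follows, assuming Python. The sketch assumes that description-lines are recognized by the prefix "TCP log from " and that the terminating line is END, as in the sample input; split_logs is a hypothetical name.

```python
def split_logs(lines):
    """Yield (description, records) for each log in an iterable of lines."""
    description, records = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "END":  # assumed end-of-file marker, as in the sample input
            break
        if line.startswith("TCP log from "):
            # A new description-line closes the previous log, if any.
            if description is not None:
                yield description, records
            description, records = line, []
        elif description is not None:
            records.append(line)
    if description is not None:
        yield description, records
```

Because it is a generator over any iterable of lines, the same function works on an open file object or on a list of strings.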
As your program reads tcpdump-records, it will keep track of the total number of bytes that passed to or from each IP address. It doesn't matter whether an address is the source or the dest; if it appears in a tcpdump-record, the value of the size field is added to the total being tracked for each of the two addresses.
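The accumulation rule can be sketched with a dictionary keyed by address. This is a minimal illustration, assuming Python and assuming each record has already been reduced to a (size, source, dest) tuple; tally is a hypothetical name.

```python
from collections import defaultdict


def tally(packets):
    """Sum bytes per address from (size, source_ip, dest_ip) tuples."""
    totals = defaultdict(int)
    for size, source_ip, dest_ip in packets:
        # The packet size counts toward both endpoints of the packet.
        totals[source_ip] += size
        totals[dest_ip] += size
    return dict(totals)
```

A fresh dictionary per log keeps the logs independent, and with at most 12000 distinct addresses per log the memory cost is small.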
After determining all the totals for all IP addresses, your program will determine the 10 external IP addresses that have the greatest number of bytes passed to or from them, and the 10 internal IP addresses that have the greatest number of bytes passed to or from them, and print those two "top 10" lists, suitably labeled and formatted. Each list should be printed in descending order by number of bytes.
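Selecting the two lists might look like the sketch below, assuming Python. The predicate is_internal is a hypothetical caller-supplied function that decides whether an address belongs to the company network, and the tie-break by address follows the ordering rule given for the output.

```python
def top_ten(totals, is_internal):
    """Return (external, internal) lists of the ten largest (ip, bytes) pairs."""
    def rank(pairs):
        # Descending by byte count; equal counts fall back to string order.
        return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))[:10]

    internal = [(ip, n) for ip, n in totals.items() if is_internal(ip)]
    external = [(ip, n) for ip, n in totals.items() if not is_internal(ip)]
    return rank(external), rank(internal)
```

Sorting the full list is simple and fast enough for 12000 addresses; a heap-based selection would also work but is not needed at this scale.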
An internal IP address is any address assigned to your company's own network.
Those addresses are recognized by their prefixes. Any address in any of
the following formats is internal:
x is any integer in the range from 0 to 255. An external IP address is
any IP address that is not an internal IP address.
You may assume that there are at least 10 internal and 10 external IP addresses represented in each section of the input data file. There will be at most 12000 distinct IP addresses in any tcpdump log.
You may assume that the total number of bytes sent to or from a single IP address in any single section of the input file is no more than 2147483647.
You may assume that no tcpdump-record begins with either "TCP log from" or "END".
The output file will contain a separate summary report for each tcpdump log in the input file. The first line of the summary report will be exactly the same as the tcpdump-description line read from the input file. That will be followed by a single blank line, then the following header line, beginning in column 1:
Top 10 External Sites Visited:
That header line will be followed by ten lines, each showing the total number of bytes seen passing to or from an external IP address, and the IP address. The total number of bytes should be printed in columns 1 through 10, right aligned in the field, followed by one space, followed by the IP address, left aligned. The list should be printed in descending order by number of bytes. If two sites have the same number of bytes, they should be ordered lexicographically by IP address. There will then be a single blank line, followed by another header line, beginning in column 1:
Top 10 Internal Users:
That line will also be followed by ten lines in the same format as above, except using internal IP addresses. A blank line followed by a summary for the next tcpdump log will come next, until all logs have been reported on. Here is a short example of an input file, followed by the corresponding properly formatted output file:
Sample input:
TCP log from 2000-03-16:15:06:33 to 2000-03-16:15:06:34.
Wed Mar 15 15:06:33 2000; TCP; eth0; 1296 bytes; from 22.214.171.124:1916 to 126.96.36.199:126
Wed Mar 15 15:06:33 2000; TCP; eth0; 625 bytes; from 188.8.131.52:289 to 184.108.40.206:13
Wed Mar 15 15:06:33 2000; TCP; eth0; 2401 bytes; from 192.168.5.41:529 to 220.127.116.11:31
Wed Mar 15 15:06:33 2000; TCP; eth0; 1296 bytes; from 18.104.22.168:1280 to 22.214.171.124:168;first packet
Wed Mar 15 15:06:33 2000; TCP; eth0; 625 bytes; from 192.168.5.72:1247 to 126.96.36.199:214
Wed Mar 15 15:06:33 2000; TCP; eth0; 2401 bytes; from 192.168.5.44:290 to 188.8.131.52:26
Wed Mar 15 15:06:33 2000; TCP; eth0; 2401 bytes; from 192.168.5.119:253 to 184.108.40.206:206;lost data
Wed Mar 15 15:06:33 2000; TCP; eth0; 1296 bytes; from 192.168.5.95:1646 to 220.127.116.11:12
Wed Mar 15 15:06:33 2000; TCP; eth0; 625 bytes; from 18.104.22.168:1566 to 22.214.171.124:145
Wed Mar 15 15:06:33 2000; TCP; eth0; 1296 bytes; from 126.96.36.199:2017 to 188.8.131.52:103
Wed Mar 15 15:06:33 2000; TCP; eth0; 2401 bytes; from 184.108.40.206:645 to 220.127.116.11:127
Wed Mar 15 15:06:33 2000; TCP; eth0; 2401 bytes; from 192.168.5.5:1184 to 18.104.22.168:64
Wed Mar 15 15:06:33 2000; TCP; eth0; 81 bytes; from 192.168.5.117:963 to 22.214.171.124:112
END
Sample output:
TCP log from 2000-03-16:15:06:33 to 2000-03-16:15:06:34.

Top 10 External Sites Visited:
      3026 126.96.36.199
      2592 188.8.131.52
      2401 184.108.40.206
      2401 220.127.116.11
      2401 18.104.22.168
      2401 22.214.171.124
      1296 126.96.36.199
      1296 188.8.131.52
       625 184.108.40.206
       625 220.127.116.11

Top 10 Internal Users:
      2401 192.168.5.119
      2401 192.168.5.41
      2401 192.168.5.44
      2401 192.168.5.5
      2401 18.104.22.168
      1296 192.168.5.95
      1296 22.214.171.124
      1296 126.96.36.199
      1296 188.8.131.52
       625 192.168.5.72
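
The report layout described above (byte totals right aligned in columns 1 through 10, one space, then the address) can be sketched as follows, assuming Python. The name format_report is hypothetical, and the two lists are assumed to be already ranked (ip, bytes) pairs.

```python
def format_report(description, external, internal):
    """Build the summary report lines for one tcpdump log."""
    def rows(pairs):
        # Width 10, right aligned, then a single space and the address.
        return [f"{count:>10} {ip}" for ip, count in pairs]

    return [
        description,
        "",
        "Top 10 External Sites Visited:",
        *rows(external),
        "",
        "Top 10 Internal Users:",
        *rows(internal),
    ]
```

When several logs are reported, the caller would print one blank line between consecutive reports, for example by joining the per-log line lists with an empty string between them.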