2

My server causes too much traffic, so I have installed ntop to monitor it.

On the Summary -> Traffic page in the Global TCP/UDP Protocol Distribution table I can see the traffic is periodically caused by HTTP.

On the All Protocols -> Traffic page, the first row accounts for most of the traffic (94.4%), but the first column (Host) shows my own server. Why is this?

When I click on that host, the Host Traffic Stats table shows that all of the traffic is in the Tot. Traffic Rcvd column. So I think one of my applications is periodically downloading something big, or a lot of smaller things.

But how can I find out what was downloaded? What are the downloaded URLs, or at least the hosts that caused the most traffic?

Witek

3 Answers

1

Ntop is a network interface tool: it shows you the traffic going over various ports and protocols, but that's where it ends. What you need to do now is look at the application that's processing that traffic, in this case Apache.

The easiest way to do this is to install a web usage tool, like webalizer (there are many others, awstats was the 'best' a while back, not sure what's king now). This will run through your logs and generate pages of statistics that you can use to see where the traffic was going, where it was coming from and who was doing it. For Example.

gbjbaanb
  • Why do you think Apache is processing the traffic? I have no websites which allow uploads. But I have a lot of Java/Python/... applications running, which all do HTTP requests. I just don't know which one. I could find out, if I knew which URLs are downloaded. – Witek Jul 28 '11 at 11:05
  • they say so in the question. If you have multiple apps processing http requests you should be able to narrow down which one by looking at which port or ip address is taking the traffic. – gbjbaanb Aug 16 '11 at 11:50
1

Fix the Systemic Issue:
Not knowing which of your applications make requests, and having their logs scattered all over the place, is a problem. This is going to bite you in the ass again and again, so I would set aside some time to address it. Find some way to index or aggregate those logs. This is a larger project that you should raise.

The Problem at Hand:
For the problem at hand, I would recommend Wireshark / tcpdump. Once you have a traffic capture, you can use all sorts of techniques to find the source. In Wireshark you could use "Statistics / Conversations", sort by bytes, and then drill down into the captures from there. Riverbed's non-free Cascade Pilot has a "Web Bandwidth by Object" view for captures that would be good at this. You could request a trial.
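
If you would rather script the "sort conversations by bytes" step, here is a minimal Python sketch. It assumes you saved the capture as text with something like `tcpdump -nn -q`, and that each TCP line looks roughly like the sample in the comment; the exact output format varies between tcpdump versions, so treat the regular expression as an assumption to adjust. The function name `bytes_received_by_remote` and the sample addresses are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape (tcpdump -nn -q), adjust if your output differs:
# 11:05:01.000001 IP 203.0.113.7.80 > 192.0.2.10.51234: tcp 1448
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.\d+: tcp (?P<length>\d+)"
)

def bytes_received_by_remote(lines, local_ip):
    """Sum TCP payload bytes sent to local_ip, grouped by remote host.

    Returns (remote_ip, total_bytes) pairs, largest total first.
    """
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        # Only count packets arriving at the server (the "Rcvd" direction).
        if match and match.group("dst") == local_ip:
            totals[match.group("src")] += int(match.group("length"))
    return totals.most_common()

if __name__ == "__main__":
    # Hypothetical sample lines; 192.0.2.10 stands in for the server.
    sample = [
        "11:05:01.000001 IP 203.0.113.7.80 > 192.0.2.10.51234: tcp 1448",
        "11:05:01.000002 IP 192.0.2.10.51234 > 203.0.113.7.80: tcp 0",
        "11:05:01.000003 IP 203.0.113.7.80 > 192.0.2.10.51234: tcp 1448",
        "11:05:02.000001 IP 198.51.100.9.80 > 192.0.2.10.40000: tcp 512",
    ]
    for host, total in bytes_received_by_remote(sample, "192.0.2.10"):
        print(host, total)
```

The remote hosts at the top of that list are the ones worth drilling into in the full capture.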

If you are not familiar with Wireshark, now is a good time to learn. It is a tool most sysadmins use on a regular basis.

If you know the server taking the bandwidth, and it is a Linux server, you might try Nethogs (nethogs) to identify the process using the bandwidth.

Kyle Brandt
0

You should examine your web server's access log, where all serviced requests are listed. You could filter for your web server's IP address and localhost and check the most requested files. There are several tools for this, but which one fits depends on the web server software you are using.
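
As a rough illustration of checking the most requested files, here is a small Python sketch that sums response bytes per requested path. It assumes the log uses the Common Log Format (or the combined format, which starts the same way); the function name `bytes_by_path` and the sample entries are hypothetical. Note that the size field in this format is the response sent to the client, so it only explains traffic your web server served.

```python
import re
from collections import Counter

# Assumed Common Log Format prefix:
# host ident user [time] "METHOD path protocol" status size
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def bytes_by_path(lines):
    """Sum response sizes per requested path, largest total first."""
    totals = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        size = match.group("size")
        # A "-" size means no body was sent (for example a 304 response).
        totals[match.group("path")] += 0 if size == "-" else int(size)
    return totals.most_common()

if __name__ == "__main__":
    # Hypothetical sample entries for illustration only.
    sample = [
        '198.51.100.4 - - [28/Jul/2011:09:40:01 +0200] "GET /big.iso HTTP/1.1" 200 1000',
        '198.51.100.5 - - [28/Jul/2011:09:41:01 +0200] "GET /index.html HTTP/1.1" 200 200',
        '198.51.100.4 - - [28/Jul/2011:09:42:01 +0200] "GET /big.iso HTTP/1.1" 200 1000',
        '198.51.100.6 - - [28/Jul/2011:09:43:01 +0200] "GET /index.html HTTP/1.1" 304 -',
    ]
    for path, total in bytes_by_path(sample):
        print(path, total)
```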

sw0x2A
  • The traffic is shown in "Tot. Traffic Rcvd". Doesn't that mean that the traffic goes from some URL to my server? Therefore nobody downloaded anything from my server. An application on my server downloaded something from the web. This would not be visible in the web server logs. – Witek Jul 28 '11 at 09:49
  • Sorry, I missed that. When your web server receives a lot of data, it is either fetching something from a remote host, or a client is uploading or posting a lot of stuff to your web server. However, the client-initiated connections should be visible in your web server's access log. Another way is to use a combination of tcpdump and Wireshark to analyse the received traffic on the web server. – sw0x2A Jul 28 '11 at 09:59