What is the best tool to monitor/analyze network traffic on an entire network (several subnets)?
I'm looking for something that will help me troubleshoot bandwidth problems when, for instance, users start complaining that the "network is slow".
I'm assuming you have a commercial router/switch. It most likely supports SNMP, which you can combine with MRTG for a nice traffic graph.
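To make the SNMP idea concrete, here is a minimal sketch of the arithmetic a grapher like MRTG performs on interface octet counters. It assumes you have already polled a counter such as ifInOctets twice (the polling itself is not shown); the function names and the 100 Mbit/s link speed in the usage note are my own illustrative choices, not anything MRTG exposes.

```python
COUNTER32_MAX = 2**32  # classic ifInOctets/ifOutOctets are 32-bit and wrap around


def octets_to_bps(prev_octets, curr_octets, interval_s, counter_max=COUNTER32_MAX):
    """Average bits per second between two counter samples.

    Handles a single counter wrap between the samples (assumption: the
    polling interval is short enough that it cannot wrap twice).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:  # the counter wrapped past its maximum
        delta += counter_max
    return delta * 8 / interval_s


def utilization_pct(bps, link_speed_bps):
    """Link utilization as a percentage of the nominal link speed."""
    return 100.0 * bps / link_speed_bps
```

For example, 37,500,000 octets over a 300 second polling interval works out to 1,000,000 bit/s, which `utilization_pct` reports as 1% of a 100 Mbit/s link. On fast links the 32-bit counters can wrap more than once per interval, so the 64-bit counters (ifHCInOctets) are the safer choice if your device offers them.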
I think your best bet is going to be a mixture of Cacti and Ntop.
ntop will show you what the traffic on your network looks like: which hosts are consuming the most bandwidth, what kind of traffic is causing slowdowns, and so on.
Cacti will give you long-term trends in your bandwidth consumption, so you can tell how your network's traffic has changed over time.
When you have users reporting 'network issues', the problem could relate to a multitude of issues (routing, switching, host configuration, unicast, multicast, security policy, hardware failure). It's very unlikely that you'll find one piece of software to monitor all your different potential problems.
Instead, focus on two things:
Instrumentation: come up with a monitoring strategy that allows you to proactively monitor for those faults that occur regularly. See this previous answer for more detail.
Troubleshooting: come up with a quick, standard series of tests that you can run to immediately try and isolate where the problem might be, and publish it to your users.
Some example tests:
These kinds of simple diagnostics can often point you very quickly in the right direction. Finally, if you can, always get a source IP, a destination IP, and a destination port. Try to educate your users; ambiguous complaints like 'the network is slow' can't be easily diagnosed.
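As one illustration of why the destination IP and port matter, here is a rough sketch of a test you could run from the affected workstation (the source IP) once you have them. It only times the TCP handshake, so it says nothing about application-level slowness; the function name and the 3 second timeout are my own assumptions.

```python
import socket
import time


def tcp_connect_time(dest_ip, dest_port, timeout_s=3.0):
    """Return the seconds taken to complete a TCP handshake, or None on failure.

    A None result covers refused connections, timeouts and unreachable hosts.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((dest_ip, dest_port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:
        return None
```

Comparing the result from the complaining user's machine with the result from a machine on another subnet helps narrow down whether the problem follows the user, the path, or the server.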
I'm working at an organization that has a small to medium sized network (~500 users) and about a dozen /24 subnets (and a handful of smaller ones behind NAT). We use a variety of monitoring software that allows us to keep tabs on remote parts of the network and respond to problems proactively.
I have been using Smoothwall at home with great success. It does a great job of monitoring traffic, and a ton more.
It also comes in a corporate edition that does some fancier stuff.
I was trying to figure out why I kept running out of bandwidth (in Australia we have limits). Turns out it was my fault :)
Check out the products from VSS Monitoring. They have several different in-line, fail-safe products for monitoring network traffic remotely. Once you have them peered into your network(s) and on the backbone, it is as good as being there.
If you have a router capable of reporting netflows, look into a netflow handler. Where MRTG will provide link utilization, netflows report the IP and protocol usage flowing through the router. So, instead of "Suzy in accounting is using a lot of traffic" or "The port the WAP is on has high utilization", you could see "Suzy in accounting is 10% LAN traffic, 40% streaming media, and 50% internet HTTP traffic."
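To show the kind of breakdown a flow aggregator produces, here is a small sketch that works on already-decoded flow records. The record layout (`src`, `dst`, `dst_port`, `bytes`), the `LAN_NETS` list and the labeling rules in `classify` are all hypothetical simplifications of mine; a real collector parses the exported flows itself and classifies traffic far more carefully.

```python
import ipaddress
from collections import defaultdict

# Assumption: everything in 10.0.0.0/8 counts as "LAN" for this example.
LAN_NETS = [ipaddress.ip_network("10.0.0.0/8")]


def classify(flow):
    """Very rough, hypothetical labeling rule based on destination and port."""
    dst = ipaddress.ip_address(flow["dst"])
    if any(dst in net for net in LAN_NETS):
        return "LAN"
    if flow["dst_port"] in (80, 443):
        return "internet HTTP"
    return "other internet"


def traffic_share(flows, host):
    """Percentage of bytes sent by `host`, grouped by traffic label."""
    totals = defaultdict(int)
    for flow in flows:
        if flow["src"] == host:
            totals[classify(flow)] += flow["bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: round(100.0 * b / grand_total, 1) for label, b in totals.items()}


flows = [
    {"src": "10.0.1.20", "dst": "10.0.2.5", "dst_port": 445, "bytes": 100},
    {"src": "10.0.1.20", "dst": "8.8.8.8", "dst_port": 443, "bytes": 500},
    {"src": "10.0.1.20", "dst": "8.8.8.8", "dst_port": 1935, "bytes": 400},
    {"src": "10.0.1.99", "dst": "8.8.8.8", "dst_port": 443, "bytes": 9999},
]
print(traffic_share(flows, "10.0.1.20"))
```

With the sample records above, the host 10.0.1.20 comes out as 10% LAN, 50% internet HTTP and 40% other internet traffic, which is the per-user view that link utilization graphs alone cannot give you.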
Unfortunately I don't have a recommendation for a free flow aggregator. After a net monitoring company tried to sell my company a solution and I determined that their whole product was based on netflows, I made a note to research them. Before I got around to it we bought another NOC solution that also included a flow aggregator.
First of all, are the users complaining about your local network?
The fileserver is slow!
or are they complaining about remote websites?
Facebook is slow! I can't do my work!
If it's the former, then I would start with the fileserver in question and work backwards. First, check the fileserver itself: is its utilization out of the ordinary? Check the interface that user traffic flows over. Is it pegged? Is auto-negotiation enabled? Is it enabled on both ends?
If everything looks OK there and the server is not under any undue load, try the routers and switches in the path between the user and the server. Are they overloaded? Is auto-negotiation enabled? Check the interface counters for errors.
If nothing appears to be wrong there, then the problem may be local to the user's workstation. Is it under undue load? Are there any hardware errors (disk errors causing blocking while the firmware retries)? Is the machine low on real memory (Firefox paging hard)?
This usually solves 99% of the problems.
Depending on the frequency you have to deal with these requests you may prefer to reverse the order of these steps.
Alternatively, if it's a problem with a remote site, then after debugging your network and the user's workstation, try tools like mtr to detect packet loss between you and the remote site. If the problem is not local to your network, then your options are probably limited to logging a case with your provider, or waiting until the remote site gets over whatever tizzy it's having.