1

I manage a small LAN (30 computers, mix of Linux, Windows, and Macs). Transferring a 100MB file to a local server (i.e. in the office, not on the Internet) used to take me about a couple minutes, but recently it's been taking nearly 30 minutes. I've checked my localhost and the server, and each machine is fine, so I'm assuming there's some issue with the network.

How would I diagnose what's slowing down the network, and/or find computers on the network using an unusually high amount of bandwidth?

What are good network monitoring tools for Linux (specifically Ubuntu) that will help me in this task? Most I've found seem geared for monitoring the network access of the localhost, not the access of other machines on the same network.

Cerin
  • Related: [Troubleshooting a Slow Network](http://serverfault.com/questions/154004/troubleshooting-a-slow-network) –  Nov 06 '11 at 21:13

5 Answers

3

> Most I've found seem geared for monitoring the network access of the localhost, not the access of other machines on the same network.

This is because they would be mostly useless in a switched network. The switch is separating the data traffic so a host ideally only gets the data it is intended to get. If you need network-wide statistics, you would need to monitor the RMON statistic counters of your switches (only available if you are using managed switches).

It is also quite likely that you are not seeing a bottleneck, but errors in transmission due to either mis-configuration (e.g. a duplex mismatch) or bad cabling. Observing the error statistics counters of your switch should give some clues.
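
If the switch is unmanaged, the host end of a link can still be checked, since a duplex mismatch or a bad cable usually shows up in the NIC's own counters as well. Here is a minimal Python sketch, assuming a Linux host that exposes the usual sysfs files under `/sys/class/net`. The function names and the `sysfs_root` parameter are made up for illustration and are not part of any tool:

```python
from pathlib import Path

# Counters the Linux kernel exposes per interface under
# /sys/class/net/<iface>/statistics/.
ERROR_COUNTERS = ("rx_errors", "tx_errors", "rx_crc_errors", "collisions")


def read_link_health(iface, sysfs_root="/sys/class/net"):
    """Return speed, duplex and error counters for one local interface."""
    base = Path(sysfs_root) / iface

    def read(relative):
        try:
            return (base / relative).read_text().strip()
        except OSError:
            return None  # file missing or unreadable (e.g. link is down)

    health = {"speed_mbps": read("speed"), "duplex": read("duplex")}
    for name in ERROR_COUNTERS:
        value = read(f"statistics/{name}")
        health[name] = int(value) if value is not None else None
    return health


def looks_suspicious(health):
    """Flag half duplex or any non-zero error counter."""
    if health["duplex"] == "half":
        return True
    return any((health[name] or 0) > 0 for name in ERROR_COUNTERS)
```

Running `read_link_health("eth0")` (the interface name is an assumption) on both the client and the server gives a first hint. Half duplex on a link you expect to be full duplex, or a CRC error count that keeps growing, points toward the link rather than toward congestion. This only covers the local end, so the switch counters are still worth reading if you can get at them.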

the-wabbit
2

Besides checking the hardware, consider investigating what happens at OSI Layer 8, that is, the users.

There are cases where employees use file sharing applications or video/audio streaming on their workstations, which can seriously degrade a network's performance.

Have you considered testing your network after the "normal" work hours?

To solve various network problems, I have set up a central GNU/Linux router, where I use tools like iptraf to monitor current network usage and get detailed information on the traffic originating from and destined for each host on the network.

tcpdump or Wireshark are excellent tools for debugging mysterious network problems and slowdowns.
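
Once traffic has been captured at a point that sees it all (such as the router), finding the heavy users is mostly a matter of adding up bytes per host. Here is a minimal Python sketch, assuming the capture has already been reduced to `(source, destination, bytes)` tuples. The tuple layout and the function name `top_talkers` are illustrative assumptions, not the output format of tcpdump or iptraf:

```python
from collections import Counter


def top_talkers(packets, limit=5):
    """Sum bytes per host, counting each packet for both its source
    and its destination, and return the busiest hosts first."""
    totals = Counter()
    for src, dst, size in packets:
        totals[src] += size
        totals[dst] += size
    return totals.most_common(limit)
```

For example, feeding it a few made-up records such as `("10.0.0.5", "10.0.0.1", 1500)` returns a list of `(host, total_bytes)` pairs sorted by volume, which is roughly the per-host view iptraf gives interactively.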

dkaragasidis
1
  • Use Wireshark or tcpdump to actually see what is happening during your slow file transfers. The problem might not be related to network usage at all (it could very likely be Layer-7).
  • Isolate and then reproduce the problem:
    • Is it just one client or all the clients?
    • Is it just one server or all the servers?
    • Does it happen all the time or just occasionally?
    • Can you use specific steps to reproduce the issue or does it just "randomly" happen?
  • You need a managed switching infrastructure to gather useful information. RMON, sFlow, or even just SNMP-delivered port counters or statistics will be extremely helpful.
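
To turn raw SNMP port counters into something readable, take two samples of a port's octet counter and convert the difference to a utilization figure. Here is a minimal Python sketch, assuming the samples were already fetched by whatever SNMP tool you use. The function names are made up for illustration, and the 32-bit default reflects that `ifInOctets`/`ifOutOctets` are 32-bit counters that wrap around (the `ifHC*` variants are 64-bit):

```python
def counter_delta(earlier, later, bits=32):
    """Difference between two counter samples, allowing for one wrap-around."""
    return (later - earlier) % (2 ** bits)


def utilization_percent(octets_earlier, octets_later, interval_s, link_bps, bits=32):
    """Average link utilization over the polling interval, in percent."""
    octets = counter_delta(octets_earlier, octets_later, bits)
    return octets * 8 / interval_s / link_bps * 100
```

The modulo only corrects a single wrap. At a sustained 100 Mbit/s, a 32-bit octet counter wraps in under six minutes, so poll more often than that or use the 64-bit counters where the switch offers them.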

In my experience the network is frequently blamed for "slowness" when the root cause of the problem is somewhere else entirely. A few examples:

  • A fat client application running on a workstation with minimal memory
  • A vendor mis-configured routing on their appliances
  • A vendor configured their devices to have static addresses inside our DHCP range; workstations that were later assigned those addresses had "problems"
  • YouTube is slow, therefore the network is slow. (Yes, YouTube is slow... because we throttle it).
  • A workstation was misconfigured to index the user's network shares
  • An update for Internet Explorer broke backwards compatibility with an ancient (circa '06) web server used for management on a few COTS embedded devices. IE was no longer making the "correct" GET request, resulting in every session getting reset. A firmware upgrade improved things.

Some other advice (generally it is one of these three):

  1. Check the physical layer and the data link layer first. Nine times out of ten, it's a patch cable that someone ran over repeatedly with their office chair (by the way, the port errors will show up in SNMP-reported switch statistics... useful, see?). Or the moving company parked their truck in front of one of our wireless bridges. Look for bad terminations (use a cable tester), out-of-spec cable runs, duplex mismatches, and broadcast loops, especially if you don't have physical control over all the switching infrastructure.
  2. Layer-7: Look for client or server misconfigurations or configurations that are no longer relevant (Wireshark is your friend here). DNS problems. Are network backups running during the day? Or WSUS updates being applied? Etc.
  3. Layer-8: And finally there always seems to be someone watching videos via Netflix (RMON or sFlow will discover this).
0

> How would I diagnose what's slowing down the network, and/or find computers on the network using an unusually high amount of bandwidth?

Start by checking the hardware: the real bandwidth of the LAN, NIC settings, the quality of cables and contacts, the server's load average (LA) during the file transfer, and HDD speed and (possible) errors. Even "a couple minutes" for a 100 MB file is a very slow transfer on a 100 Mbit/s LAN, where the raw wire time is only about 8 seconds.
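
The arithmetic behind that is simple enough to script. Here is a minimal Python sketch, assuming 1 MB = 1,000,000 bytes and ignoring protocol overhead. The function names are illustrative:

```python
def ideal_seconds(size_bytes, link_bps):
    """Time to push size_bytes over a link at its full nominal rate."""
    return size_bytes * 8 / link_bps


def effective_mbps(size_bytes, seconds):
    """Throughput actually achieved by a transfer, in Mbit/s."""
    return size_bytes * 8 / seconds / 1_000_000
```

With the numbers from the question, `ideal_seconds(100_000_000, 100_000_000)` gives 8.0 seconds. A transfer taking 2 minutes works out to roughly 6.7 Mbit/s, and one taking 30 minutes to roughly 0.44 Mbit/s, so both are far below what even a 100 Mbit/s link should deliver.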

Lazy Badger
0

I would suggest you use managed switches, as syneticon-dj suggested, and configure a local server to monitor its vitals and traffic. You can use Cacti to graph its traffic, CPU/memory usage, and so on. You can also have it send alerts when a metric crosses a threshold that you configure; Nagios would be more useful for such alerting tasks.
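
The threshold idea can be sketched in a few lines. Here is a minimal Python example, assuming the Nagios plugin convention of return codes 0 (OK), 1 (WARNING), and 2 (CRITICAL). The function name and the sample thresholds are made up for illustration:

```python
# Return codes used by Nagios-style check plugins.
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(name, value, warn, crit):
    """Compare a metric against warning/critical thresholds and
    return (status_code, one-line status message)."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name} is {value}"
    if value >= warn:
        return WARNING, f"WARNING - {name} is {value}"
    return OK, f"OK - {name} is {value}"
```

A real check script would print the message and exit with the status code, for example `check_threshold("link utilization %", 85.0, warn=70, crit=90)` yields a WARNING. The monitoring system handles scheduling and notification from there.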

Gaumire