Troubleshooting a "slow" network

Question

We've all had a complaint that the "network" is "slow" at some point: might be localized to one room (switch) or one computer, might just be Internet (DNS? Browser issue?), might be just one application (long-running SQL queries? AV scan running?).

When you've ruled out obvious system and/or application issues, how do you go about testing a network for slowness or erratic behavior? Do you work your way up the OSI layers? If so, how do you go about checking each layer? What do you do to make sure the physical network is OK in an unfamiliar environment? What about too many broadcasts or a broadcast storm? Layer 3 and up? traceroute? Any other tips, methods, ideas? Must-have features and tools (port mirroring, SNMP, monitoring, etc.) for all sizes of networks?

Duplicate? http://serverfault.com/questions/88/slow-network-speeds-what-should-i-check — Joril, Jun 23 '10 at 13:37
possibly, but I figured a wiki would have a bit more longevity and give more people a chance to contribute. — WuckaChucka, Jun 23 '10 at 14:32
First off, I have to be convinced that it's the "internet"! More often than not, it's not the "internet". Most lusers I've been around say the internet's down even when they are trying to access a local file server. – tony roth, Jun 23 '10 at 14:40
It's because all of your users are streaming video feeds of the World Cup right now! – BillN, Jun 23 '10 at 16:25

score 10 · Accepted Answer · answered Jun 23 '10 at 14:27

tcpdump and wireshark are your friends.

I find that watching packets on the wire of a 'slow' network vs a 'good' network is usually what pinpoints a problem.

There are many types of 'slow'.

You can track latency to local and internet sites using a tool like SmokePing. (SmokePing can be configured to track ICMP latency as well as service latency from TCP services)
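As a rough illustration of the "service latency from TCP services" idea (not a replacement for SmokePing, and not how SmokePing itself is implemented), here is a minimal Python sketch that times TCP connects and summarizes the samples. The function names, sample count, and timeout are assumptions chosen for the example.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def summarize(samples):
    """Summarize latency samples in ms; None entries are counted as loss."""
    ok = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(ok)) / len(samples) if samples else 0.0
    return {
        "sent": len(samples),
        "loss_pct": loss_pct,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def probe(host, port, count=10):
    """Take `count` connect-time samples against one service and summarize them."""
    return summarize([tcp_connect_ms(host, port) for _ in range(count)])
```

Running something like `probe("fileserver.example", 445)` next to `probe("example.com", 443)` (both hostnames are placeholders) gives a quick local-versus-internet comparison; logging the results over time is what makes the numbers useful.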

Your switches should track broadcast packets vs unicast packets. Graph that ratio.
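To make the ratio concrete: if you poll per-port packet counters (for example the IF-MIB objects ifHCInUcastPkts, ifHCInMulticastPkts and ifHCInBroadcastPkts over SNMP), the broadcast share for a polling interval can be computed from two successive readings. The sketch below assumes 64-bit counters and a hypothetical dict layout for the readings; how you fetch the counters is left out.

```python
COUNTER64_MAX = 2**64


def counter_delta(old, new, modulus=COUNTER64_MAX):
    """Difference between two readings of an increasing counter,
    allowing for a single wrap-around between polls."""
    return (new - old) % modulus


def broadcast_ratio(prev, curr):
    """Fraction of received packets that were broadcast between two polls.

    prev and curr are dicts with 'ucast', 'mcast' and 'bcast' packet counters.
    """
    deltas = {k: counter_delta(prev[k], curr[k]) for k in ("ucast", "mcast", "bcast")}
    total = sum(deltas.values())
    return deltas["bcast"] / total if total else 0.0
```

What counts as "too high" depends on the network, so the graph of a known-good period is the baseline to compare against.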

I also like to monitor traceroutes (checking the domain names of the ISP hops between myself and 'important' sites).

I hope these comments help.

Joel K

When watching packets, what are some things you're looking for or "telltale signs" that there's an issue? – WuckaChucka Jun 23 '10 at 14:34
Look for a large number of TCP retransmissions and/or TCP resets. Also look for a high percentage of broadcast traffic. – joeqwerty Jun 23 '10 at 15:31
excellent. I would almost put that into a separate answer. – WuckaChucka Jun 23 '10 at 15:58
If you can use Netmon 3+ from MS, go to Microsoft Research and download the TCP analyzer: http://research.microsoft.com/en-us/downloads/2ff17024-7eed-43db-93eb-ab69074b0b93/default.aspx It's pretty cool for debugging network issues. There is also a 32-bit version if necessary. – tony roth Jun 23 '10 at 16:04
+1 for SmokePing. That, along with things like IPSLA in Cisco routers and switches, can go a long way towards helping you understand if there is a slow network, or a slow application. – Christopher Cashell Jun 24 '10 at 14:26
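Wireshark flags retransmissions for you, but the idea behind the "telltale sign" mentioned in the comments above is simple enough to sketch. The Python below is a simplified heuristic, not Wireshark's analysis: it assumes you have already exported (flow, sequence number, payload length) tuples from a capture, ignores sequence-number wrap, and will also count late out-of-order segments.

```python
from collections import defaultdict


def count_retransmissions(segments):
    """Count data segments that carry no new bytes for their flow.

    segments: iterable of (flow_id, seq, payload_len) in capture order.
    Returns {flow_id: count}.
    """
    highest_end = {}
    retrans = defaultdict(int)
    for flow, seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data, so they cannot be retransmitted data
        end = seq + length
        if flow in highest_end and end <= highest_end[flow]:
            retrans[flow] += 1  # everything in this segment was already seen
        else:
            highest_end[flow] = max(end, highest_end.get(flow, 0))
    return dict(retrans)
```

A handful of these per flow is normal on any real network; a flow where a noticeable share of segments is repeated is the one worth examining more closely.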

score 6 · Answer 2 · answered Jun 23 '10 at 15:42

It is hard to give specific answers since 90% of this job is experience which teaches you where to look for which kind of problem, and the other 90% is knowing where to look on Google to get hints of where to start.

I usually try the paper-bag stuff like getting the customer to demonstrate the problem (mostly to rule out finger-problems and any issues the customer may have describing his problem), then trying to duplicate the problem on another computer. Doing that often gives you insight into where to look.

Don't forget the corrective power of a reboot, especially for Windows systems, even today. It used to be so common that I would ask people "Have you rebooted? Well, try that and let me know if the problem persists" -- this fixed a very large percentage of the issues I was asked about.

There's frequently also low-hanging fruit in DNS resolution problems and basic connectivity (ACLs on routers, air-gaps in the network, pings/traceroutes/mtrs to remote sites, etc).
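For the DNS part, a quick way to separate "name resolution is slow" from "the network is slow" is to time the lookup on its own. This is a minimal Python sketch using the system resolver; the `resolver` parameter exists only so the function can be exercised without a network and is an assumption of the example.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_ms, addresses); addresses is empty if resolution fails."""
    start = time.perf_counter()
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        infos = []
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the address.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed_ms, addresses
```

Keep in mind that the operating system or a local caching resolver may answer repeat lookups from cache, so the first lookup of a name is usually the informative one.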

For services you have direct control over, running nagios or something to ensure the service is actually running can frequently trigger you to fix problems before customers tell you about them. You probably also want to be running stats gathering, either directly through munin or something, or via SNMP to something like Cacti.

I usually try to have Cacti running against at least all my core switches and firewalls; where possible, I run Cacti against everything I can. In these cases I am usually looking for things like port error counts or excessive traffic. Firewall graphs from some devices can show you CPU usage and concurrent sessions; you'll get to learn at what thresholds your firewall device starts to have issues.
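The two things mentioned above, port error counts and excessive traffic, both come down to simple arithmetic on interface counters between two polls. The Python sketch below shows that arithmetic; the function names are made up for the example, and the counter width is an assumption you need to match to the counters you actually poll (for instance 2**32 for 32-bit error counters).

```python
def utilization_pct(octets_prev, octets_curr, interval_s, speed_bps, modulus=2**64):
    """Average link utilization in percent over one polling interval."""
    delta_octets = (octets_curr - octets_prev) % modulus  # tolerates one counter wrap
    return 100.0 * delta_octets * 8 / (interval_s * speed_bps)


def error_rate(errors_prev, errors_curr, pkts_prev, pkts_curr, modulus=2**64):
    """Fraction of inbound packets dropped as errored over one polling interval.

    pkts_* are counts of packets delivered without error, so the total seen
    on the wire is approximately delivered packets plus errored packets.
    """
    errs = (errors_curr - errors_prev) % modulus
    pkts = (pkts_curr - pkts_prev) % modulus
    total = errs + pkts
    return errs / total if total else 0.0
```

For example, 37,500,000 octets in a 300-second interval on a 100 Mbit/s port works out to 1% average utilization. Averages hide short bursts, so a shorter polling interval is worth considering on links you suspect.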

Your firewall may be able to log to a syslog device; if so, log everything you can and look through those for hints. This will be easier if you run something like syslog-ng or rsyslog or splunk that lets you divide your logs somewhat rather than dealing with one monolithic file.

I also try to run nfsen against at least the inside of my firewall, and the uplink to the internet provider where possible. This lets you go back in time to look at sessions to see who was doing what; this sometimes can catch interesting behaviors.

score 5 · Answer 3 · answered Jul 08 '10 at 12:42

Here are a couple of useful tools for troubleshooting latency and other network issues:

the OSI model - start from the bottom and work your way up
ping - check your RTT (i.e. latency)
HTTP ping - useful if your firewall blocks normal ICMP
ping -r 9 - (Windows record-route option) useful for identifying asymmetric routing situations
traceroute - how are my packets getting there and how are the routers along the way responding? Be aware that routers often process these packets at a low priority, so real performance may be better.
Wireshark - takes some expertise, but you can't get much lower-level
SpeedGuide.net TCP/IP Analyzer - check your PC's TCP settings
SG TCP Optimizer - (Windows only) suggests ways to optimize your NIC settings
IP Chicken - what is your public (post-NAT) IP address?
http://downforeveryoneorjustme.com/ - maybe it is you...
Bandwidth speed test - check your download / upload speeds
Network tools - run tools/tests from outside your network
check your network ports for errors, CRCs, etc.
check your network for over utilization (bandwidth monitors) & broadcast storms
check for unicast flooding - use wireshark and monitor for unicast traffic that is not destined for your workstation.
verify your spanning-tree root bridge is placed properly
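The unicast flooding check in the list above can be sketched in a few lines. This Python example assumes you have already extracted (source MAC, destination MAC) pairs from a capture taken on an ordinary, non-mirrored switch port; the function names are hypothetical. A destination MAC is unicast when the least significant bit of its first octet is 0.

```python
def is_unicast(mac):
    """True when the least significant bit of the first octet is 0."""
    first_octet = int(mac.split(":")[0], 16)
    return first_octet & 1 == 0


def flooded_unicast(frames, own_mac):
    """Return frames addressed to some other station's unicast MAC.

    frames: iterable of (src_mac, dst_mac) strings in aa:bb:cc:dd:ee:ff form.
    """
    own = own_mac.lower()
    return [(src, dst) for src, dst in frames if is_unicast(dst) and dst.lower() != own]
```

An occasional hit is expected while a switch is still learning an address; a steady stream of frames for someone else's MAC suggests the switch is flooding.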

If the ping -r times out, what does it say? For example, a `ping 8.8.8.8` does work, but a `ping -r 9 8.8.8.8` doesn't. – Michiel van Vaardegem, Mar 08 '19 at 07:44

score 4 · Answer 4 · answered Jun 23 '10 at 23:32

If you're running a wireless network, one of the frequent causes of slowdowns is channel interference. A bunch of SSIDs in one area can really slow down network traffic. (Think: the demo of the iPhone 4 at WWDC '10).

Troubleshooting this problem is fairly easy with software that can show you the wireless traffic patterns in the area. There's a good free and web-based one at: http://meraki.com/tools/stumbler. (disclosure: I work for Meraki)

To reduce interference in the 2.4 GHz band, it's best to be on channels 1, 6, or 11, since those do not overlap each other. Using 802.11n gear in the 5 GHz band could also help.
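The reason for 1, 6 and 11 falls out of the channel arithmetic: 2.4 GHz channels 1 to 13 have center frequencies 5 MHz apart, while a transmission occupies roughly 20 to 22 MHz. A small Python sketch, assuming a 22 MHz width and ignoring channel 14:

```python
def center_mhz(channel):
    """Center frequency of a 2.4 GHz Wi-Fi channel (1-13) in MHz."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz channel between 1 and 13")
    return 2407 + 5 * channel


def channels_overlap(a, b, width_mhz=22):
    """True when two channels' occupied bands overlap for the given width."""
    return abs(center_mhz(a) - center_mhz(b)) < width_mhz
```

Channels 1 and 6 are 25 MHz apart, so they stay clear of each other, whereas channels 1 and 3 are only 10 MHz apart and interfere.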

score 1 · Answer 5 · answered Jun 23 '10 at 14:16

I always start with monitoring the layer 2 stuff using Cacti. That will give you a good amount of data which you can use to look for patterns and you can compare your Cacti graphs when everything is working well vs when the users see slowness.

It probably isn't going to find the exact problem but it will give you a good starting place to help narrow down the problem.

TonyB

Anything in particular you're looking for in the Cacti graphs? – WuckaChucka Jun 23 '10 at 14:42

score 1 · Answer 6 · answered Jun 23 '10 at 14:27

I start at the outermost router and work my way down, and I measure performance in the most primitive way: use a bandwidth testing site, or a known external FTP site that will give you your upload/download speed, and keep going down until you find the level where the problem resides.

Once you know where the problem is, deploy your fancy tools and monitors. But don't waste time doing that stuff on every layer. It'll take forever.

Satanicpuppy

What about for internal application performance though? – WuckaChucka Jun 23 '10 at 14:33
@wuckachucka: Usually if there is an issue with the code, it shows up all over the logs, so troubleshooting isn't that bad. You also know where to start (the application). The biggest problem with network troubleshooting is FINDING the problem. If you have port speed mismatches or bad MTUs or other physical issues, those are a complete bastard to troubleshoot via logs, and the caveman approach has a lot of advantages there. – Satanicpuppy Jun 23 '10 at 14:50

score 1 · Answer 7 · answered Jun 23 '10 at 15:54

You also need to know your servers and desktop/client environment, rather than simply assuming the user is correct when they say "the network is slow." You need to methodically troubleshoot each issue - as others have said, you should first be able to view and ideally reproduce the error, and then work from there in a way that makes sense for the scenario.

Having good management and monitoring on the network and servers can save you a lot of time, however, because you're not trying to come up with instrumentation on the fly while possibly also trying to mitigate or fix the symptoms, and deal with complaining users/customers.

The answers recommending tcpdump and wireshark aren't wrong; those can be vital pieces of your toolkit. But unless you're dead certain that it's actually the network, they shouldn't be the first thing you reach for.

score 0 · Answer 8 · answered Mar 14 '15 at 01:56

A slow network is a common phenomenon, and it can be caused by a number of things. Troubleshooting a slow network is one of the most common and troublesome tasks in daily network management.

According to analysis, the major reasons for a slow network are:

Network loops
Broadcast/multicast storm
Virus attack
Slow server response
Too many clients
Slow application response
Incorrect subnet mask on a client

How can we quickly find the cause of a slow network? It's a good idea to capture and analyze packets with a network analyzer (Ax3soft Unicorn, wireshark and so on).

You can also read the article "Find Reasons for Slow Network" at http://www.ids-sax2.com//Unicorn/Tutorials/Find-Reasons-for-Slow-Network-with-Ax3soft-Unicorn.htm
