How to monitor changes in the frequency of network latency spikes over time?

0

I'm currently troubleshooting a network issue: I get latency spikes of up to 200 seconds (normally around 50 seconds) at seemingly random times of the day, a few hours apart.

While trying to find what part of my messy network needs to be blamed (outside of the scope of this question - discussed a bit on chat here and here), I realized I have no reliable way to confirm that a change actually improved anything.

So far, the main way in which I notice this is that irssi shows [Lag: 15 (??)] in the statusbar, increasing every 5 seconds, and all other connections seem to be affected too. Since this depends on my observations, it's not a very reliable method to know how often it really happens.

Note that just sending ICMP pings is probably not enough, but that's only my guess. It might be a "bufferbloat" issue, it might be packet loss, it might be some buggy kernel driver, or it might only affect persistent connections. I suspect this because a few months ago, when the issue started, I had a "ping" command running in the background and it didn't show anything unusual during the latency spikes. This seems to have changed now (pings don't go through), but I'd still prefer something more robust.

dequis

Posted 2014-06-09T02:21:04.187

Reputation: 250

1 I'd be using Cacti to monitor ping latency, but that's me, and the first thing that came to me. There are probably better solutions out there. – Lawrence – 2014-06-09T02:26:21.233

Answers

1

I've answered a similar question, recently, though that thread has been closed. I'll re-post it here...

Set up a full-time traffic graphing application to help identify performance problems.

Setting up a traffic graphing application to monitor your router's interface usage is a reasonable place to start when investigating why an office network is under-performing. Such applications generally require an always-on system where you can set up a polling program and leave it running 24x7. While not necessary, if that system can also run a web server, the data can be examined from virtually anywhere.

A traffic graphing application will allow you to determine whether demand is exceeding supply (users are requesting more bits/sec than your ISP supplies), whether a particular interface is sending or receiving more traffic than usual, or whether latency and other performance issues are not related to traffic volume at all. It can help identify if and when more capacity should be purchased, or whether a high network load is only an occasional event. It might even identify a regularly scheduled process that causes a spike in traffic at the same time each day, or even each hour.

While there are many such graphing applications around, one I've used on multiple platforms is MRTG (Multi Router Traffic Grapher). It can monitor traffic on any SNMP-capable device: I've used it to monitor large backbone Cisco routers, 48+ port enterprise switches, Linksys routers, even cable and DSL modems. I recommend that any network administrator have this, or a similar application, installed to keep abreast of network usage patterns.
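To make the idea concrete, here is a minimal Python sketch of the arithmetic a traffic grapher performs on two successive readings of an interface octet counter (such as SNMP's 32-bit `ifInOctets`). It is not MRTG's actual code; the function names and the 32-bit default are assumptions for illustration, and fetching the counter values from the device is left out.

```python
def rate_bits_per_sec(prev_octets, curr_octets, interval_s, counter_bits=32):
    """Convert two octet-counter readings into an average rate in bits/sec.

    The modulo handles a single counter wrap between readings, which is
    common with 32-bit counters on busy links.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta_octets * 8 / interval_s


def utilization(rate_bps, link_bps):
    """Fraction of the link capacity in use (hypothetical helper)."""
    return rate_bps / link_bps
```

A utilization that sits near 1.0 during the latency spikes would point to a saturated link; a low value would suggest the cause lies elsewhere.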

While there's a slight learning curve involved in installation, once it's configured its data does not require a high degree of technical expertise to interpret, and it imposes no additional administrative load, as its logfiles do not grow in size.

Nevin Williams

Posted 2014-06-09T02:21:04.187

Reputation: 3 725

Whoa, thanks for the answer. Might be too enterprise for me - I don't think I own anything that is SNMP capable, mine is just a boring home network

– dequis – 2014-06-09T03:06:26.960

For the kind of information you're looking for, traffic stats collection will be key. For periods of increased latency, you'd be looking for saturated connections, as packets aren't being lost but are instead queued up before transmission. The various ping utilities can tell you if, and perhaps when, you're seeing latency, but they'll never answer why. Most vendors implement SNMP; SOHO routers and cable and DSL modems mostly all have it. There are other utilities, some that use UPnP, but traffic graphing is pretty much required to debug your issue. – Nevin Williams – 2014-06-09T19:07:00.293

1

SmokePing is a tool for measuring latency. Between MRTG and SmokePing you might be able to get a grip on this issue.
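If installing a full tool is too much at first, a rough sketch of the same idea in Python is below: probe latency periodically, then count how many probes per hour were slow or failed, so that counts before and after a network change can be compared. This is not how SmokePing itself works; the function names and the threshold are assumptions, and a TCP connect only measures connection setup, so it may miss problems that affect only long-lived connections.

```python
import socket
import time
from collections import Counter
from datetime import datetime, timezone


def tcp_connect_latency(host, port=80, timeout=10.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # includes timeouts and refused connections
        return None


def spikes_per_hour(samples, threshold_s):
    """Count, per UTC hour, the probes that failed or exceeded threshold_s.

    samples: iterable of (unix_timestamp, latency_seconds_or_None).
    Returns a dict mapping 'YYYY-MM-DD HH:00' to the number of spikes.
    """
    counts = Counter()
    for timestamp, latency in samples:
        if latency is None or latency > threshold_s:
            hour = datetime.fromtimestamp(timestamp, tz=timezone.utc)
            counts[hour.strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)
```

In use, one would append `(time.time(), tcp_connect_latency(host))` to a list or log file every few seconds and run `spikes_per_hour` over the result once a day.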

dave taht

Posted 2014-06-09T02:21:04.187

Reputation: 11