Best way to troubleshoot intermittent network outages?

We have a Comcast 50/10 line into our office. We keep seeing very short but sometimes frequent drops in our internet service. They're enough to kick you off Skype and stop any websites from loading, which is obviously affecting our productivity.

  • We've tried 4 different routers, and we've moved everyone off wireless and onto wired connections via a switch; so far nothing has helped. Right now we're on a Cisco SB WRP400-G1 router. Attached to the router is a 16-port switch going to the ports in all of the offices.
  • We've moved to OpenDNS in case the Comcast DNS servers were going down.
  • Today we tried putting the modem, router, and switch on a UPS to make sure it wasn't power fluctuations that were causing it.

Every time we call Comcast, by the time they are here the internet is working fine.

I'd like to somehow prove that the problem is with Comcast, so if that means plugging a machine directly into their router and collecting data all day, I'm up for that. I just want to hear ideas on what tools to run and how to collect this data.

I could just continuously ping google.com all day long but I'm not sure how valuable that data would be.

Thoughts?

Ben Scheirman

Posted 2011-02-24T18:56:21.027

Reputation: 265

Run a tracert to 4.2.2.2 or some other well-known public DNS server and show them that it reaches your router but doesn't make it out. As long as it is making it to the router during these outages, chances are that the problem is not with your internal LAN. – Supercereal – 2011-02-24T18:59:43.147

The trouble is that the problem is more than likely not inside your building. :/ @Kyle's suggestion is about the only sort of thing you can do. – Shinrai – 2011-02-24T19:22:29.923

Traceroutes or pings to, or across, any network that the ISP does not control will just be ignored by the ISP. They will only accept documentation that shows a problem on their own network. Always choose their DNS or another server on their network when running pings or traceroutes for tests. – MaQleod – 2011-02-25T00:04:43.670

Ben, I have the same issue with Comcast at home. I have the triple play, so TV and phone are included as well. I also get frequent drops, and the phone stops working along with the internet. The TV is unaffected during these times. I was able to log into the Comcast-provided cable modem and see its internal log saying that it cannot reach their services. I'm not sure what your setup is, but it may be something to check. – edeevans – 2011-02-24T21:20:23.640

They changed the modem password, so none of the known defaults work (SMC). How did you get into yours? I suspect this business class modem is different than your home one though. – Ben Scheirman – 2011-02-24T21:59:44.747

Answers

Your first test should always be with a single computer directly connected to the modem and then ping your primary DNS for about an hour. Look for things like jitter, consistently high latency and packet loss. Comcast will ignore any results to IP destinations that are not on their network or go over networks that they do not control, so only ping your DNS.
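The numbers to look for in that hour-long ping can be computed with a few lines of Python. This is a minimal sketch that assumes you have already collected the round-trip times in milliseconds, with `None` marking a request that got no reply; `summarize_pings` is a hypothetical helper, and "jitter" here is simply the mean difference between consecutive replies, not the smoothed estimator from RFC 3550.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize ping results; None marks a request that got no reply."""
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    # Packet loss as a percentage of requests sent.
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    # Simple jitter: mean absolute change between consecutive replies.
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    return {
        "sent": sent,
        "received": len(replies),
        "loss_pct": loss_pct,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }
```

Running it over each few minutes of the log, rather than the whole hour at once, makes a short burst of loss or latency stand out instead of being averaged away.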

Your second test should be something that monitors more than just ICMP; try SNMP monitoring. This will record data proving that when the connection goes out, it is definitely not a utilization issue.
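SNMP monitoring usually works by polling an interface's octet counters (for example the IF-MIB object `ifInOctets`) at a fixed interval and turning the difference into a utilization percentage. Below is a minimal sketch of that arithmetic, assuming you read the counters yourself with whatever SNMP tool you use; the helper names are hypothetical, and the default modulus assumes 32-bit counters (use `2**64` for the 64-bit `ifHCInOctets`).

```python
COUNTER32_MODULUS = 2**32  # 32-bit SNMP counters wrap back to 0 here


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two readings, tolerating a single wrap."""
    return (curr - prev) % modulus


def utilization_pct(
    prev_octets: int,
    curr_octets: int,
    interval_s: float,
    if_speed_bps: int,
    modulus: int = COUNTER32_MODULUS,
) -> float:
    """Percent of link capacity used between two counter readings."""
    bits = counter_delta(prev_octets, curr_octets, modulus) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)
```

For example, 37,500,000 octets over 60 seconds on a 10 Mbit/s link is 50% utilization. If the graph sits well below 100% right up to each drop, congestion on your side is an unlikely cause.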

The third test uses a program called MTR, which gives you real-time traceroute results. It can run for a long time, and when you see the problem occur you can look at what MTR is showing at that moment. This will provide some insight into any possible routing problems (target your primary DNS).
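When reading `mtr --report` output, loss at a single middle hop is often just a router rate-limiting ICMP replies to itself; the loss that matters carries through every later hop to the destination. Below is a minimal sketch of that reading rule, assuming you copy the Loss% column into a list by hand; `first_lossy_hop` and its 1% threshold are hypothetical choices, not something MTR provides.

```python
from typing import Optional, Sequence


def first_lossy_hop(loss_pct: Sequence[float], threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based hop where persistent loss starts, or None.

    Loss counts only if it continues through every later hop, including
    the destination; isolated loss at a middle hop is ignored.
    """
    if not loss_pct or loss_pct[-1] < threshold:
        return None  # the destination itself is not losing packets
    start = len(loss_pct)
    # Walk backwards while the previous hop is also losing packets.
    while start > 1 and loss_pct[start - 2] >= threshold:
        start -= 1
    return start
```

With Loss% values of 0, 0, 30, 0, 5, 6, 5 this points at hop 5 and ignores the 30% at hop 3, because hop 4 shows that traffic passed through hop 3 fine.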

Last, try to bypass any inside wiring you can. If you can get to where the cable connection comes in off the street, try plugging in there; this will rule out anything within the building and isolate the problem to only what Comcast controls.

MaQleod

Posted 2011-02-24T18:56:21.027

Reputation: 12 560

Will take some time to try this out, but want to give you a +1 for the detailed answer. Thanks! – Ben Scheirman – 2011-03-01T19:02:34.513

Revisiting this old question, MTR was a huge help in gathering data. – Ben Scheirman – 2012-11-30T16:42:12.480

First, I would recommend using an online monitoring tool. Not to advertise any product, but Pingdom offers a free account for monitoring a single destination, so setup is straightforward. It may or may not provide useful data, but either way it costs you no money and little time.

Second, if you have a Linux server (or another computer running Linux), install SmokePing and configure it to ping some well-known targets (www.google.com, ping.funet.fi, maybe the Comcast website, etc.). That may or may not give you something useful.
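A long-running ping log is most convincing to an ISP when it is boiled down to a list of outage windows with timestamps. Below is a minimal Python sketch, assuming you already have time-ordered (timestamp, success) samples exported from whatever monitor you run; `outage_windows` and its `min_failures` parameter are hypothetical and not part of SmokePing or Pingdom.

```python
from typing import Iterable, List, Tuple


def outage_windows(
    samples: Iterable[Tuple[float, bool]], min_failures: int = 1
) -> List[Tuple[float, float]]:
    """Group consecutive failed checks into (first_failure, last_failure) windows.

    `samples` must be in time order; each item is (unix_timestamp, succeeded).
    Streaks shorter than `min_failures` are dropped as likely noise.
    """
    windows: List[Tuple[float, float]] = []
    run: List[float] = []  # timestamps of the current streak of failures
    for ts, ok in samples:
        if not ok:
            run.append(ts)
            continue
        # A successful check ends the current streak, if there is one.
        if run and len(run) >= min_failures:
            windows.append((run[0], run[-1]))
        run = []
    # The log may end in the middle of an outage.
    if run and len(run) >= min_failures:
        windows.append((run[0], run[-1]))
    return windows
```

Each window runs from the first to the last failed check, so the real outage may extend up to one check interval on either side of it.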

Third, running tracert/traceroute during an outage that lasts long enough may be useful. If it shows traffic leaving your office and then dropping at some Comcast router, it's easier to prove that the problem is not caused by you.
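To show where packets stop, what you mainly need is the last hop that still answered. Below is a minimal sketch that pulls it out of captured traceroute text, assuming numeric output (`traceroute -n` on Linux or `tracert -d` on Windows) so that hostnames containing dotted numbers can't confuse the IPv4 match; `last_responding_hop` is a hypothetical helper, and it ignores the continuation lines Linux prints when different probes to the same hop answer from different addresses.

```python
import re
from typing import Optional, Tuple

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")  # lines that start with a hop number
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def last_responding_hop(output: str) -> Optional[Tuple[int, str]]:
    """Return (hop_number, address) of the last hop that replied, or None."""
    last = None
    for line in output.splitlines():
        m = HOP_LINE.match(line)
        if not m:
            continue  # header or other non-hop line
        ip = IPV4.search(m.group(2))
        if ip:  # hops that only show "* * *" have no address
            last = (int(m.group(1)), ip.group(0))
    return last
```

Saving the raw traceroute text alongside the time it was taken, and then extracting the last responding hop, gives Comcast a specific router on their side to look at instead of a general complaint.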

Fourth, try changing the switch too. It's possible (even though not likely) that it's the culprit.

Fifth, ask Comcast for recommendations (which router/modem is correct for their connection and what settings you should use). If there is nothing else in your network (make it as simple as possible for testing, maybe just a computer connected directly to the router with a crossover cable), it's hard for them to blame you.

Diagnosing short network outages (for example, connections simply dropping, without longer breaks) is a very hard problem, as there are so many potential points of failure.

Olli

Posted 2011-02-24T18:56:21.027

Reputation: 6 704