Network connection troubleshooting, unreliable ping

1

I am trying to troubleshoot a network connection. The connection is wireless, and for the most part works well. (Physical connection from computer to router, physical connection from router to wireless ISP rooftop antenna/dish.) However, sometimes the bandwidth will seem much slower than normal, and various operations (such as uploading a file to gmail) will fail.

I decided to ping a reliable server (ping google.com -t on a Windows machine). The pings that get a reply come back quickly, but there are occasional gaps, as though the connection were completely absent.

[Image: screenshot of the ping output]

What does this mean and how can I further diagnose the problem?

JYelton

Posted 2012-07-16T22:29:23.983

Reputation: 2 848

Does it act the same with other high-end Internet sites (like say www.intel.com)? What happens if you ping your router like that? Does it behave similarly? – Ƭᴇcʜιᴇ007 – 2012-07-16T23:31:09.520

@techie007 Yes, the packet loss is the same for all sites. Some don't respond to ping, but of those that do, the percentage loss is similar. – JYelton – 2012-07-17T19:45:42.690

So does pinging your router show the same kind of loss? – Ƭᴇcʜιᴇ007 – 2012-07-17T23:03:58.287

Answers

3

For TCP, 0.1% packet loss is on the margin of being bad. 1% packet loss is a lot. 10% is unbearable.

You're close to 12% in that example. You certainly need to resolve your packet loss problem first, and then worry about any remaining throughput problems.
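If you save the ping output to a file, the loss percentage can be worked out rather than eyeballed. This is only a rough sketch: it assumes the English-language Windows ping output format ("Reply from ...: bytes=..." and "Request timed out."), and the sample lines and the 192.0.2.1 address are made up for illustration, not taken from the screenshot.

```python
def ping_loss_percent(ping_output: str) -> float:
    """Percentage of echo requests that got no usable reply.

    Assumes English Windows ping output; other locales or ping
    implementations print different text and would need other checks.
    """
    replies = 0
    losses = 0
    for line in ping_output.splitlines():
        line = line.strip()
        if line.startswith("Reply from") and "bytes=" in line:
            replies += 1  # a normal echo reply
        elif "timed out" in line or "unreachable" in line:
            losses += 1   # timeout, or an error reported by a router
    total = replies + losses
    if total == 0:
        raise ValueError("no ping results found in the text")
    return 100.0 * losses / total


# Hypothetical sample: 7 replies and 1 timeout.
sample = "\n".join(
    ["Reply from 192.0.2.1: bytes=32 time=24ms TTL=54"] * 7
    + ["Request timed out."]
)
print(ping_loss_percent(sample))  # 12.5
```

On Windows, something like ping google.com -n 500 > ping.txt would give a file to feed into this.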

Open up two windows, one pinging the private-side IP address of your Wi-Fi home gateway AP, and the other pinging an IP address on the far end of your rooftop WISP link (that is, some IP address at your ISP).

If both drop at the same time, you're having Wi-Fi problems. If just the WISP one drops, then your WISP connection is having problems.
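The two-window comparison above can also be scripted. The following is a minimal sketch, not a polished tool: the function names are my own, the pings are sent one after the other rather than truly simultaneously, and the ping flags assume Windows or Linux (iputils) behaviour. You would substitute your own gateway and ISP addresses.

```python
import platform
import subprocess
import time


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one echo request via the system ping; True if it reports success.

    Assumption: Windows uses -n/-w (milliseconds), Linux iputils uses
    -c/-W (seconds). Other systems may interpret these flags differently.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def classify(gateway_ok: bool, isp_ok: bool) -> str:
    """Apply the rule from the answer to one pair of ping results."""
    if gateway_ok and isp_ok:
        return "ok"
    if not gateway_ok and not isp_ok:
        return "local"         # both dropped: problem before the gateway
    if gateway_ok:
        return "wisp"          # only the far end dropped: WISP link
    return "inconclusive"      # gateway dropped but ISP answered


def monitor(gateway, isp_host, count=60, interval_s=1.0, ping=ping_once):
    """Ping both targets `count` times and tally the verdicts."""
    tally = {}
    for _ in range(count):
        verdict = classify(ping(gateway), ping(isp_host))
        tally[verdict] = tally.get(verdict, 0) + 1
        time.sleep(interval_s)
    return tally


# Example call (placeholder addresses, replace with your own):
# print(monitor("192.168.1.1", "203.0.113.1", count=300))
```

A tally dominated by "wisp" points at the rooftop link; a tally dominated by "local" points at whatever sits between the computer and the gateway.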

Check what frequency range your WISP is using, and make sure your Wi-Fi home gateway AP is not using the same frequency range. For example, one guy I was helping had a rooftop WISP that used 5.7~5.8 GHz equipment that overlaps with the high end of the 802.11a/n 5 GHz band (Wi-Fi channels 149-165), and this guy's simultaneous dual-band Wi-Fi AP's 5 GHz radio was set to channel 149. When he changed it to channel 36, his problems went away.

If the problem is with your rooftop WISP link and you can confirm that you're not interfering with it with your own Wi-Fi network, then you'll have to speak to your WISP to get them to fix their link. If they can't provide you with less than 1-in-1000 packet loss, explore your other broadband internet alternatives.

Spiff

Posted 2012-07-16T22:29:23.983

Reputation: 84 656

This depends on who is defining those margins. A typical ISP, on a business class circuit, will accept 3% loss as normal. If you're a home user, often times you have to prove a constant 5% or higher to get anything from the ISP. – MaQleod – 2012-07-17T05:55:24.907

@MaQleod Can you cite a (preferably primary) source on that? The business SLAs I'm finding online, such as AT&T's business class DSL and U-Verse, seem to quote "packet loss shall not exceed 0.1%". Verizon's says 1%. – Spiff – 2012-07-17T06:38:07.407

I worked for a nationwide ISP, we used AT&T and Verizon for our last mile, though they didn't run layer 2 or 3 at any step, that was all handled by Covad, NewEdge or Qwest (they were only layer 2, we did layer 3 on those circuits), and also XO (who actually handled layer 3 as well). There were times we reported 20%+ loss to XO and they thought it was fine. It took a lot to get AT&T or Verizon to fix simple layer 1 issues. Getting anyone else to fix anything was a lot of work (except Covad, they were pretty good about their ATM cloud). We might just have had bad partner contracts... – MaQleod – 2012-07-17T14:15:32.143

I should also add that the loss needs to show up on a hop that the ISP has direct control over or it doesn't fall into their guarantee. They will only guarantee their own network. So anything after a packet leaves their network on its way to say, google.com, can go into a black hole for all they care. – MaQleod – 2012-07-17T15:40:17.920

I edited the question shortly after posting - the connection to the router is all hard-wired. I suspect the router was the problem: after a reboot the packet loss was dramatically lower (though still not below 1 or 2 percent). – JYelton – 2012-07-17T19:47:30.323

3

Most ISPs will not do anything about 3-5% loss. If you have a business circuit, you can complain about 3% or higher. If you have a residential connection, you're not going to get much investigation out of the ISP until you can prove a constant 5%, and only on their network.

The first step is a direct connection. Connect a computer directly to your modem and try again. If you still see loss, connect the modem directly to your NID and try again. If you still see loss at that point, call your ISP; they'll have you do exactly that anyway, so you might as well do it before you call. For further testing, you can use MTR on a *nix box, or WinMTR or pathping on a Windows box, to measure the loss at each hop. This tells your ISP whether the loss is on a network they have any influence over. If it is on their backbone, they can do something about it. If the issue occurs off their network, the best they can do is try to re-route you (and you'll likely have to push to tier 2 or 3 to get anyone who knows how to do that).

If you don't see loss when connected directly to the modem, the issue is in your own network. Try a different wireless card, a different router, or a wired connection to your router; keep removing or replacing variables until you notice a difference, and then you've found your culprit.
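When reading per-hop loss from MTR, WinMTR or pathping, a common rule of thumb is that loss at one intermediate hop which does not continue to the final hop is usually just that router deprioritising ICMP replies, not real loss. The sketch below applies that rule to a list of per-hop loss percentages. The function name, the 1% threshold and the sample numbers are my own assumptions for illustration, not output from any real trace.

```python
from typing import Optional, Sequence


def first_persistent_loss_hop(
    loss_by_hop: Sequence[float], threshold: float = 1.0
) -> Optional[int]:
    """Return the 1-based hop where loss begins and persists to the end.

    Loss that shows up at a hop but disappears at a later hop is ignored.
    Returns None when the final hop shows loss below the threshold.
    """
    start = None
    for hop, loss in enumerate(loss_by_hop, start=1):
        if loss >= threshold:
            if start is None:
                start = hop   # candidate starting point of real loss
        else:
            start = None      # loss did not persist; discard candidate
    return start


# Hypothetical per-hop loss percentages, typed in by hand:
print(first_persistent_loss_hop([0, 0, 40, 0, 0]))   # None
print(first_persistent_loss_hop([0, 0, 5, 6, 5]))    # 3
```

The hop it reports is the one worth quoting to the ISP, since it shows whether the loss begins inside their network or beyond it.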

MaQleod

Posted 2012-07-16T22:29:23.983

Reputation: 12 560

0

From what I can see, most of the pings receive a reply, and with a relatively good RTT.

The timeouts you're seeing are likely due to packet loss (yes, packet loss does happen, mainly on wireless links).

TCP doesn't deal very well with packet loss. It treats packet loss as an implicit signal of network congestion: when a lost packet is detected, TCP shrinks its congestion window, which (in simple terms) means the bandwidth drops too.

As you are most likely using TCP for the tasks you referred to (uploading files and sending email), the packet loss you're seeing can explain the low bandwidth.
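To get a feel for how strongly loss limits TCP, here is a back-of-the-envelope sketch using the Mathis et al. approximation for steady-state TCP throughput, roughly (MSS / RTT) * (C / sqrt(p)) with C around 1.22. It is an approximation for classic loss-based congestion control, not a prediction for any particular connection, and the MSS and RTT values below are assumed, not measured from the thread.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput in bits per second.

    loss_rate is a fraction (0.01 means 1% loss). Uses C = sqrt(3/2).
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Assumed values for illustration: 1460-byte MSS, 50 ms round trip.
for loss in (0.001, 0.01, 0.10):
    mbps = mathis_throughput_bps(1460, 0.050, loss) / 1e6
    print(f"{loss:.1%} loss -> about {mbps:.1f} Mbit/s at best")
```

Because the bound scales with 1/sqrt(p), going from 0.1% to 10% loss cuts the achievable rate by a factor of ten under this model.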

To further diagnose the problem, I would run bandwidth tests comparing UDP and TCP, as UDP doesn't have these congestion-control mechanisms.

I may be misunderstanding the problem, but if it happened to me, this is where I would start. Also, I'm not really enough of a TCP expert to know whether the packet loss ratio you're experiencing is enough to explain the low bandwidth and the failed operations.

fmanco

Posted 2012-07-16T22:29:23.983

Reputation: 2 287

0

Next step would be to find out where the packet loss occurs. Wireless links are more likely to cause packet loss as there might be other signals on the same frequency. So it would be interesting to see if you can reliably reach your local router (probably) and some host on your ISP's network (probably not).

There may be a way to get status data for the wireless link, such as throughput or carrier-to-noise ratio. It might also help to find out whether other nearby users of this ISP (who use a different wireless link endpoint) have the same problem. If so, there might be a local noise source in your area. In that case you probably cannot do much about it except inform your ISP, who could try to optimize the link or locate and shut down the noise source.

Gurken Papst

Posted 2012-07-16T22:29:23.983

Reputation: 3 874

I agree that finding out where the packet loss occurs is a proper next step. I might also suggest a tracert to google in addition to Gurken's tips. If the response times on the way to google hold up after a few tests, you can get a bit closer to ruling out network congestion outside your LAN. – JoshP – 2012-07-17T00:55:52.957