tl;dr My home network has recently been jumping from 27ms latency to 600ms. It doesn't always happen, and it seems to occur most often at night. What equipment should I buy and what tests should I run to find the cause?
Setup
My home has 12Mb/800kb DSL. I live in the mountains, far away from other Wi-Fi sources. Historically (for years) I could ping google.com and get ~27ms times. If something was flooding the network or connection (an iPhone syncing all photos with iCloud) pings would jump into the 2000-6000ms range. But normally everything was good.
Recently, however, the network stays pegged around 600ms for tens of minutes at a time. I cannot find any device that is flooding the network. (It may exist, but I haven't found it.) The connection is generally completely fine in the morning, and generally persistently bad at night (just when we want to stream shows in bed!)
During high-latency periods, pings to the other devices on the network that I've tried are unchanged (always <2ms).
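To make the LAN-versus-Internet comparison above repeatable, here is a minimal Python sketch that pulls round-trip times out of saved ping output and summarizes them. The sample text and the addresses in it (192.168.0.10 and 203.0.113.1) are made up for illustration, not real measurements, and the regular expression assumes the usual "time=27.3 ms" field that common ping implementations print.

```python
import re
import statistics

# Matches the "time=27.3 ms" (or Windows-style "time<1ms") field in ping output.
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtts(ping_output):
    """Return every round-trip time (in ms) found in captured ping output."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def summarize(ping_output):
    """Return the sample count, median RTT, and worst RTT for one capture."""
    rtts = parse_rtts(ping_output)
    return {
        "count": len(rtts),
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
    }


# Illustrative captures only (hypothetical addresses and timings).
lan_sample = """\
64 bytes from 192.168.0.10: icmp_seq=0 ttl=64 time=1.2 ms
64 bytes from 192.168.0.10: icmp_seq=1 ttl=64 time=1.8 ms
64 bytes from 192.168.0.10: icmp_seq=2 ttl=64 time=1.5 ms
"""
wan_sample = """\
64 bytes from 203.0.113.1: icmp_seq=0 ttl=55 time=612.4 ms
64 bytes from 203.0.113.1: icmp_seq=1 ttl=55 time=27.3 ms
64 bytes from 203.0.113.1: icmp_seq=2 ttl=55 time=598.0 ms
"""

if __name__ == "__main__":
    print("LAN:", summarize(lan_sample))
    print("WAN:", summarize(wan_sample))
```

Running the same summary against a LAN device, the modem, and the first hop beyond the modem at the same moment would show which segment the extra latency appears on.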
Failed and Confusing Troubleshooting
I have purchased all new hardware (DSL modem, Wi-Fi routers, network switches) to rule that out. The problem persists. Here is the setup:
I have tried using the DSL Modem as the router (PPPoE + DHCP + NAT) with the Wi-Fi base stations in bridge mode. I have tried putting the DSL Modem in transparent Bridging mode and having the first Airport Extreme handle PPPoE, DHCP, and NAT. The problem persists.
I have disconnected all wired connections (leaving only the DSL modem and the Wi-Fi base station). The problem persists.
I have used only the DSL Modem (with PPPoE) and used its own Wi-Fi. The problem persists. I have attempted to hunt down every old tablet, phone, laptop on the Wi-Fi and turn them off. The problem persists. I have renamed the Wi-Fi SSID and put a password on it, connecting a single MacBook Pro laptop over Wi-Fi. The problem persists. I have used a different laptop over Wi-Fi. The problem persists.
I have connected a laptop directly to the modem over Ethernet, with Wi-Fi disabled on the modem and nothing else connected. The problem goes away! (I think...it *could* be that the problem just was not exhibiting itself on the three occasions that I tested this.)
At one point, with just a laptop connected over Ethernet, I turned Wi-Fi on for the modem and the problem exhibited itself. Ping latency immediately jumped as soon as I turned on Wi-Fi, though I do not believe that any devices were connected over Wi-Fi.
I have used iStumbler and there does not appear to be any correlation between the bad latency and increases in noise. Indeed, the SNR looks good consistently over Wi-Fi.
Remember that when things are bad they are not ALWAYS bad. Even with every device in the house turned on and connected, there are times when the latency will drop to 30ms or so for a few seconds (or minutes, or hours) before getting bad again.
Next Steps?
I think that iStumbler has shown me that the problem is not related to RF problems. (Maybe I'm wrong?) So I'm thinking it must be real traffic on the network.
The Airport Extreme base station does not support any sort of SNMP logging. Neither does the Actiontec C1000A. I don't have a switch with a monitor port, or a hub. I've never used Wireshark before.
BUT I AM WILLING TO THROW MONEY AND TIME AT THIS PROBLEM TO SOLVE IT
What should I buy? Where should I inject it into my network? What should I look for? How can I watch every packet on the network and build histograms and graphs to determine if one bad device is ruining the situation for everyone?
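Before buying anything, the "histograms and graphs" part can be approximated with nothing more than timestamped ping results. The sketch below groups samples by hour of day and reports what fraction exceeded a threshold. The function names, the 100 ms threshold, and the sample data are all assumptions chosen for illustration; they are not measurements from this network.

```python
from collections import defaultdict
from datetime import datetime


def bad_fraction_by_hour(samples, threshold_ms=100.0):
    """Map hour of day -> fraction of samples whose RTT exceeded threshold_ms.

    samples is an iterable of (datetime, rtt_ms) pairs.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for stamp, rtt_ms in samples:
        totals[stamp.hour] += 1
        if rtt_ms > threshold_ms:
            bad[stamp.hour] += 1
    return {hour: bad[hour] / totals[hour] for hour in sorted(totals)}


def text_histogram(fractions, width=20):
    """Render the per-hour fractions as one bar of '#' characters per hour."""
    lines = []
    for hour, fraction in fractions.items():
        bar = "#" * round(fraction * width)
        lines.append(f"{hour:02d}:00 {bar} {fraction:.0%}")
    return "\n".join(lines)


# Made-up samples: fine in the morning, mostly bad late in the evening.
samples = [
    (datetime(2015, 10, 19, 8, 0), 27.0),
    (datetime(2015, 10, 19, 8, 30), 31.0),
    (datetime(2015, 10, 19, 22, 0), 605.0),
    (datetime(2015, 10, 19, 22, 30), 590.0),
    (datetime(2015, 10, 19, 22, 45), 29.0),
]

if __name__ == "__main__":
    print(text_histogram(bad_fraction_by_hour(samples)))
```

A few days of samples logged this way would show whether the bad periods really cluster at night, which is the kind of evidence that helps separate a misbehaving local device from congestion on the provider's side.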
Edit 1: DSL Statistics when everything is fine
+-----------------+-------------+
| Connection | Status |
+-----------------+-------------+
| DSL Downstream: | 15.869 Mbps |
| DSL Upstream: | 0.896 Mbps |
+-----------------+-------------+
DSL Link Statistics
+------------------------------+---------------------+
| Link Statistic | Status |
+------------------------------+---------------------+
| Broadband Mode Setting: | Auto Select |
| Broadband Mode Detected: | VDSL2 - 8A |
| DSL Link Uptime: | 0 Days, 10H:39M:57S |
| Retrains: | 1 |
| Retrains in Last 24 Hours: | 1 |
| Loss of Power Link Failures: | 0 |
| Loss of Signal Link Failure: | 0 |
| Loss of Margin Link Failure: | 0 |
| Link Train Errors: | 0 |
| Unavailable Seconds: | 23 |
| Estimated Loop Length: | 2250 |
| Uncanceled Echo: | N/A |
| Transport Mode: | PTM |
| Path Parameter: | 201 |
| Priority: | 0 |
| Service Type: | PTM-Tagged |
+------------------------------+---------------------+
DSL Power
+--------------+-------------------------+------------------------+
| Levels | Downstream | Upstream |
+--------------+-------------------------+------------------------+
| SNR: | 16 dB | 10 dB |
| Attenuation: | (DS1)21.7, (DS2)58.8 dB | (US1)4.3, (US2)47.8 dB |
| Power: | 16.4 dBm | 7.8 dBm |
+--------------+-------------------------+------------------------+
DSL Transport
+----------------------+------------------+---------------+
| Transport | Downstream | Upstream |
+----------------------+------------------+---------------+
| Packets: | 1482864 | 1088249 |
| Error Packets: | 0 | 0 |
| 24 Hour Usage: | 1225940.68 Mbits | 2420.93 Mbits |
| Total Usage: | 1225940.68 Mbits | 2420.93 Mbits |
| 30 Minute Discarded: | 0 | 3930 |
+----------------------+------------------+---------------+
DSL Channel
+----------------+-------------+-------------+
| Channel | Near End | Far End |
+----------------+-------------+-------------+
| Channel Type: | Interleaved | Interleaved |
| CRC Errors: | 0 | 0 |
| 30 Minute CRC: | 0 | 0 |
| RS FEC: | 5873 | 29 |
| 30 Minute FEC: | 372 | 0 |
+----------------+-------------+-------------+
Edit 2: DSLReports Bufferbloat report
Running the speedtest during otherwise-normal latency indicates that the problem occurs during uploading.
Ping times at night and overnight
The spike around 10:35pm was one computer starting to upload to Dropbox.
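If the extra latency during uploads is queueing delay in a buffer ahead of the 896 kbps upstream link (an assumption, though it is what a bufferbloat report points toward), the amount of data sitting in that queue can be estimated as delay times link rate. The sketch below works that out for the figures in this question; it treats 1 kbps as 1000 bits per second and ignores protocol overhead.

```python
def queued_bytes(extra_delay_ms, link_kbps):
    """Bytes that must be queued to add extra_delay_ms on a link of link_kbps.

    Assumes the delay is pure queueing delay and 1 kbps = 1000 bits/s.
    """
    seconds = extra_delay_ms / 1000.0
    bits_per_second = link_kbps * 1000.0
    return seconds * bits_per_second / 8.0


if __name__ == "__main__":
    upstream_kbps = 896  # upstream rate from the DSL statistics
    # 600 ms observed minus the ~27 ms baseline is ~573 ms of added delay.
    print(round(queued_bytes(600 - 27, upstream_kbps)), "bytes for the 600 ms plateau")
    # The 2000-6000 ms range seen during an iCloud sync.
    print(round(queued_bytes(2000, upstream_kbps)), "bytes at 2000 ms")
    print(round(queued_bytes(6000, upstream_kbps)), "bytes at 6000 ms")
```

Under those assumptions the 600 ms plateau corresponds to roughly 64 kB of queued upload data, which a single device with a steady upload could sustain without looking like an obvious flood.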
Edit 3: ISP tech support said:
Modem is getting more signals that it is suppose to. If the cables are not enough to carry the load we are sending we can lower it down to 100%. To test this is for me to lower down the signal for 7 days and you can observe if the browsing \ internet is better. After 7 days our server would run test and would boost your signals up again. And by that time we would have enough figures what to do next.
Our server is provisioning you more than your purchase. Technically this should make the internet faster but if pings and delay that are caused by traffic are observed by the customer. We can bring it to the purchased speed\ signal and observe if the DSL line on the customers premise are cable to carry the load.
Actual/Provisioned/Purchased speeds
Down: 15868/15872/12128kbps
Up: 896/896/896kbps
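As a quick check on what "provisioning you more than your purchase" means numerically, the sketch below compares the provisioned and purchased rates. It reads the downstream figures as kbps, which matches the 15.869 Mbps shown in the DSL statistics; the helper name is hypothetical.

```python
def percent_over(provisioned, purchased):
    """Percentage by which the provisioned rate exceeds the purchased rate."""
    return (provisioned / purchased - 1.0) * 100.0


if __name__ == "__main__":
    # Figures from the Actual/Provisioned/Purchased lines, read as kbps.
    print(f"Downstream: {percent_over(15872, 12128):.1f}% over purchased")
    print(f"Upstream:   {percent_over(896, 896):.1f}% over purchased")
```

Downstream is provisioned about 31% above the purchased rate, while upstream is not over-provisioned at all, so lowering the downstream rate "to 100%" would leave the upstream link, where the speedtest showed the latency problem, unchanged.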
Have you tried using a faster DNS server? Even with your iPhone syncing wirelessly those ping times are not actually explained. – Ramhound – 2015-10-19T17:22:42.853
Has the modem been replaced? What modem is it? What are your ADSL stats? – Linef4ult – 2015-10-19T17:23:05.270
@Linef4ult Yes, I replaced the modem. It was an Actiontec Q1000, and I replaced it with an Actiontec C1000A. I'm not at home at the moment, but when I get there: could you please clarify what sort of ADSL stats you are looking for? – Phrogz – 2015-10-19T17:24:45.907
It's the line statistics. A degraded link to the DSLAM (the modem on your ISP's side) could cause bursts of errors and thus intermittent issues like this. Pastebin the contents of the page that looks like this: http://screenshots.portforward.com/routers/Actiontec/C1000A_CenturyLink/DSL_Status.jpg – Linef4ult – 2015-10-19T17:27:46.447
@Linef4ult Thank you! Will do in ~7 hours. I hope this is the case (that it's the provider's/line's fault). The fact that I thought I've seen a situation where Ethernet-only fixed the problem and adding Wi-Fi caused it to go bad fills me with FUD that the problem is on my side. We'll see! – Phrogz – 2015-10-19T17:29:27.923
@Phrogz That bit doesn't make sense, but this covers all the symptoms, so let's see. Tag me in a comment whenever you post them and I'll check back. – Linef4ult – 2015-10-19T17:33:06.463
I don't see any basic network diagnostics in here, such as determining where the latency is coming from... Throwing money at the problem should be done only after you know where the problem actually is. Start by posting results of pings and traceroutes between various devices until you pin down the problem. – qasdfdsaq – 2015-10-19T20:36:01.627
@qasdfdsaq I will embark upon rigorous testing tonight. I'm experiencing the latency on every device (that I can ping with) to the first hop on the other side of my DSL modem. I have not yet proven it (will tonight) but I believe that in-LAN latency is fine. – Phrogz – 2015-10-19T21:10:30.990
If the latency is fine in your network and high on the first hop on your ISP then the problem is with your ISP (especially if it's worse at peak time, classic ISP congestion symptoms). The only thing you can do about that is phone them and complain, or switch ISP. Nothing you can change in your home will make any difference. – qasdfdsaq – 2015-10-19T22:04:50.417
@qasdfdsaq No? With switches involved, pinging from laptop B on Wi-Fi to computer C on Ethernet would be unaffected by massive problems that might be caused by device D going through router A onto the Internet. Right? Everyone (including laptop B) would experience problems as soon as they touch the main DSL modem/router, as the pipes get clogged, but that doesn't necessarily mean that it's only the fault of the ISP or the lines. I hope that it's their fault, but I don't believe that a good ping between two random devices on the LAN necessarily means the problem is not elsewhere in my house. – Phrogz – 2015-10-19T22:45:02.537
@Linef4ult I've edited the question with DSL statistics. – Phrogz – 2015-10-20T01:34:40.367
Your words, not mine. I said IF the latency is fine on your network - the only way to know that is to test every link including pinging the router and modem. If you haven't done that, then you wouldn't know whether in-LAN latency is fine or not. – qasdfdsaq – 2015-10-20T10:26:19.900