What exactly triggers a "DSL Link Retrains" error on a DSL modem?

6

1

My AT&T DSL connection has been extremely flaky of late. After a lot of hair pulling I figured out that every time the modem logs either a 'DSL Link Retrains' or a 'DSL Training Errors' the internet connection is dropped. This becomes a major source of frustration because my work RDP fails and I have to re-connect.

After much effort, I was able to initiate a communication channel with ATT but last night someone from their support called up and said since there were no 'line drop' errors in the logs, the only other reason is the modem has gone bad.

I really don't expect $100 DSL modems to expire in a year (right after warranty expired).

Google isn't helping much. If someone can throw light on what causes the above error I will feel better when I buy the next modem and set some expectations for myself. In other words I can't take ATT support's word that the modem has gone bad!

Following is the complete Error Report

Statistics
Collected for 1 day 1:38:37

                                  Since Reset  Current 24-Hour Interval  Current 15-Minute Interval  Time Since Last Event
ATM Cell Header Errors:           0            0                         0                           0:00:00
ATM Loss of Cell Delineation:     0            0                         0                           0:00:00
DSL Link Retrains:                29           8                         0                           0:16:46
DSL Training Errors:              20           3                         0                           1:11:10
DSL Training Timeouts:            0            0                         0                           0:00:00
DSL Loss of Framing Failures:     0            0                         0                           0:00:00
DSL Loss of Signal Failures:      0            0                         0                           0:00:00
DSL Loss of Power Failures:       0            0                         0                           0:00:00
DSL Loss of Margin Failures:      0            0                         0                           0:00:00
DSL Cumulative Errored Seconds:   0            0                         0                           0:00:00
DSL Severely Errored Seconds:     0            0                         0                           0:00:00
DSL Corrected Blocks:             0            0                         0                           0:00:00
DSL Uncorrected Blocks:           0            0                         0                           0:00:00
ISP Connection Establishment:     1            1                         1                           0:01:31

@sawdust Thanks a ton for the info on line noise. On further digging I pulled up this report from the router. Is there any way to tell for sure whether this looks healthy or not?

Both the answers deserve an upvote.

DSL                      Down         Up
Current Rate:            6016 kbs     768 kbs
Max Rate:                10448 kbs    1060 kbs
Current Connection:
Current Noise Margin:    21.0 dB      14.0 dB
Current Attenuation:     20.8 dB      12.0 dB
Current Output Power:    19.7 dBm     11.9 dBm

ATM          Cells         Errors    %
Transmit:    4412167       0         0
Receive:     67168824      0         0

IP           Bytes         Packets    Errors    %
Transmit:    172853405     1427956    0         0
Receive:     3160277814    2535418    0         0

sumitkm

Posted 2011-09-30T22:14:13.273

Reputation: 231

sawdust and MaQleod (check answers below) both get equal points for the help they offered. I eventually purchased a new modem and the issues immediately went away. So apart from the line issues noted by sawdust and MaQleod, it is also true that a bad modem will cause DSL link retrains. Marking my own answer seems a little cheesy, so I'm leaving it open. I'll upvote both the answers as soon as I have enough rep. – sumitkm – 2012-03-11T11:47:09.337

Answers

3

Both counters relate to syncing with the DSLAM at the CO. DSL Link Retrains increments each time the modem loses its connection and has to retrain (resync). DSL Training Errors counts the problems the modem runs into during that training process.
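To put a number on how often the link is dropping, you can divide the "Since Reset" count by the collection time shown at the top of the report. Here is a minimal Python sketch of that arithmetic. The function names are mine, and the parsing assumes the four-column layout of the report in the question (since reset, 24-hour, 15-minute, time since last event); other modems will format this differently.

```python
from datetime import timedelta

# A few lines copied from the error report in the question.
REPORT = """\
Collected for 1 day 1:38:37
DSL Link Retrains: 29 8 0 0:16:46
DSL Training Errors: 20 3 0 1:11:10
"""


def parse_duration(text):
    """Parse '1 day 1:38:37' or '0:16:46' into a timedelta."""
    parts = text.split()
    days = int(parts[0]) if len(parts) == 3 else 0  # '<n> day h:mm:ss'
    hours, minutes, seconds = (int(p) for p in parts[-1].split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def parse_report(report):
    """Return (collection_time, {counter_name: columns}) for a report."""
    collected = None
    counters = {}
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Collected for"):
            collected = parse_duration(line[len("Collected for"):].strip())
        elif ":" in line:
            name, _, rest = line.partition(":")
            fields = rest.split()
            if len(fields) == 4:  # assumed column layout, see lead-in
                counters[name.strip()] = {
                    "since_reset": int(fields[0]),
                    "last_24h": int(fields[1]),
                    "last_15m": int(fields[2]),
                    "since_last_event": parse_duration(fields[3]),
                }
    return collected, counters


def events_per_hour(count, window):
    """Average number of events per hour over the collection window."""
    return count / (window.total_seconds() / 3600)


collected, counters = parse_report(REPORT)
retrain_rate = events_per_hour(
    counters["DSL Link Retrains"]["since_reset"], collected
)
print(f"{retrain_rate:.2f} retrains per hour")
```

For the report in the question (29 retrains in 1 day 1:38:37) this works out to a little over one retrain per hour. It is only an average, so it says nothing about whether the drops are evenly spread or clustered.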

As for the possible reasons for these errors, they vary. You could have too much noise on the line, you could be too far from the CO for your current speed, you could have some sort of voltage on the line, or your modem could be bad. Those are the most common causes. There are many more (bridge taps and NID splitters, or even problems with your DSLAM port or your cross-connect at the CO), but those are much rarer. If you are behind the times and are still on lineshare, there could be other reasons as well.

The first step would be to swap out the phone cable you are using. If that doesn't work, go out to the NID (phone box) and plug into the interface there (depending on the NID style, you may need to strip an end of the phone cord). If you still see the same problem, try a different modem. If still nothing changes, it is time to call AT&T and have them run a set of loop tests on the line.

MaQleod

Posted 2011-09-30T22:14:13.273

Reputation: 12 560

First up thanks for the quick response. I am not very well versed in networking terminology, hence a few more questions: – sumitkm – 2011-09-30T22:37:44.093

Oops. Couldn't finish typing in 5 minutes: First up thanks for the quick response. I am not very well versed in networking terminology, hence a few more questions: What does CO stand for? I don't have those filter things that come off the phone jack, because the DSL line doesn't have a phone number on it, so the setup instructions said I don't need them. And I think ATT didn't send them along with the modem either (I'll look it up). ATT is citing the above report as proof that there is no further need for line testing!!! And insisting I pay for the visit! – sumitkm – 2011-09-30T22:44:15.190

AT&T can't use that as proof of anything; it doesn't say what is wrong at all, just that something is wrong. AT&T is responsible for the lines on the street, not in your home, which is why you need to test out at the phone box: it bypasses what you are responsible for and puts the blame on them. If you still have issues there, you know for a fact it is them, and it is on them to figure out what is happening between the MDF and your box. CO is short for central office; it is where AT&T keeps all their equipment. – MaQleod – 2011-09-30T22:52:36.410

Aah! Now I see what you mean by testing at the phone box. I'll see if I can get to it. Even if I can't, I know where the line comes into the house. I'll see if I can hook up there and whether it makes any difference. Thanks again. Both you and @sawdust deserve upvotes. Unfortunately I don't have enough rep yet. But this definitely goes into my long list of SE articles that I should be upvoting once I have enough rep :) – sumitkm – 2011-09-30T23:09:31.907

1

The ADSL modem at the phone company's central office is the line or master unit, and the ADSL modem at your home is the remote or slave unit. In order to establish a bi-directional ADSL link, each end of the ADSL link performs a prescribed sequence of transmission and receive tests. One modem will transmit while the other is in receive mode, and then use the results to configure transmission power levels, equalization and other parameters. This "handshaking" sequence to set up the link is called training. The signal processing to transmit and then receive an attenuated ADSL signal after thousands of feet of small gauge copper wire is quite sophisticated. The training phase is required to set up the signal processing before the link goes active.

The criteria for a "bad link" or "loss of link" (that would initiate a retraining) could vary by modem and/or carrier. Too many bit errors in the last frame, too many cumulative bit errors, or loss of frame sync are possible causes.

You should check your line quality or transceiver stats at your modem (through its web page, using your PC's web browser). Low (signal to noise) margin and/or high line attenuation could indicate a possible problem (especially if the numbers have changed).
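Since what matters most is whether the numbers have changed, one way to use them is to note them down while the link is behaving and compare later readings against that baseline. Below is a rough Python sketch of such a comparison. The class and function names are made up for illustration, and the 3 dB threshold is purely an assumption of mine, not a figure from any standard or from AT&T; pick whatever fits your own line's history.

```python
from dataclasses import dataclass


@dataclass
class LineStats:
    noise_margin_db: float  # higher is better
    attenuation_db: float   # lower is better


# Illustrative threshold only: how much change counts as "deteriorated"
# is an assumption, not a value taken from any specification.
MAX_CHANGE_DB = 3.0


def compare(baseline, current, max_change_db=MAX_CHANGE_DB):
    """Return a list of warnings for stats that got worse than baseline."""
    warnings = []
    margin_drop = baseline.noise_margin_db - current.noise_margin_db
    if margin_drop > max_change_db:
        warnings.append(f"noise margin dropped by {margin_drop:.1f} dB")
    attenuation_rise = current.attenuation_db - baseline.attenuation_db
    if attenuation_rise > max_change_db:
        warnings.append(f"attenuation rose by {attenuation_rise:.1f} dB")
    return warnings


# Baseline: the downstream figures posted in the question.
baseline = LineStats(noise_margin_db=21.0, attenuation_db=20.8)
# Hypothetical later reading, invented here to show a warning.
later = LineStats(noise_margin_db=9.5, attenuation_db=21.0)

print(compare(baseline, later))
```

An empty list means nothing moved by more than the chosen threshold, which points away from the line and toward the modem or the CO side.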

sawdust

Posted 2011-09-30T22:14:13.273

Reputation: 14 697

what exactly is a good / bad number for noise margin or line attenuation? – Jason S – 2017-09-08T17:43:35.187

@JasonS - "Good" numbers are installation specific. Line attenuation can be used to guess the distance from the CO (the wire length, not as the crow flies). – sawdust – 2017-09-09T00:24:49.847

yeah, that's too vague. I mean at least there should be a reasonable range of margin. (0dB? 20dB?) You're telling people to check for low noise margin or high line attenuation but there's no quantitative criteria to do so. – Jason S – 2017-09-09T13:15:27.263

@JasonS -- Thanks for the downvote, ingrate. The usual advice is to maintain a history of those site-specific numbers. Then when you have ADSL issues, you can suspect the line depending on whether these statistics have deteriorated or not. – sawdust – 2017-09-09T20:20:25.963

oh, i see. Add that to your answer + I'll upvote. – Jason S – 2017-09-10T14:01:22.220

Thanks sawdust +1. I updated the question with the 'noise stats'. Please let me know if I am looking at the right place and if those values are enough to judge a good/bad/ugly status for either the line or the modem. – sumitkm – 2011-09-30T23:23:41.087

Yes, those are the salient statistics, and they look okay. The last 6 lines are interesting. "IP" would be "Internet Protocol" for your Internet service. "ATM" would be for a digital phone service, and there was a lot of ATM I/O. What is causing that traffic on your ADSL line? Have you neglected to mention that you have Ooma or something similar? – sawdust – 2011-10-01T00:07:43.340

I do have a Vonage (similar to Ooma) VOIP box connected to the router. I am sorry I didn't mention it; I didn't realize it could be important in this context. Will a round of testing without the VOIP box connected help? Also, the wireless telephone handset is in close proximity to the router (not sure if it is relevant). – sumitkm – 2011-10-01T00:49:21.150

Sawdust - you may have been on to something. I disconnected the Vonage box after reading your comment. Last 5 there have been no new DSL Line Retrains or DSL Training Timeouts! Leaving it disconnected for the night to see how it goes! – sumitkm – 2011-10-01T05:55:43.783

@sawdust, ATM is not for digital phone service; DSL layer 2 uses the ATM protocol. AT&T uses an Alcatel ATM (some Juniper) switch layer that trunks their DSLAMs. Every DSL network I have ever seen uses ATMs, digital voice or not. Digital voice will use SBCs (for SIP anyway) and application servers (if hosted voice), which are all at a higher layer than ATM. – MaQleod – 2011-10-01T15:56:13.827

@sawdust and MaQleod yes, I reset the ATM packet counter after disconnecting Vonage and it just kept ticking over. Then I looked it up and it seems if you use PPPoE mode in DSL it uses ATMs. BTW, no dice after disconnecting Vonage. So it's either the line or the modem. Connection drops are pretty steady, averaging one per hour (except that 6 hour window I mentioned earlier). – sumitkm – 2011-10-03T05:30:45.863

@sumitkm - not surprised that the VoIP unit had no effect; if the payload affected the ADSL transceivers, then that's a nasty problem for the manufacturers to solve. I questioned the ATM stats only because I hadn't seen them before. Turns out that I had never seen them on my ADSL modem because they are on another "Traffic Monitor" page. Note my ISP provides a dynamic IP address rather than PPPoE, and there's ATM traffic, so MaQleod's point that ATM use is widespread for ADSL is accurate. – sawdust – 2011-10-05T00:33:52.953

@sumitkm - Besides the line and (your) modem is the possibility of the modem at the CO. Has AT&T done a line test? They can (remotely) command the line unit to perform some tests on the CO side while you turn off and then turn on your ADSL modem. I wonder if you having a naked DSL service w/o POTS (plain old telephone service) is resulting in second-rate treatment. – sawdust – 2011-10-05T00:40:30.073

In my last conversation with ATT they said the last line drop was because they were rebuilding my profile on the router at the CO. (Thanks to the discussion here I could partly understand what he was saying.) Since then there have been no drops. Thing is, they are claiming there is no loss from the CO to my modem when they ping. Which is kind of true if you don't ping continuously for a couple of hours. I asked them to test from NID to CO, and they claim there are no issues there. Currently the above report is showing no DSL Corrected Blocks or DSL Severely Errored Seconds for the last 4+ days. – sumitkm – 2011-10-05T03:24:05.063

Right now I am a little too busy to make a trip to their 'Corporate Office' to pick up a new modem. My plan is to get a cheap modem (well, the cheapest is $75 I guess) from them and see. If things improve, fine; else their bluff will be called and I'll return the modem, or I'll know that they sell lousy modems that die after warranty. I'll get a good wireless router separately. I'll update progress here for sure. – sumitkm – 2011-10-05T03:30:06.380