NIC stopped receiving 20% of packets

2

I connect to my gigabit LAN with my RTL8111C-based NIC. It's on a Gigabyte GA-P43-DS3. In the last 3 weeks I noticed that my transfer rate dropped from the usual 40-60 MB/s (using a NAS over SMB) to about 2-10 MB/s. I thought maybe the server did some bandwidth limiting, but from any other PC transfer rates were OK.

I changed the cable of the PC and tried different ports of the switch. When I checked the traffic with Wireshark, I noticed that there were TCP errors, like duplicate ACKs. Later, I checked the transfer rate with iperf and it was just as low as when I copied files over SMB, so the problem must be with the network.

When I ran a test over UDP, it showed that about 10-20% of the packets were dropped. The funny thing is that if I use a lower-numbered port on my switch, the error rate is much higher (e.g. port #1: 21% vs. port #8: 11%).
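For readers who want to reproduce this kind of measurement without iperf, here is a minimal sketch of how a UDP loss test can work: the sender numbers each datagram and the receiver counts which numbers arrived. The function names, the 1400-byte payload and the idle timeout are assumptions made for illustration; this is not how iperf itself is implemented.

```python
import socket
import struct


def loss_percent(sent, received):
    """Percentage of datagrams that never arrived."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def send_numbered(host, port, count, payload_size=1400):
    """Send `count` UDP datagrams, each starting with a 4-byte sequence number."""
    padding = b"\0" * (payload_size - 4)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(struct.pack("!I", seq) + padding, (host, port))


def receive_numbered(port, idle_timeout=2.0):
    """Collect sequence numbers until no datagram arrives for `idle_timeout` seconds."""
    seen = set()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        sock.settimeout(idle_timeout)
        while True:
            try:
                data, _ = sock.recvfrom(2048)
            except socket.timeout:
                break
            if len(data) >= 4:  # ignore stray datagrams that are too short
                seen.add(struct.unpack("!I", data[:4])[0])
    return seen
```

Run `receive_numbered` on the affected PC first, then `send_numbered` from a known-good machine, and pass the sent count and `len(seen)` to `loss_percent`. Swapping the roles of the two machines tests the transmit direction.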

I think the transfer rate dropped because packets were lost during the transfer, so TCP slowed down the process. I also noticed that these error rates only apply to receiving packets. The computer can send packets with nearly 0% losses.
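As a rough sanity check on that reasoning, the well-known Mathis et al. approximation bounds steady-state TCP throughput by (MSS / RTT) * (C / sqrt(p)), where p is the loss rate and C is about sqrt(3/2). The sketch below evaluates it; the 1460-byte MSS and the 1 ms round-trip time are assumptions (neither was measured in this thread), and the model was derived for small loss rates, so at 10-20% loss it is only an order-of-magnitude guide.

```python
import math


def mathis_throughput_bytes_per_s(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate upper bound on TCP throughput under random packet loss."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes / rtt_s) * c / math.sqrt(loss_rate)


# Assumed values: standard Ethernet MSS and a 1 ms LAN round trip.
for p in (0.10, 0.20):
    rate = mathis_throughput_bytes_per_s(1460, 0.001, p)
    print(f"loss {p:.0%}: about {rate / 1e6:.1f} MB/s")
```

Under those assumptions the bound comes out at roughly 4 to 6 MB/s, which is at least consistent with a drop from 40-60 MB/s to single-digit MB/s.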

I checked the network with different cables and computers, and every other combination worked fine. The drop rate was about 1 in 460,000 packets (nearly 0%).

Is it common for NICs to stop receiving packets? Why does it matter which port I use on my switch?


Update: I use a Linksys SD2008 switch. I tried all of its ports with a known-good PC and cables. No matter what combination I used, there was no problem, except with the affected PC. I'm pretty sure that its NIC is failing. I just wonder why it matters which port I use. The difference between the ports can be measured, but only with the affected PC.

Update 2: I use Windows 7 x64, but I checked with Linux (Fedora 12 x64) and got similar results, so it's unlikely to be a driver issue. The SD2008's ports are divided into two 4-port groups. The issue is present no matter which group I use. However, I still don't understand how it is possible that lower-numbered ports have a higher error rate...

KovBal

Posted 2010-01-17T18:02:36.013

Reputation: 1 250

It would help to isolate the problem if you clarified a few things. What happens when you connect the affected PC to the same port as an unaffected one, with a cable that you know is good? What happens when you connect a known-good PC and cable to the affected port(s)? And what sort of switch are you using? Switches and switch ports can fail, and groups of ports often share circuitry, so failures may affect some ports but not all. Of course it might be your NIC too, but carrying out the above checks will help to isolate the problem. – Helvick – 2010-01-17T18:30:10.553

@Helvick: I updated my question to include the checks you mentioned. – KovBal – 2010-01-17T19:36:50.897

Answers

2

The additional diagnostics certainly point to a faulty NIC, specifically since the problem is consistent when you try the same hardware with a different OS but other systems are totally unaffected.

To answer your first question: complete NIC failures are not uncommon (having redundant and even multiply redundant NICs in servers is practically mandatory for this as well as other reasons), but partial failures are also possible, especially in the physical-layer circuitry and mechanical parts. On a standard Gigabit Ethernet NIC (1000BASE-T using RJ45 connectors), problems with any of the four pairs of signalling contacts, the analog-to-digital converter, the filter, the feed-forward equalizer, the echo canceller or the clock (and possibly other things) could cause these issues without necessarily leading to the NIC failing completely. The result would generally be much poorer signal-to-noise performance, and that will lead to packet loss because the decoders at each end will receive data that clearly has errors.

To answer your second question: problems like these, which involve the analog circuitry in the network's physical layer, can conceivably lead to the variation in error pattern that you are seeing across ports. Even when everything is working to spec, it's quite possible that each port has significantly different signal-to-noise handling abilities. All ports will be at least good enough to handle GigE signalling without [significant] errors, so that variance is invisible (and irrelevant), but when you add a substandard external device, the variation can become apparent.

If you want to really test things out, try the system with some short and some very long cables too, and see whether that makes a difference. If the problem is that the NIC's signal-to-noise levels are too low, then the error rate should get worse with longer cables.
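To keep the cable comparison honest, it helps to record sent and received counts per cable length and check whether loss actually rises with length. The helper below is a hypothetical sketch: the function names and the input layout (cable length in metres mapped to a pair of sent and received counts) are assumptions, and the counts would come from whatever UDP test you already use.

```python
def loss_by_cable_length(results):
    """Map cable length (m) -> loss %, ordered from shortest to longest cable.

    `results` maps cable length in metres to (datagrams sent, datagrams received).
    """
    return {
        length: 100.0 * (sent - received) / sent
        for length, (sent, received) in sorted(results.items())
    }


def worsens_with_length(results):
    """True if loss never decreases as cables get longer and is higher at the longest."""
    losses = list(loss_by_cable_length(results).values())
    if len(losses) < 2:
        return False  # a trend needs at least two cable lengths
    never_improves = all(a <= b for a, b in zip(losses, losses[1:]))
    return never_improves and losses[0] < losses[-1]
```

If `worsens_with_length` returns True across several runs, that supports the marginal signal-to-noise explanation; a flat or erratic result points elsewhere.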

Helvick

Posted 2010-01-17T18:02:36.013

Reputation: 1 103

I replaced the switch and everything has been fine since then. – KovBal – 2010-03-15T15:25:05.497

0

On a home network it shouldn't matter what port you use on the switch (enterprise switches have some more complicated settings but this doesn't sound like your setup).

It's not particularly common for network cards to begin failing, but it does happen from time to time. If you've not altered the drivers on your PC, then it seems to suggest a hardware issue of some sort, whether a NIC failure or possibly even a power supply issue.

ChrisFletcher

Posted 2010-01-17T18:02:36.013

Reputation: 391