Packet loss on native Windows 7, but not on Mac, Linux, WinXP or VM'd Win7. What?


I am managing a small handful of newish Dell laptops and desktops that use similar ethernet hardware -- Intel I217-LM for the desktops and Intel I218-LM for the laptops. These all are running the same Intel driver in Windows 7, currently "Intel(R) PROSet Version: 18.1.59.00", or driver version "12.11.77.0, 2/11/2014" (not sure what the two versions are for, but whatever).

These machines are having problems dropping packets to a server a few hops away on our local campus network. My diagnostic tool for these issues has been running a simple ping -t -l 3500 targetserver01 for a few hours at a time, and comparing the number of dropped packets with a control.

What I find is that these new machines are dropping dozens of packets per hour, while an ancient desktop next door drops almost none. The last trial I ran had this old desktop drop 14 packets over 2.5 hours, while all of the newer machines dropped between 110 and 130 over the same period of time. Even running the same laptops on wifi has them drop fewer packets than when they are using ethernet. I've also controlled for network infrastructure -- I am 100% sure (+/- 10%) at this point that the variable coincident with this issue is the Intel ethernet driver on Windows, and this is proved by booting one of the laptops into Ubuntu on a USB stick. When running the default Ubuntu driver on the same exact Intel chipset, the issue disappears and packet loss rates are back in line with the "old desktop" control.
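To put those counts in perspective, here is a rough sketch that converts them into loss percentages. It assumes ping -t sends about one echo request per second (the Windows default pacing); the actual number of requests sent was not recorded above, so treat the results as approximate. The function name loss_percent is made up for illustration.

```python
def loss_percent(dropped: int, hours: float, pings_per_second: float = 1.0) -> float:
    """Approximate loss rate, assuming a steady ping rate (an assumption, not measured)."""
    sent = hours * 3600 * pings_per_second
    return 100.0 * dropped / sent


# Counts taken from the 2.5-hour trial described above.
for label, dropped in [
    ("old desktop", 14),
    ("new machines (low end)", 110),
    ("new machines (high end)", 130),
]:
    print(f"{label}: {loss_percent(dropped, 2.5):.2f}% loss")
```

Under that assumption the old desktop comes out well under 0.2% loss, while the newer machines land somewhere above 1%.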

I've tried playing with all the settings I can get my hands on in the driver settings in Device Manager, but to no avail. These machines are required to run Windows software, so I can't just install Linux on all of them. The best workaround I have at this point is to buy USB-Ethernet adapters for all of these machines to use instead of the built-in interfaces, but I figure there's got to be a better way since the issue is with the driver software, not the interface itself.

I found this question, which seems to indicate I'm not going to find "generic" drivers for this Intel chipset:

Generic Ethernet drivers for WINDOWS

So, what are my options? Does this warrant further research, perhaps by running WireShark on the affected machines? Does Intel take user feedback?

EDIT:

The new drivers haven't made any difference. All of the Windows 7 machines I have access to at the moment are affected, with the exception of a virtual machine running on an iMac. The virtual machine does NOT experience the ~1% packet loss issue.

Next steps are to simultaneously test ping every router between this office and the server (there are only 2), read up on how to use NTttcp, and find another native Windows 7 machine that uses a different chipset. I'll report back.
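A minimal sketch of how the per-hop results could be compared once the pings finish: it parses the statistics line that English-language Windows ping prints at the end of a run. The hop names and the sample lines below are made-up placeholders, not measurements from this network, and parse_ping_summary is a hypothetical helper.

```python
import re

# Matches the summary line printed by English-language Windows ping, e.g.
# "    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),"
SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def parse_ping_summary(output: str) -> tuple[int, int, int]:
    """Return (sent, received, lost) from saved ping output."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping statistics line found")
    sent, received, lost = map(int, match.groups())
    return sent, received, lost


# Placeholder data: hop names and numbers are illustrative only.
samples = {
    "first-router (placeholder)": "    Packets: Sent = 1000, Received = 1000, Lost = 0 (0% loss),",
    "second-router (placeholder)": "    Packets: Sent = 1000, Received = 990, Lost = 10 (1% loss),",
}

for hop, text in samples.items():
    sent, received, lost = parse_ping_summary(text)
    print(f"{hop}: {lost}/{sent} lost ({100 * lost / sent:.2f}%)")
```

Running the pings to every hop over the same time window matters more than the parsing; the point is only to line up the loss figures side by side.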

Oh, also I did a trial with a USB-Ethernet dongle and got the same approximate 1% packet loss. So now I'm just weirded out.

New question: How can I keep myself from slowly descending into madness?


2nd Edit:

This is finally starting to look like a network issue after all. Still haven't had the chance to explore some of the suggested analysis tools (but thank you for that), but performance on previously unaffected machines has started to degrade over the past 24 hours -- and now my testing has narrowed this down to anything past the first switch in the route -- pings to a machine on the same switch and in the same office succeed with 0% packet loss.

NReilingh

Posted 2014-06-09T15:30:05.173

Reputation: 5 539

Why are you dropping 0.1% - 1.5% of packets over cable in the first place? I just ran a stress test and lost 0 packets out of 10000, while you lost 14 in your "good" attempt. – LatinSuD – 2014-06-12T17:22:21.723

I lack the networking chops to answer that question -- my network admin says the switches are not reporting any errors. I also agree that my method of testing -- pinging an application server -- is somewhat barbaric, but it's the >1% packet loss that I'm worried about here, not the .1%. – NReilingh – 2014-06-12T17:26:10.630

Some instruction on a more refined manner of testing these things would be appreciated as well. – NReilingh – 2014-06-12T17:28:17.127

You need to use a different ping tool that allows more than 1 pps. I personally use Cygwin's ping or Linux's ping, but you could try psping (I haven't tried it personally). – LatinSuD – 2014-06-12T17:33:10.933

Interestingly, I can run a 10000-packet flood ping to this same server over the same network, and get 0 packet loss completed in a few seconds -- but this is from an iMac. Perhaps the 14 packets lost from the old desktop are short losses of connectivity that happened over time. Still doesn't explain the drastically higher loss rate on this newer Dell hardware. – NReilingh – 2014-06-12T17:43:55.287

Have you also tested the connection between those new computers as well i.e. on the local network? I would also suggest using a more proper testing tool, perhaps NTttcp which even Dell recommends. – Cristian Ciupitu – 2014-06-13T11:20:40.857

1

There is a newer driver available at Intel since April : Version 19.1, might give it a try. Also have a look in the Event Viewer for computers that dropped frames. – harrymc – 2014-06-13T13:05:33.057

Why 3500 bytes? Does it fail with the default size? If not, the problem could be related to jumbo frames. The I217/218 adapters support jumbo frames whereas your old desktops probably do not. – Jason – 2014-06-13T22:52:53.340

@Jason Ooh, this is interesting. I was using 3500 instead of the default at the suggestion of our networking person, simply to stress the interface a bit more. Failures were still happening with the default size, but I didn't realize that was over the jumbo-frame threshold. – NReilingh – 2014-06-13T22:55:08.987
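Whether a 3500-byte ping involves jumbo frames at all depends on the MTU in use, which is not stated in this thread. As a sketch, assuming plain IPv4 with no header options and the standard 1500-byte Ethernet MTU, a payload that large would be split into several IP fragments rather than sent as one large frame, and losing any one fragment loses the whole echo. The function name ipv4_fragments is made up for illustration.

```python
import math


def ipv4_fragments(icmp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed for one ICMP echo of the given payload size."""
    ip_header = 20    # IPv4 header without options (assumption)
    icmp_header = 8   # ICMP echo header
    ip_payload = icmp_payload + icmp_header
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


print(ipv4_fragments(32))              # Windows ping default payload -> 1
print(ipv4_fragments(3500))            # ping -l 3500 at MTU 1500 -> 3
print(ipv4_fragments(3500, mtu=9000))  # same payload with a 9000-byte MTU -> 1
```

So the default 32-byte ping fits comfortably in a single standard frame, and only the 3500-byte test would depend on fragmentation or a larger MTU.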

Answers

2

If the network guys do not see any errors on their switch, you might want to look at network errors on your own port... Unfortunately, in a race to have everything "User Friendly" Microsoft has buried that info so deeply that you need to use Regedit to see something.

Here's what I've found to enable Ethernet error reporting in Windows 7:

Show errors in addition to the number of sent/received packets.

As you probably know, the Network Connection Status window shows the number of packets received and sent via the adapter. However, you can perform a registry hack to show the number of errors as well, which can help alert you to network problems. To do this, add the following key and value:

Hive: HKEY_LOCAL_MACHINE

Key: SYSTEM\CurrentControlSet\Control\Network\Connections\StatMon

Value Name: ShowLanErrors

DWORD Value: 0 = default, 1 = enable error count
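For repeating this on several machines, the same key and value can be written as a .reg file and imported. The sketch below only builds the file text from the hive, key, and value name listed above; the helper name statmon_reg_file is hypothetical, and nothing here confirms that the setting actually changes the status window on Windows 7.

```python
def statmon_reg_file(enable: bool = True) -> str:
    """Build .reg file text for the ShowLanErrors value described above."""
    key = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\Connections\StatMon"
    value = 1 if enable else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"ShowLanErrors"=dword:{value:08x}',  # 1 = show error count, 0 = default
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


print(statmon_reg_file())
```

Saving the printed text as a .reg file and importing it from an administrator account would create the StatMon key if it does not already exist.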

Remi Letourneau

Posted 2014-06-09T15:30:05.173

Reputation: 361

Hm, this is supposed to modify the "Local Area Connection Status" window that displays "Connection" and "Activity" statistics under a single "General" tab, yes? I had to add a key for StatMon under Connections, and then under StatMon create a ShowLanErrors DWORD. Does this require a restart? – NReilingh – 2014-06-13T01:06:29.043

Restart didn't help--haven't gotten this to work, unless I'm misunderstanding something. – NReilingh – 2014-06-13T01:18:49.647

1

Did you install the Intel drivers on the Linux system? If so, which version? When I googled Intel I218-LM, I found new drivers that were released in April, so you might want to look into that (on the Intel site): https://downloadcenter.intel.com/SearchResult.aspx?lang=eng&ProductFamily=Ethernet+Components&ProductLine=Ethernet+Controllers&ProductProduct=Intel%C2%AE+Ethernet+Connection+I218-LM

If not, then you should try that out to get more info.

Can we have more information on which Dell laptops and desktops you're using, or on the setup in general? (I recommend taking the network drivers from the Dell website that are specific to your desktop and laptop, as opposed to going to Intel.)

Just to get more info on the problem, I recommend bringing in a Windows 7 computer that you know doesn't have any issues at all and connecting it to the network.

EDIT: And yes, Intel does have a support centre you can contact, and it would appear that they have a forum too (not too sure about the last one).

user3407161

Posted 2014-06-09T15:30:05.173

Reputation: 89

I guess I should know better than to trust the "Update Driver..." automatic function. I'll check the drivers available directly from Dell later. – NReilingh – 2014-06-13T23:02:30.460

One doesn't normally install third-party drivers on Linux. In most cases the distributed kernel either supports the devices or they are not compatible at all. Rarely, additional drivers can be compiled for the system, but even so, you would not install a Windows network device driver into a Linux system. – Ben West – 2014-06-17T01:45:31.970

@BenWest The page linked in this answer does actually have Intel-provided drivers to be installed on Linux. I think it would be an interesting test to see if packet loss arises after installing these drivers on a platform not experiencing the issue. – NReilingh – 2014-06-17T16:05:11.223

Yes, well, I agree. It would be interesting to see if the provided driver causes the packet loss. – Ben West – 2014-06-17T17:04:29.130

1

There is a newer driver available at Intel since April : Version 19.1, so you could give it a try.

Also have a look in the Event Viewer for the computers that dropped frames.

harrymc

Posted 2014-06-09T15:30:05.173

Reputation: 306 093

Annoyingly, the "Update Driver..." button in the adapter properties didn't pick this up--I'm testing it now. – NReilingh – 2014-06-13T22:29:22.877

It should be executed to install. – harrymc – 2014-06-14T05:29:13.683

What's "It"? I was just complaining that the update-checking button said it was up-to-date. – NReilingh – 2014-06-16T23:19:38.297

What would I be looking for in Event Viewer? – NReilingh – 2014-06-17T01:35:27.247

"It" is the executable that is loaded : PROWin32.exe. – harrymc – 2014-06-17T06:43:56.877

0

Your problem occurs only on native Windows 7, while Win7 VMs work properly, so let's set the driver aside for now.
I will try to look somewhere else, because in my experience the problem usually hides behind something you would never think of :)
Windows 7 has a feature called Receive Window Auto-Tuning that tries to optimize network traffic to achieve the best possible transfer rate, but as so often with Microsoft, it tends to make things worse. I had a lot of trouble on my network because of it.
To disable it, run this command from an admin account:

netsh interface tcp set global autotuninglevel=disabled


Then reboot for the setting to take effect.
I had another similar problem caused by my third-party firewall: it worked well on XP but not on 7, and I had to upgrade to a newer version.

Think outside the box and look for causes other than your driver!

ITProStuff

Posted 2014-06-09T15:30:05.173

Reputation: 389

0

Shut off all those annoying power management settings or completely disable IPv6 on your network interface. Either one will solve most of your issues with this network card.

http://en.community.dell.com/support-forums/desktop/f/3514/t/19523337.aspx

TravK

Posted 2014-06-09T15:30:05.173

Reputation: 1