50

I have a Dell 1U Server with Intel(R) Xeon(R) CPU L5420 @ 2.50GHz, 8 cores running Ubuntu Server Kernel Version 3.13.0-32-generic on x86_64. It has dual 1000baseT networking cards. I have it set up to forward packets from eth0 to eth1.

I have noticed in my kern.log file that the adapter keeps hanging and then resetting. This happens often: every few seconds for a while, then it may be fine for a few minutes, then it goes back to every few seconds.

Here is the log file dump:

 [118943.768245] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
 [118943.768245]   TDH                  <45>
 [118943.768245]   TDT                  <50>
 [118943.768245]   next_to_use          <50>
 [118943.768245]   next_to_clean        <43>
 [118943.768245] buffer_info[next_to_clean]:
 [118943.768245]   time_stamp           <101c48d04>
 [118943.768245]   next_to_watch        <45>
 [118943.768245]   jiffies              <101c4970f>
 [118943.768245]   next_to_watch.status <0>
 [118943.768245] MAC Status             <80283>
 [118943.768245] PHY Status             <792d>
 [118943.768245] PHY 1000BASE-T Status  <7800>
 [118943.768245] PHY Extended Status    <3000>
 [118943.768245] PCI Status             <10>
 [118944.780015] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly

Here is the info from ethtool:

Settings:

Settings for eth0:

Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
               drv probe link
Link detected: yes

Driver info:

ethtool -i eth0

driver: e1000e
version: 2.3.2-k
firmware-version: 1.4-0
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

What could be causing this? Is this just a software bug or an actual hardware issue? I have seen many others with similar issues but no real solution, which leads me to believe that it's a software issue.

Maybe someone can shed some light on this for me?

Kyle Coots

6 Answers

42

OK, so after posting this question last night I continued to do some research. The only real solution I came across seems to have taken care of the problem.

Disabling TSO, GSO and GRO using ethtool:

ethtool -K eth0 gso off gro off tso off

According to a post found here: http://ehc.ac/p/e1000/bugs/378/

From what I understand, this can reduce performance, since segmentation and receive aggregation are then no longer offloaded.
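
To confirm what the command changed, the current feature state can be read back with `ethtool -k` (lowercase). The following is only a small sketch: the helper name `offload_state` is made up for illustration, and it assumes the feature names printed by common ethtool versions.

```shell
# offload_state: filter `ethtool -k` output (read from stdin) down to the
# three features turned off above. The function name is illustrative only.
offload_state() {
    grep -E '^[[:space:]]*(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}
```

Typical use would be `ethtool -k eth0 | offload_state`, run before and after the change. Settings made with `ethtool -K` do not survive a reboot, so they need to be reapplied at boot.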

I also noticed another solution was to disable Active-State Power Management with the kernel boot parameter:

pcie_aspm=off

According to this post on serverfault: Linux e1000e (Intel networking driver) problems galore, where do I start?

I haven’t tried this solution yet. I will try it and see if that makes a difference and post back my findings.
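
For anyone trying this: on Ubuntu the parameter is normally added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub` and a reboot. A rough sketch for checking that it actually reached the running kernel; the helper name `aspm_param` is hypothetical:

```shell
# aspm_param CMDLINE: print the pcie_aspm=... token found in a kernel
# command line string, or "unset" if the parameter is absent.
# The function name is illustrative only.
aspm_param() {
    for tok in $1; do
        case "$tok" in
            pcie_aspm=*) echo "$tok"; return 0 ;;
        esac
    done
    echo "unset"
}
```

After the reboot, `aspm_param "$(cat /proc/cmdline)"` should print `pcie_aspm=off` if the setting took effect.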

EDIT:

OK, so I have tried turning off Active-State Power Management with pcie_aspm=off, and it didn't have any effect. I continued to see errors in my log file.

This may still work for some, as some Intel NICs have problems under certain kernels with falling asleep when power management is enabled.
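
To judge whether a given change helps, it is handy to count the hang reports rather than eyeball the log. A minimal sketch, assuming the kernel log is at `/var/log/kern.log` as in the question; the helper name `count_hangs` is made up:

```shell
# count_hangs FILE: print how many e1000e hang reports FILE contains.
# The matched string is the one from the log dump in the question;
# the function name is illustrative only.
count_hangs() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c 'Detected Hardware Unit Hang' "$1" || true
}
```

Running `count_hangs /var/log/kern.log` before a change and again some hours later shows whether new hangs are still being logged.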

cdhowie
Kyle Coots
  • Thanks! I tried the ethtool fix, and it solved my issue. (also stuck it in an init script) – Peter Feb 16 '15 at 11:46
  • Hi, do you know if running `ethtool -K eth0 gso off gro off tso off` will drop the connection, even for a short time? – godzillante Oct 21 '16 at 06:48
  • Indeed, disabling options with ethtool helped, disabling power management options didn't – Oleg Gryb Jan 18 '18 at 18:47
  • 'According to a post found here: http://ehc.ac/p/e1000/bugs/378/' above now goes to a domain squatter; the original content can be found here: https://web.archive.org/web/20160205153351/http://ehc.ac:80/p/e1000/bugs/378/ – Mike McCabe May 15 '18 at 00:47
  • @godzillante for future reference: It can drop the connection for a couple of seconds; however, clients will not be disconnected unless they time out, depending on your application. – Luc H Mar 18 '21 at 17:03
  • no downtime noticed too – laimison Nov 16 '21 at 00:39
  • On an Intel NUC BOXNUC8i7BEH2: `sudo ethtool -K eno1 tso off gso off` – user249654 Feb 13 '22 at 20:58
9

Disabling Enhanced C1 (C1E) in the BIOS fixed it for me.

I'm not sure whether the lower power state of C1E interferes with the driver, or whether there's a bug in the driver when the processor is in this state.

Anyway, problem solved.

SteveG
  • This was exactly the fix that worked for me. Running Ubuntu 16.04 LTS on a ASRock H170M-ITX/DL motherboard. Thanks SteveG. =) – Tails May 10 '16 at 04:26
  • Mind that this may increase the server's power consumption a lot! – Flatron Aug 16 '18 at 04:04
7

Disabling only TCP Segmentation Offload (TSO) does the trick for me.

ethtool -K eth0 tso off

Note: It does not seem to be necessary to also disable Generic Receive Offload (GRO) and Generic Segmentation Offload (GSO), as various sources recommend. As far as I know, these are implemented purely in software and should be safe. Don't sacrifice more performance than necessary.

David Scherfgen
2

I had the same issue (triggering the same kernel error as you, plus userspace SSH errors like "Corrupted MAC on input").

Solution

What worked for me was to disable TX and RX checksum offloading:

# ethtool -K eth0 tx off rx off

Clean & long-term integration of this with debian-ish /etc/network/interfaces:

#!/bin/bash
#
# Disables TCP offloading on all ifaces
#
# Inspired by: @Michelunik https://serverfault.com/a/422554/62953

RUN=true
case "${IF_NO_TOE,,}" in
    no|off|false|disable|disabled)
        RUN=false
    ;;
esac


# Other offloading options that could be disabled (not TCP related):
#  sg tso ufo gso gro lro rxvlan txvlan rxhash
# see man ethtool

if [ "$MODE" = start -a "$RUN" = true ]; then
  TOE_OPTIONS="rx tx"
  for TOE_OPTION in $TOE_OPTIONS; do
    /sbin/ethtool --offload "$IFACE" "$TOE_OPTION" off &>/dev/null || true
  done
fi

source, inspiration.

Context

  • Debian Jessie
  • Kernel 4.7.0-0.bpo.1-amd64
  • lspci 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-V (rev 04)
  • Worked for me on CentOS 7, Kernel 3.10.0-1160.11.1.el7.x86_64, Device: 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31) – Craig Jan 24 '21 at 14:26
0

I just stumbled upon this readme from Intel:

https://downloadmirror.intel.com/15817/eng/readme.txt

which says

82573(V/L/E) TX Unit Hang Messages

Several adapters with the 82573 chipset display "TX unit hang" messages during normal operation with the e1000e driver. The issue appears both with TSO enabled and disabled and is caused by a power management function that is enabled in the EEPROM. Early releases of the chipsets to vendors had the EEPROM bit that enabled the feature. After the issue was discovered newer adapters were released with the feature disabled in the EEPROM.

If you encounter the problem in an adapter, and the chipset is an 82573-based one, you can verify that your adapter needs the fix by using ethtool:

ethtool -e eth0

Offset  Values
0x0000  00 12 34 56 fe dc 30 0d 46 f7 f4 00 ff ff ff ff
0x0010  ff ff ff ff 6b 02 8c 10 d9 15 8c 10 86 80 de 83

The value at offset 0x001e (de) has bit 0 unset. This enables the problematic power saving feature. In this case, the EEPROM needs to read "df" at offset 0x001e.
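
The bit test described in the readme can be done with shell arithmetic instead of by eye. This is only a sketch: the helper name `eeprom_bit0` is invented here, and it expects the single byte found at offset 0x001e as two hex digits without the `0x` prefix.

```shell
# eeprom_bit0 HEXBYTE: report whether bit 0 of a one-byte hex value is set.
# Pass the byte read at EEPROM offset 0x001e, e.g. "de" or "df".
# The function name is illustrative only.
eeprom_bit0() {
    if [ $(( 0x$1 & 1 )) -eq 1 ]; then
        echo "set"
    else
        echo "unset"
    fi
}
```

With the values from the readme, `eeprom_bit0 de` reports the bit as unset (the problematic case) and `eeprom_bit0 df` reports it as set. The single byte can be dumped with `ethtool -e eth0 offset 0x1e length 1`.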

Unfortunately my problematic adapters are 82579V and I219-V in two different NUCs, so it's unclear if the same fix applies for me.

janfrode
-1

Try updating your driver. I don't know where to find it for Ubuntu or which version is recommended, but for CentOS or EL 6 it is:

http://mirror.symnds.com/distributions/elrepo/elrepo/el6/x86_64/RPMS/kmod-e1000e-3.1.0.2-1.el6.elrepo.x86_64.rpm
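
Before and after updating, it helps to confirm which driver version is actually bound to the interface. A small sketch that pulls the version field out of `ethtool -i` output; the helper name `driver_version` is hypothetical:

```shell
# driver_version: read `ethtool -i` output on stdin and print only the
# driver version field. The function name is illustrative only.
driver_version() {
    # match the "version" key exactly so "firmware-version" is skipped
    awk -F': ' '$1 == "version" { print $2 }'
}
```

For the setup in the question, `ethtool -i eth0 | driver_version` would print `2.3.2-k`; after a successful update it should show the newer version.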

Fred Flint