I have two VIA NAB-7410 boards with 4x Intel 82541GI gigabit adapters. On both boards, Linux occasionally stops responding to Ethernet frames. The problem goes away when another interrupt is triggered, either by USB or RS232. The boards are running different distros: OpenWRT (3.18.20) and Debian (3.16.0-4-686-pae). Is this an e1000 driver bug?

Update 2015-10-19: I've discovered that the problematic machines generate ping replies that get queued in a transmit buffer but are not sent out to the network until a USB or serial interrupt occurs. Also, something is putting the CPU to sleep and disabling timer interrupts, as the "uptime" value does not change while the system is unresponsive.

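Update 2015-10-26: It seems that the south bridge chip triggers an SMI routine after a period with no "primary" interrupts, i.e. USB, disk or serial interrupts. See page 168, "Legacy Power Management Timers", of [the VT8237R south bridge data sheet](http://www.yuiop.co.uk/epia/stuff/South%20%20Bridge/Data%20Sheet%20VT8237R%20South%20Bridge%20(Revision%202.06)(Lead-Free).pdf).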

How do I disable this "feature"?

peb
  • How did you establish that there was a connection between another interrupt and queued packets getting transmitted? Do only specific interrupts trigger transmission of queued packets, or does it happen on any interrupt, including timer interrupts? Are any interrupts shared between multiple peripherals? – kasperd Oct 20 '15 at 07:58
  • The machine is unresponsive to ping until I either send a byte over a serial connection or plug or unplug a USB device. I know the queued packets are getting transmitted because I had been running ping from another machine the whole time, and it received replies to packets it had sent long ago. I will check the interrupt assignment. – peb Oct 20 '15 at 18:57
  • I don't think there are IRQs being shared, but the BIOS isn't exactly clear. I also noticed that the system clock is stopped when the system is unresponsive, so something is putting the processor to sleep (and disabling timer interrupts?). – peb Oct 24 '15 at 07:47
  • The contents of `/proc/interrupts` may provide some hints. There are a few things you could try to see if they affect anything: **1.** Run a ping command on that machine to see if the problem still occurs when it is sending a packet once per second itself. **2.** Run `nice nice sha512sum /dev/urandom` to see if the problem still occurs when the CPU is never allowed to become idle. – kasperd Oct 24 '15 at 09:06
  • [Here are](http://pastebin.com/FPqPzzHX) the contents of `/proc/interrupts`. Running ping from the machine itself indicates that no packets are sent while the machine is unresponsive. That is, ping does not show "unreachable" events; the sequence number increases by one when the machine wakes up, but the timestamp difference corresponds to the period of sleep. I will try manually keeping the CPU awake. Thanks! – peb Oct 24 '15 at 19:11
  • Keeping the CPU busy seems to prevent the issue, but doesn't seem like a sustainable solution. – peb Oct 25 '15 at 07:58
  • So far I see no evidence indicating the issue is related to the network driver. Knowing that the problem doesn't happen while the CPU is busy could help in understanding where to look for the problem. It is not a great solution, but it could serve as a workaround if you keep that process running at the lowest possible scheduling priority and have it spend all its time calling the HLT instruction (that last part would require a kernel modification). – kasperd Oct 25 '15 at 08:10
  • I realized that simply reading from disk once per minute was sufficient. Also, I found a relevant passage in [this PDF](http://www.yuiop.co.uk/epia/stuff/South%20%20Bridge/Data%20Sheet%20VT8237R%20South%20Bridge%20(Revision%202.06)(Lead-Free).pdf) for the south bridge: ".... Therefore the GP0 timer will time out and the SMI routine can put the system into power down mode if no events other than secondary interrupts are happening periodically in the background." Sounds like an ACPI problem? – peb Oct 26 '15 at 17:43
  • I don't think I can help you any further now. If you edit the question to include the relevant details from your investigations, rather than leaving them only in comments, you'll improve the chance that somebody else who can answer the question will notice it. – kasperd Oct 26 '15 at 18:05
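
The stopgap mentioned in the comments (reading from disk roughly once per minute) could be sketched as below. This is only a minimal sketch of that workaround, not a fix for the underlying SMI timer behaviour. The names `touch_disk` and `keep_awake`, the 4096-byte read size, and the file path are hypothetical. Using `posix_fadvise` to drop cached pages first is an assumption on my part; the comments do not say how the read was performed, and the call is only a hint to the kernel, so a read may still be served from cache.

```python
import os
import time


def touch_disk(path, nbytes=4096):
    """Read up to nbytes from path and return the number of bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: asking the kernel to evict this file's cached pages makes
        # it more likely that the read below reaches the disk controller.
        # This is advisory only and not available on every platform.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return len(os.read(fd, nbytes))
    finally:
        os.close(fd)


def keep_awake(path, interval_s=60.0, iterations=None, sleep=time.sleep):
    """Call touch_disk(path) every interval_s seconds.

    iterations=None loops forever; a finite value is handy for testing.
    Returns the number of reads performed.
    """
    count = 0
    while iterations is None or count < iterations:
        touch_disk(path)
        count += 1
        sleep(interval_s)
    return count


# Example (hypothetical path; pick any file on the local disk):
# keep_awake("/path/to/some/file")
```

The one-minute interval follows the comment above; whether a shorter or longer interval is needed presumably depends on how the GP0 timer is configured on a given board.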

0 Answers