Ubuntu Server 9.04 computer freezes the entire network periodically

2

1

I've got a headless computer running Ubuntu Server 9.04 which I use for file sharing on the network and as a private web server. Every now and then, I lose my internet connection on my laptop, which is on the same network. When I check, the entire network is down and none of the connected devices can be reached. However, if I unplug (and thus reboot) the Ubuntu Server, the entire network resumes operation like nothing ever happened. It happens once or twice every two months.

I've been looking at the syslogs, and there's nothing to see there. The syslog stops all of a sudden at 20:45 and then resumes at 21:15 with the kernel boot messages, which is when I pressed the power switch to boot the computer.

BloodPhilia

Posted 2010-09-02T19:25:47.897

Reputation: 27 374

Answers

3

I've seen NICs lose their minds and jam cheap switches by jabbering (transmitting an endless frame) or by sending excessive low-level Ethernet flow control signals. Unfortunately, these kinds of MAC/PHY hardware bugs can go unnoticed by the host's Ethernet driver, so you won't see anything in your logs. Neither flaw would be visible in a typical sniffer trace either: the flow control signals aren't really "Ethernet frames", and sniffers generally only capture frames that are within normal size limits, so an endless jabber frame would not show up.
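Even though faults like these may never reach the driver, it costs little to keep a record of the interface counters the kernel does expose, in case they jump shortly before an outage. The following is a minimal sketch in Python 3, not a definitive tool: it assumes the interface is called `eth0`, that the counters live under `/sys/class/net/<iface>/statistics`, and that `/var/log/nic-counters.log` is an acceptable log location. The counter list and the helper names are illustrative choices, and some drivers may not populate every counter.

```python
import os
import time

# Counter files under /sys/class/net/<iface>/statistics.
# Which of these a given driver actually updates is an assumption.
COUNTERS = ["rx_errors", "tx_errors", "rx_dropped", "tx_dropped",
            "collisions", "rx_crc_errors", "rx_frame_errors"]


def read_counters(stats_dir, names=COUNTERS):
    """Return {counter_name: int} for every counter file that can be read."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(stats_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable on this system
    return values


def diff_counters(before, after):
    """Return only the counters that increased between two snapshots."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] > before[name]}


def watch(iface="eth0", interval=60, log_path="/var/log/nic-counters.log"):
    """Every `interval` seconds, append any counter increases to a log file."""
    stats_dir = os.path.join("/sys/class/net", iface, "statistics")
    previous = read_counters(stats_dir)
    while True:
        time.sleep(interval)
        current = read_counters(stats_dir)
        changed = diff_counters(previous, current)
        if changed:
            with open(log_path, "a") as log:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                log.write("%s %s\n" % (stamp, changed))
                log.flush()
                os.fsync(log.fileno())  # force to disk in case the host freezes
        previous = current

# To start logging (runs until interrupted): watch("eth0")
```

If the log shows error counters climbing in the minutes before the network goes down, that points toward the NIC or the cable. A flat log proves nothing, for the reason given above.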

Next time this happens, it would be interesting to see if the problem goes away simply by disconnecting the Ubuntu server's Ethernet link to the network, and if it does, does the problem come back when you plug the Ethernet cable back in?

Dropping the link might be enough to reset the NIC's chips and resolve the problem. But if the problem comes back as soon as you plug the cable back in, you could try connecting the Ubuntu server's Ethernet cable directly to the Ethernet port of a sniffer machine (hopefully you have a machine with auto-MDI-X or a crossover cable handy) and capturing frames. If you are able to capture any, they might give you a clue as to whether the bug is in the NIC, the driver, the networking stack, or some network-using application.
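If the sniffer does capture something, a quick first pass is to count frames per source MAC address and EtherType to see who is flooding the wire and with what. The sketch below is one possible way to do that in Python 3, under some assumptions: the capture was saved in the classic pcap format with microsecond timestamps (what `tcpdump -w` typically writes; Wireshark's default pcapng is not handled), the link type is Ethernet, and the function name `summarize_pcap` is purely illustrative.

```python
import struct
from collections import Counter


def summarize_pcap(path):
    """Count frames per (source MAC, EtherType) in a classic pcap file."""
    counts = Counter()
    with open(path, "rb") as f:
        header = f.read(24)  # pcap global header
        if len(header) < 24:
            raise ValueError("file too short to be a pcap capture")
        magic = struct.unpack("<I", header[:4])[0]
        if magic == 0xA1B2C3D4:
            endian = "<"   # written on a little-endian machine
        elif magic == 0xD4C3B2A1:
            endian = ">"   # written on a big-endian machine
        else:
            raise ValueError("unsupported format (only classic "
                             "microsecond pcap is handled)")
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        if linktype != 1:
            raise ValueError("only Ethernet captures (link type 1) are handled")
        while True:
            record = f.read(16)  # per-packet header
            if len(record) < 16:
                break
            _sec, _usec, incl_len, _orig_len = struct.unpack(
                endian + "IIII", record)
            frame = f.read(incl_len)
            if len(frame) < 14:
                continue  # too short to hold an Ethernet header
            src = ":".join("%02x" % b for b in frame[6:12])
            ethertype = struct.unpack("!H", frame[12:14])[0]
            counts[(src, ethertype)] += 1
    return counts
```

For example, `summarize_pcap("capture.pcap").most_common(10)` lists the ten busiest source and EtherType pairs, where `capture.pcap` is a placeholder file name. A single MAC address dominating the list with an unexpected EtherType would be worth a closer look in Wireshark.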

You could also Google for other people with the same kind of NIC (or at least NIC chipset) as you, to see if others are having the same problem. Of course it's always good to make sure you have the latest driver for your card.

Does your headless Ubuntu server have a graphics card in it at all, or can you put one in temporarily? Then the next time it happens, you could plug in a display, keyboard and mouse and see what you can learn in situ on the host. Is the host kernel panicked or completely frozen, or is it just that its network I/O is hosed? If the host is basically usable (except the network), then you could run tcpdump or Wireshark on it and see what it thinks it's doing over the network.

Note that you don't even need a graphical console to check what's happening on the Ubuntu server box. For example, if your machine has a serial port (or you can hook up a USB-serial adaptor) that you can configure as a serial console terminal port, you could hook another machine to that port and poke around from the shell. Or if you have another NIC you could put in that box, you could connect it to a separate, isolated network and use that to SSH or VNC into the box (going on the theory that it's just the one NIC that's lost its mind, not the whole Linux network stack).
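If no console can be attached at the moment of failure, the frozen-versus-network-only question can also be approached after the fact with a heartbeat written to local disk. This is a rough Python 3 sketch under stated assumptions: the interface is `eth0`, the kernel reports link state in `/sys/class/net/<iface>/operstate`, and the log path, interval, and function names are all hypothetical choices rather than any standard tool.

```python
import os
import time


def link_state(iface, sys_root="/sys/class/net"):
    """Return the kernel's view of the link (for example 'up' or 'down')."""
    try:
        with open(os.path.join(sys_root, iface, "operstate")) as f:
            return f.read().strip()
    except OSError:
        return "unknown"


def write_heartbeat(path, iface, now=None, sys_root="/sys/class/net"):
    """Append one timestamped line to `path` and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    line = "%s iface=%s operstate=%s\n" % (
        stamp, iface, link_state(iface, sys_root))
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a hard power-off
    return line


def run(path="/var/log/heartbeat.log", iface="eth0", interval=30):
    """Write a heartbeat line every `interval` seconds until interrupted."""
    while True:
        write_heartbeat(path, iface)
        time.sleep(interval)

# To start the heartbeat (runs until interrupted): run()
```

After the next outage and reboot, compare the last heartbeat line with the time the network went down. If the heartbeat stops when the network dies, the whole host probably froze; if it keeps going until the power is pulled, the host was alive and only its networking was hosed.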

I'd suggest installing a higher-quality NIC in your server, or replacing the likely consumer-grade switch you're using at home with something enterprise-grade that is designed to partition off ports that are hosing the network.

Update: Added some additional suggestions for diagnosing/troubleshooting. But overall, if it's one of the NIC hardware failure modes I'm thinking of, I doubt anyone would have much hope of debugging this other than the engineers who designed that NIC chipset.

Spiff

Posted 2010-09-02T19:25:47.897

Reputation: 84 656

Thanks for your quick reply! Is there no way to diagnose what triggers this? – BloodPhilia – 2010-09-02T20:07:02.037

I just added some more ideas for diagnosing it, but if it's the kind of NIC hardware failure I suspect, the only people who can really diagnose it are the NIC chipset engineers who created it. – Spiff – 2010-09-02T20:31:53.570

Thanks for all the input. I'll check out all the options, but from your answer I conclude that my NIC might be faulty. Or is that the wrong conclusion? +1 for now for the effort, and I'll accept when I get it resolved! Thanks! – BloodPhilia – 2010-09-02T20:36:45.493

I wouldn't call it a "conclusion" yet. It's just the best working hypothesis I can offer you based on your description and my experience. I should also mention that I've seen perfectly good NICs lose their minds when the host system panics or freezes, so your ultimate root cause may be something else that's causing a panic or freeze, and the NIC fouling the network could just be an unfortunate second-order effect. – Spiff – 2010-09-02T21:11:49.873