We recently had a networking problem where multiple servers would intermittently lose network connectivity in a fairly painful-to-resolve way (a hard reboot was required). This had been going on for about two weeks, seemingly at random, on different servers, with no pattern that we could discern.
After some digging, we saw that the switch was reporting 100 Mbps for the problem port.
This sounds remarkably like what happened in Joel Spolsky's article Five Whys:
Michael spent some time doing a post-mortem, and discovered that the problem was a simple configuration problem on the switch. There are several possible speeds that a switch can use to communicate (10, 100, or 1000 megabits/second). You can either set the speed manually, or you can let the switch automatically negotiate the highest speed that both sides can work with. The switch that failed had been set to autonegotiate. This usually works, but not always, and on the morning of January 10th, it didn’t.
We have now disabled auto-negotiate on our network hardware and set it to a fixed rate of 1000 Mbps (gigabit).
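For anyone less familiar with the mechanism, the negotiation described in the quote boils down to each end advertising the speeds it supports and both settling on the fastest one they share. Here is a minimal Python sketch of that idea. The function name and the example speed sets are made up for illustration, and real autonegotiation also covers duplex and other capabilities that this ignores:

```python
# Simplified model of link-speed autonegotiation: each end advertises the
# speeds it supports, and both settle on the fastest speed they have in common.
# All names here are illustrative; this is not taken from any switch firmware.

def negotiate_speed(local_mbps, peer_mbps):
    """Return the highest speed (in Mbps) advertised by both ends, or None."""
    common = set(local_mbps) & set(peer_mbps)
    return max(common) if common else None


switch_port = {10, 100, 1000}
healthy_nic = {10, 100, 1000}
degraded_nic = {10, 100}  # hypothetical: an end that only advertises up to 100

print(negotiate_speed(switch_port, healthy_nic))   # 1000
print(negotiate_speed(switch_port, degraded_nic))  # 100
```

In this model, a port showing 100 Mbps on a gigabit network simply means that 1000 was missing from what one end offered, or from what the other end understood.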
My questions to those with more server hardware networking expertise:
- How common are auto-negotiate problems with modern networking hardware?
- Is it considered good, standard networking practice to disable auto-negotiate and set fixed speeds when setting up networking?