I rent a dedicated server (with an Intel Haswell CPU and custom hardware) from a low-cost hosting service and run CentOS 6.4 / 64-bit Linux on it (with the stock kernel: 2.6.32-358.14.1.el6.x86_64).
Every few weeks it hangs, and other customers seem to have similar problems.
In the dmesg output I see (here is the full dmesg output):
CPU0: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz stepping 03
....
NMI watchdog enabled, takes one hw-pmu counter.
....
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.07rh
iTCO_wdt: Found a Lynx Point TCO device (Version=2, TCOBASE=0x1860)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
and in the process list I see:
# ps uawwwx|grep [w]atchdog
root 6 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/0]
root 10 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/1]
root 14 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/2]
root 18 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/3]
root 22 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/4]
root 26 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/5]
root 30 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/6]
root 34 0.0 0.0 0 0 ? S Aug22 0:00 [watchdog/7]
Does this mean that a hardware watchdog is already active on my server and will reboot the machine within 30 seconds of it freezing?
(I have set kernel.panic=10 in /etc/sysctl.conf so that it no longer gets stuck in the kdb console.)
Or do I have to install and start the CentOS package watchdog?
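Regarding the kernel.panic setting mentioned above: in case it helps anyone reproduce this, here is a minimal sketch of how the configured value could be read back from a sysctl.conf-style file. The helper name panic_timeout_from_conf is made up for illustration and is not a standard tool. It assumes one plain numeric kernel.panic assignment per line, and it is not necessarily how I checked it myself.

```shell
# Sketch only: print the kernel.panic value configured in a sysctl.conf-style file.
# panic_timeout_from_conf is a hypothetical helper name, not part of any package.
panic_timeout_from_conf() {
    conf="${1:-/etc/sysctl.conf}"
    # Print the number from every uncommented "kernel.panic = N" line,
    # then keep only the last one (later assignments override earlier ones).
    sed -n 's/^[[:space:]]*kernel\.panic[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$conf" \
        | tail -n 1
}
```

Calling panic_timeout_from_conf /etc/sysctl.conf should print 10 for the setting above. Comparing that with the output of sysctl kernel.panic (or cat /proc/sys/kernel/panic) shows whether the running kernel has actually picked the value up. Running sysctl -p as root reloads the file without a reboot.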