After a cold boot of a Debian 6.0.8 server (HP ProLiant), ntpd played havoc with the system time: offset and jitter relative to the usual, reliable reference time servers grew without limit. (Note that an identical twin server had no problem at all.) After many unsuccessful attempts to fix the problem on the ntpd side, I decided to try a reboot, and everything went back to normal.
While investigating the problem, I found this discrepancy, which could explain my clock problems:
root@n1:~# zgrep Detected /var/log/dmesg*
/var/log/dmesg:[ 0.004000] Detected 2400.110 MHz processor.
/var/log/dmesg.0:[ 0.004000] Detected 2383.579 MHz processor.
/var/log/dmesg.1.gz:[ 0.004000] Detected 2400.036 MHz processor.
/var/log/dmesg.2.gz:[ 0.004000] Detected 2400.298 MHz processor.
/var/log/dmesg.3.gz:[ 0.004000] Detected 2400.165 MHz processor.
/var/log/dmesg.4.gz:[ 0.004000] Detected 2400.410 MHz processor.
Note that in the second-to-last boot (the problematic one, logged in /var/log/dmesg.0) the detected CPU frequency is a clear outlier. Without the outlier, the error of the detected frequency with respect to the nominal one is +0.15 MHz ± 0.25 MHz (mean ± standard deviation). For the problematic boot the error is -16.4 MHz, which is about 100 times greater than expected.
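To put the outlier in the units ntpd works with, the errors can be converted to parts per million. The sketch below assumes a nominal frequency of exactly 2400 MHz (the value implied by the -16.4 MHz figure) and assumes that a calibration error shows up one-to-one as a clock rate error, which seems plausible with the TSC as clocksource but is not verified here. The 500 ppm constant is the maximum frequency error ntpd is documented to correct; the function and variable names are only illustrative.

```python
# Detected frequencies (MHz) copied from the zgrep output, newest boot first.
DETECTED_MHZ = {
    "dmesg": 2400.110,
    "dmesg.0": 2383.579,  # the problematic boot
    "dmesg.1.gz": 2400.036,
    "dmesg.2.gz": 2400.298,
    "dmesg.3.gz": 2400.165,
    "dmesg.4.gz": 2400.410,
}

NOMINAL_MHZ = 2400.0   # assumption: nominal CPU frequency
NTPD_MAX_PPM = 500.0   # largest frequency error ntpd can discipline


def error_ppm(detected_mhz, nominal_mhz=NOMINAL_MHZ):
    """Relative calibration error in parts per million."""
    return (detected_mhz - nominal_mhz) / nominal_mhz * 1e6


def drift_seconds_per_day(ppm):
    """Magnitude of clock drift per day for a given rate error."""
    return abs(ppm) * 1e-6 * 86400


if __name__ == "__main__":
    for name, mhz in DETECTED_MHZ.items():
        ppm = error_ppm(mhz)
        verdict = "beyond ntpd limit" if abs(ppm) > NTPD_MAX_PPM else "within ntpd limit"
        print(f"{name:12s} {mhz:9.3f} MHz {ppm:+9.1f} ppm "
              f"{drift_seconds_per_day(ppm):7.1f} s/day  {verdict}")
```

Under these assumptions the problematic boot is off by several thousand ppm, roughly ten minutes per day, while every other boot stays well under 500 ppm.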
My questions:
- Can an error of this type make the ntp time discipline unstable/unusable? Is this the reason for my clock problems?
- Is this type of behavior a symptom of flaky hardware? Should the server go into hardware maintenance?
Update
Some useful data:
- kernel is 2.6.32-5-amd64 (Debian 2.6.32-48squeeze4)
- current_clocksource is tsc
- error for lpj is (of course) consistent with error on CPU freq
Some context lines for the above grep:
[ 0.000000] hpet clockevent registered
[ 0.000000] Fast TSC calibration using PIT
[ 0.004000] Detected 2400.110 MHz processor.
[ 0.000008] Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.22 BogoMIPS (lpj=9600440)
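The lpj and BogoMIPS values in the last log line follow directly from the detected frequency, which is why an error in one shows up in the other. A minimal sketch of that arithmetic, assuming a kernel tick rate of HZ = 250 (an assumption, though it is consistent with the 0.004000 timestamp, i.e. one jiffy); the helper names are made up for illustration:

```python
HZ = 250  # assumption: kernel CONFIG_HZ


def lpj_from_mhz(detected_mhz, hz=HZ):
    """Loops per jiffy derived from the timer frequency (delay loop skipped)."""
    khz = round(detected_mhz * 1000)  # the kernel tracks the frequency in kHz
    return khz * 1000 // hz


def bogomips(lpj, hz=HZ):
    """BogoMIPS as derived from loops per jiffy."""
    return lpj * hz * 2 / 1e6


if __name__ == "__main__":
    for mhz in (2400.110, 2383.579):  # good boot, problematic boot
        lpj = lpj_from_mhz(mhz)
        print(f"{mhz:.3f} MHz -> lpj={lpj}, {bogomips(lpj):.2f} BogoMIPS")
```

For 2400.110 MHz this reproduces lpj=9600440 and 4800.22 BogoMIPS from the log, so lpj carries no information independent of the frequency calibration.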