I just noticed by pure chance that one of my Cisco 4500 switches has its clock going wrong: it is more than 2 minutes behind in spite of seemingly functional ntp. In my opinion, even a single second should not be considered acceptable for the systems involved. Also, I wouldn't have noticed the difference from diagnostics, had I not compared it to a simple wall-clock.
Some details
Here's ntp information for some of my hosts (10.0.99.1, 10.0.99.2, 10.0.1.119, 10.0.99.241) that are partly referencing one another for fallback, but mainly should all ultimately by syncing with 10.0.0.1, which again pulls the time from outside. So the time discrepancy cannot result from different original time sources. As the observations made me somewhat paranoid, "has correct time" in the following means: show clock
(or date
) produced an output that matches my wall-clock and my local system clock (which is fine according to http://time.is) with an error certainly below 1 seconds (accuracy of me hitting ENTER while watching my local clock)
10.0.1.119 (Ubuntu) has correct time
$ ntpq -np
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.0.99.1 10.0.0.1 3 u 855 1024 377 0.904 -2.658 0.113
*10.0.0.1 130.149.17.8 2 u 266 1024 377 0.253 0.909 0.127
10.0.99.241 (Cisco 2960) has correct time
#sho ntp associations
address ref clock st when poll reach delay offset disp
*~10.0.99.1 10.0.0.1 3 28 64 377 1.462 85.288 19.758
+~10.0.99.2 10.0.1.119 4 29 64 377 1.297 83.515 5.369
* sys.peer, # selected, + candidate, - outlyer, x falseticker, ~ configured
10.0.99.2 (Cico 4500) has correct time
#sho ntp associations
address ref clock st when poll reach delay offset disp
+~10.0.99.1 10.0.0.1 3 6 1024 111 1.148 -1.618 42.875
*~10.0.1.119 10.0.0.1 3 31 1024 377 0.043 1.687 1.064
* sys.peer, # selected, + candidate, - outlyer, x falseticker, ~ configured
10.0.99.1 (Cisco 4500) lags behind by about 2 minutes 6 seconds
#sho ntp associations
address ref clock st when poll reach delay offset disp
*~10.0.0.1 130.149.17.8 2 274 1024 377 15.625 3.681 30.403
+~10.0.99.2 10.0.1.119 4 415 1024 376 15.625 0.855 33.276
* sys.peer, # selected, + candidate, - outlyer, x falseticker, ~ configured
#sho ntp status
Clock is synchronized, stratum 3, reference is 10.0.0.1
nominal freq is 250.0000 Hz, actual freq is 249.9988 Hz, precision is 2**6
reference time is DAD8B428.54C6BAEA (20:36:24.331 MESZ Sat May 7 2016)
clock offset is 3.6818 msec, root delay is 32.80 msec
root dispersion is 71.74 msec, peer dispersion is 30.40 msec
loopfilter state is 'CTRL' (Normal Controlled Loop), drift is 0.000004720 s/s
system poll interval is 1024, last update was 683 sec ago.
Questions
- How come 10.0.99.1 is so far off?
- How come systems that sync to 10.0.99.1 are correct?
- How should I learn from the output of
sho ntp status
on 10.0.99.1 that the clock is actually totally out of sync (compared to all hosts and reference clocks mentioned insho ntp asso
)? For me the output looks totally like a very elaborate "I am totally happy".
EDIT: By popular demand, the output of sho clock detail
10.0.99.1
#sho clock detail
13:06:38.605 MESZ Tue May 10 2016
Time source is NTP
Summer time starts 02:00:00 MEZ Sun Mar 27 2016
Summer time ends 03:00:00 MESZ Sun Oct 30 2016
10.0.99.2
#sho clock detail
13:10:54.083 MESZ Tue May 10 2016
Time source is NTP
Summer time starts 02:00:00 MEZ Sun Mar 27 2016
Summer time ends 03:00:00 MESZ Sun Oct 30 2016