2

I have a Linux box configured to sync against two NTP servers. The box was badly out of sync (61 seconds) before I forced a sync. The outputs below were taken one hour after that sync. When checking with ntpq:

ntpq> peers                                                                           
          remote           refid      st t when poll reach   delay   offset  jitter     
==============================================================================     
x192.168.[redacted]   .MDM.            1 u  113  256  377    0.513   13.120   1.843     
x192.168.[redacted]   .MDM.            1 u  115  128  377    2.689    0.618   1.230     

Both are marked as falsetickers!

ntpq> assoc                                                 

ind assID status  conf reach auth condition  last_event cnt 
=========================================================== 
  1 13191  91d4   yes   yes  none falsetick   reachable 13  
  2 13192  91d4   yes   yes  none falsetick   reachable 13  

What led the clock selection algorithm to mark both as falsetickers, and how can I fix it?


UPDATE!

I have rerun the commands above and got a new status:

ntpq> assoc                                                                     

ind assID status  conf reach auth condition  last_event cnt                     
===========================================================                     
  1 13191  91d4   yes   yes  none falsetick   reachable 13                      
  2 13192  96d4   yes   yes  none  sys.peer   reachable 13                      
ntpq> pe                                                                        
     remote           refid      st t when poll reach   delay   offset  jitter  
==============================================================================  
x192.168.[red]   .MDM.            1 u  241  256  377    0.513   13.120   1.396  
*192.168.[red]   .MDM.            1 u  114  256  377    2.671    0.567   0.710  
MadHatter
kurast
  • Are your two stratum-1 servers *really* stratum-1 servers? The fact that the client's offsets from them differ by 13ms with only a 2ms propagation delay makes me think not. – MadHatter Apr 25 '14 at 13:04
  • They are not stratum-1. They are at best a 2, or maybe a 3. My ntp.conf file does not set stratum for them. – kurast Apr 25 '14 at 13:13
  • Not according to the `ntpq` output above. Something is clearly up with your servers, which is probably not helping `ntpd` take them seriously. (And by the way, setting the stratum is not generally `ntp.conf`'s job, save for stratum-1 servers who will be told about their directly-attached stratum-0 source.) – MadHatter Apr 25 '14 at 13:14
  • I have provided more timely results above. – kurast Apr 25 '14 at 13:16
  • These upstream servers are **not** stratum-2 or 3. What makes you think they are? Do you know anything about them at all? – MadHatter Apr 25 '14 at 13:19
  • From what I know, they should be linked to a national stratum-1 server, but I am not responsible for them, nor able to check if that is true. Who decides the stratum of the server? The server itself, or some calculation made on my side? – kurast Apr 25 '14 at 13:21
  • How far away you are from a Stratum-0 server determines your Stratum level. A server connecting to the Stratum-0 server is a Stratum-1 server. A server connected to a Stratum-1 server would be a Stratum-2 server. – Rex Apr 26 '14 at 03:01
  • Not quite: there are no stratum-0 servers, only stratum-0 sources, which are absolute reference clocks that do not themselves speak NTP but are directly connected to a server who *does*. That latter is a stratum-1 server. Apart from that, your point is precise and correct. – MadHatter Apr 26 '14 at 05:20

2 Answers

5

Your two upstream servers both claim to be stratum-1 servers - that is, the highest class of time source that is able to speak NTP, one to which an absolute time source (such as an atomic clock or a GPS receiver) is directly attached - but their clocks disagree with each other. Your offset from each server is how far your clock is from its clock when you receive its signal; the delay is how long the time signal takes to reach you and come back. Your two offsets differ by about 12.5 ms, which is much more than the observed delays (about 0.5 ms and 2.7 ms) can explain.

Faced with two servers who both claim to be authoritative but are telling different times, ntpd is quite reasonably saying that it can't decide between them and it will regard them both as charlatans.
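To make the disagreement concrete, here is a rough sketch in Python using the numbers from the first `peers` output in the question. It assumes each server's reading can be treated as an interval of offset plus or minus half the delay. That is a simplification: the real ntpd works with root distance, which also includes dispersion, so treat this as an illustration rather than what ntpd actually computes. The function names are made up for the example.

```python
# Simplified illustration only: ntpd really uses root distance
# (root delay / 2 + root dispersion), not the bare peer delay.

def correctness_interval(offset_ms, delay_ms):
    """Return (low, high) in ms: offset plus or minus half the delay."""
    half = delay_ms / 2.0
    return (offset_ms - half, offset_ms + half)


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Offsets and delays copied from the question's first ntpq output.
first_server = correctness_interval(13.120, 0.513)
second_server = correctness_interval(0.618, 2.689)

print("first server :", first_server)
print("second server:", second_server)
print("overlap      :", intervals_overlap(first_server, second_server))
```

Under these assumptions the two intervals are more than 10 ms apart, so no single time is consistent with both servers.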

It now looks like, left to itself, ntpd has decided after an hour that it prefers one to the other, and agreed to sync to it. Good for it.

The basic problem here is that the upstreams are between them saying something which cannot possibly be true. If you only want a rough time, list only one of them in your ntp.conf, and you'll sync to that much more quickly. If you want an exact time, contact the admins of the servers, and ask them why their clocks differ, and where each of them is getting its time source.

Edit: if I were to guess, I'd say that both of them are wrong - my guess is they've both been configured to treat their internal clocks, or some similarly insufficiently-accurate time source, as stratum-0. They may also have been configured to take time from internet servers, but since they've been told that they have an absolutely-accurate clock attached, they're preferring that time, and advertising as stratum-1 in consequence.

MadHatter
  • If you were to guess, would you say one of them is wrong, or that one is higher stratum than the other? – kurast Apr 25 '14 at 13:42
  • @MadHatter there is no evidence or rationale to justify the claim that both are wrong; only one of them is clearly wrong. This is a great example of the maxim: "A man with one watch knows what time it is. A man with two watches is never sure." – dfc Apr 25 '14 at 15:11
  • Kurast asked me for a guess. It's an informed guess: the two upstreams are identical, down to the upstream refid, which means physical class of time source, and at least one is clearly wrong. If one of two identical things is wrong, it's quite likely that both of them are, as being right is costly, and it's clear that these things haven't paid the price. But it's still just a guess. He won't know for sure without talking to the admins. – MadHatter Apr 25 '14 at 15:19
  • My S1s report a refid of DFC, regardless of GPS or PPS: `fudge refid DFC` – dfc Apr 26 '14 at 01:37
3

A man with one watch knows what time it is. A man with two watches is never sure.

You need to add another server so that ntpd can break the tie between the two clocks. Of all the possible numbers of server associations, two is the worst. It does not matter whether the third server is stratum 2 or stratum 3; you just need to give ntpd a chance to discern which one is the falseticker.
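As a rough sketch of why a third server helps, the Python below does a simplified majority vote over intervals. It is not ntpd's actual selection code: it treats each server as offset plus or minus half the delay (ntpd uses root distance, which also includes dispersion), and the third server's numbers are invented purely for illustration. The names `survivors`, `s1`, `s2` and `s3` are hypothetical.

```python
# Simplified majority-intersection sketch; NOT ntpd's real algorithm.

def survivors(intervals):
    """Return the servers sharing the point of maximum overlap,
    or an empty list if they are not a strict majority."""
    events = []
    for name, (low, high) in intervals.items():
        events.append((low, 0, name))   # 0 = interval opens
        events.append((high, 1, name))  # 1 = interval closes
    events.sort()

    active, best = set(), set()
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            if len(active) > len(best):
                best = set(active)
        else:
            active.discard(name)

    if len(best) * 2 <= len(intervals):
        return []  # no strict majority agrees, so nobody is trusted
    return sorted(best)


# s1 and s2 use offset +/- delay/2 from the question's first ntpq output.
two = {"s1": (12.8635, 13.3765), "s2": (-0.7265, 1.9625)}
# s3 is an invented third server (offset 0.9 ms, delay 4.0 ms).
three = dict(two, s3=(-1.1, 2.9))

print(survivors(two))    # no majority possible with two disagreeing servers
print(survivors(three))  # s2 and s3 agree, so s1 is the odd one out
```

With two disagreeing servers no group can reach a majority; with the invented third server, two of the three agree and the outlier can be identified.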

PS

You do not need to redact your RFC 1918 addresses. In fact, redacting them like this makes the question harder to answer. It would be better to redact the other octets instead: xxx.xxx.1.1 and xxx.xxx.1.2. At least that way it is easy to refer to one server or the other. But most importantly, there is really no need to redact RFC 1918 addresses.

dfc