
I have tried several guides on how to set up a local ntp server on ubuntu but none seem to work correctly. My servers are drifting heavily in time for some reason and I have to keep their time close together because I run databases that require this.

  • I have 8 ubuntu 14.04 LTS servers, none of them has internet access
  • I want to run a ntp server on one (or more if that is better) of the servers and have all other servers connect to the ntp server(s) to set the time

Currently, my server (ip .24) runs this /etc/ntp.conf:

server 127.127.1.0 prefer
fudge  127.127.1.0 stratum 10
driftfile /var/lib/ntp/drift
broadcastdelay 0.008

# Give localhost full access rights
restrict 127.0.0.1

# Give machines on our network access to query us
restrict 192.168.178.0 mask 255.255.255.0 nomodify notrap

broadcast 192.168.178.0

And on the "clients":

# Point to our network's master time server
server 192.168.178.24 iburst
fudge 192.168.178.24  stratum 10

restrict default ignore
restrict ::1
restrict 127.0.0.1
restrict 192.168.178.24 mask 255.255.255.255 nomodify notrap noquery

driftfile /var/lib/ntp/drift

minpoll 4
maxpoll 5

Note: I have used Multi-Tabbed Putty to send the following commands to all ntp clients at the same time. I have stopped the ntp services for all except the server, used sudo ntpdate 192.168.178.24 to let them fetch the date and restarted the ntp services afterwards. This succeeded. All servers showed the same date straight after the command finished. After about 10 minutes however, my servers show the following time:

Fr 30. Sep 11:16:53 CEST 2016
Fr 30. Sep 11:15:33 CEST 2016 (server .24) 
Fr 30. Sep 11:16:50 CEST 2016
Fr 30. Sep 11:15:33 CEST 2016
Fr 30. Sep 11:17:05 CEST 2016
Fr 30. Sep 11:15:33 CEST 2016
Fr 30. Sep 11:15:33 CEST 2016
Fr 30. Sep 11:15:33 CEST 2016

How can I get them to sync properly to the ntp server? And how can I lower the polling interval? It looks like my servers run out of sync fast, so I need them to retrieve the "correct" time again more often...

By "correct" time I mean a time that is the same on all servers. It does not necessarily need to be the exact correct world time (if you want to call it that).


Edit: I have tried the suggested configuration settings. As far as I understand, this is what my server/client configs should look like. In the meantime, I have noticed that my .24 server is actually drifting to a worse time. The .20 server is the most accurate one, so I am now using the .20 server to host the ntp server. Sorry for the confusion.

Server config:

# Use the local clock
server 127.127.1.0 prefer
fudge  127.127.1.0
driftfile /var/lib/ntp/drift
broadcastdelay 0.008

# Give localhost full access rights
restrict default

# Give machines on our network access to query us
restrict 192.168.178.0 mask 255.255.255.0 nomodify notrap

broadcast 192.168.178.0

For the clients:

# Point to our network's master time server
server 192.168.178.20 iburst

restrict default

driftfile /var/lib/ntp/drift

minpoll 4
maxpoll 5

ntpq -c as and ntpq -c pe on the server:

ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 41906  963a   yes   yes  none  sys.peer    sys_peer  3
  2 41907  8811   yes  none  none    reject    mobilize  1

ntpq -c pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l   60   64  377    0.000    0.000   0.000
 192.168.178.0   .BCST.          16 u    -   64    0    0.000    0.000   0.000

Five clients show output similar to this (these are the servers that drift in time):

ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 62104  9024   yes   yes  none    reject   reachable  2


ntpq -c pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 hadoop20.xx LOCAL(0)         6 u   27   64  377    0.151  63591.8 33407.0

For two (most likely?) working clients:

ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1  7757  963a   yes   yes  none  sys.peer    sys_peer  3

ntpq -c pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*hadoop20.xx LOCAL(0)         6 u   18   64  377    0.183    7.883   3.015

edit 2:

I have used sudo service ntp stop, sudo ntpdate 192.168.178.20, wait for ntpdate to finish, sudo service ntp start on all clients. There are still only 2 succeeding clients and 5 rejecting clients.
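The sequence described above can be sketched as a small shell function. This is only an illustration: `run` and `resync_client` are hypothetical helper names, and the `DRY_RUN` switch is an added convenience so the commands can be printed before being executed for real.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# resync_client SERVER: step a client's clock once from SERVER.
# ntpd is stopped first because ntpdate cannot use the NTP port
# while ntpd is holding it.
resync_client() {
    run sudo service ntp stop &&
    run sudo ntpdate "$1" &&
    run sudo service ntp start
}
```

For example, `DRY_RUN=1 resync_client 192.168.178.20` only prints the three commands, while `resync_client 192.168.178.20` runs them.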

The rejecting clients show this output. The delay + offset values look high because the failing clients drift in time. Maybe they are not trusting the server to update the time because the delay/offset is so high?

ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 20981  905a   yes   yes  none    reject    sys_peer  5

ntpq -c pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 hadoop20.xx LOCAL(0)         6 u   34   64    3    0.166  18665.9 16201.3

I have also tried the answer at https://askubuntu.com/a/256004. It works for about 30 seconds, then the state changes to "reject" again! The same happens with ntpdate -s 192.168.178.20. It is most likely related to the ntp clients rejecting the time of the server. Is there a way to FORCE them to change the time?

j9dy

3 Answers


Don't do this. Seriously. Just don't. People keep coming up with the idea that NTP is designed to allow a bunch of machines all to have the same time. It isn't. It's designed, quite carefully, to allow many machines to all have the closest thing they can to the correct time, which is not the same thing.

If you have access to a window, you can build a half-decent stratum-1 server for about £50, or a good one for £100. You would do much better to build something like that, then point the other clients at it. Correct timestamps are much better than merely self-consistent ones, not least for forensics.

But if you absolutely must do what you're doing, then you need to realise that you're perverting ntpd, and this will mean understanding what you're doing.

On the server

server 127.127.1.0 prefer
fudge  127.127.1.0 stratum 10

means "use the local undisciplined clock as if it were authoritative", which is what you want. I'm not sure why you're forcing it to stratum 10, though; consider dropping the stratum 10, and let the driver supply its default stratum of 5. On the clients

server 192.168.178.24 iburst
fudge 192.168.178.24  stratum 10

makes no sense at all. fudge only applies to the pseudo-addresses 127.127.x.y, which select the various kinds of reference clock drivers (the local clock among them). It makes no sense to give it any other address. Drop the fudge line from the clients, and just point them at the server. You're also using a closed network, so drop all the security stuff until you get this working:

restrict default
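Put together, a client ntp.conf along these lines might look like the sketch below. It is a sketch only, assuming the server is still at 192.168.178.24 as in the question. Note that minpoll and maxpoll are options of the server line rather than directives of their own, and take powers of two in seconds.

```
# /etc/ntp.conf on a client (sketch)
driftfile /var/lib/ntp/drift

# Poll the local master every 2^4 = 16 s to 2^5 = 32 s
server 192.168.178.24 iburst minpoll 4 maxpoll 5

# No access restrictions while debugging on the closed network
restrict default
```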

If that still doesn't seem to work, we'll need to see the output of ntpq -c as and ntpq -c pe on both the server, and on a badly-behaving client, after at least ten minutes of uninterrupted running.
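The reason for waiting is the reach column of ntpq -c pe: it is an eight-bit shift register printed in octal, one bit per poll, so it takes eight polls to fill. A minimal sketch for decoding it, where `reach_polls` is a hypothetical helper name:

```shell
#!/bin/sh
# reach_polls REACH: count how many of the last eight polls were answered,
# given the octal "reach" value printed by `ntpq -c pe`.
reach_polls() {
    bits=$((0$1))      # the leading 0 makes the shell read the value as octal
    count=0
    while [ "$bits" -gt 0 ]; do
        count=$((count + ($bits & 1)))
        bits=$(($bits >> 1))
    done
    echo "$count"
}
```

A reach of 377 decodes to eight answered polls out of eight, while a reach of 3 means only the last two polls have been answered so far. At a poll interval of 64 seconds, filling the register takes roughly eight and a half minutes.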

Edit: you write in a comment below that "I think the offset/jitter is really high because the failing clients drift in time".

I think you may be right. This chap's blog suggests he had the same experience: that the client clock was so bad that it fooled the local ntpd into thinking that the server was unreliable. He wrote

the reason for the huge jitter finally seems clear! Our clock drifts so fast that the offset will go up by several seconds through our few measurements

Given that it's the clients whose clocks drift fastest which are failing to sync (marking the server as "reject"), I think you're seeing the same effect. His solution was to use adjtimex to manually tune the kernel clock (adjusting the tick value) until the system clock was less wayward, at which point ntpd had a chance to recognise the server as being OK, and sync to it. You should probably give that a try on the worst client first, and see if it helps.
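To put a number on "wayward", it helps to express the drift in parts per million, since ntpd can only correct frequency errors of up to 500 ppm (about 43 seconds per day). The helper below is a rough sketch; `drift_ppm` is a hypothetical name, and the inputs are whatever offset and elapsed time you measure yourself.

```shell
#!/bin/sh
# drift_ppm SECONDS_OFF ELAPSED_SECONDS: clock error in parts per million.
drift_ppm() {
    # LC_ALL=C keeps the decimal point independent of the locale
    LC_ALL=C awk -v off="$1" -v dt="$2" \
        'BEGIN { printf "%.1f\n", off / dt * 1e6 }'
}
```

For instance, `drift_ppm 43.2 86400` prints 500.0, the limit ntpd can handle. If the 80 seconds of divergence in the question really accrued over about ten minutes, `drift_ppm 80 600` gives 133333.3, far outside that range, which would be consistent with ntpd giving up on the server.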

MadHatter
  • I've edited my original question. It looks like two clients were able to connect to the server now, but 5 could not. At least that's what I can tell from the output of `ntpq -c as` and `ntpq -c pe` – j9dy Sep 30 '16 at 11:14
  • Looks like it's not a firewall problem, as even the refusing clients can see that the server's at stratum 6. Does `ntpd` syslog anything useful on a refusing client, so we can get some idea of why they're `reject`ing the server? Also, did you do the `ntpdate` first? Plus, the `cnt=2` on the refusing output above is worrying; you did wait ten minutes as asked, yes? – MadHatter Sep 30 '16 at 11:19
  • I have just used the commands again - I think it was 10 minutes already the last time but now it is for sure. For the failing servers, the `cnt=2` remains, it has not changed. I have not restarted meanwhile. I will stop all client ntp services now, use `ntpdate 192.168.178.20` on the clients and then restart the ntp service on the clients. `cat /var/log/syslog | grep ntp` has not given any output for the last hour on the failing clients. Any idea? What about the `minpoll` and `maxpoll` in the client config? – j9dy Sep 30 '16 at 11:30
  • I have added more output to the original question. I think the offset/jitter is really high because the failing clients drift in time. Maybe they do not trust the time of the server? – j9dy Sep 30 '16 at 11:56
  • My feeling is that you really need to get `ntpd` on the client to tell you what's going on. You'll need to check your (r)syslog config, find out where ntpd is logging and why (on my system (CentOS6) it uses facility `daemon` and severities 5, 6, and 7). Also, see my edit above. – MadHatter Sep 30 '16 at 14:58
  • Following up on your edit: Installing package `adjtimex` solved the problem on its own! The installation printed stuff like: `Comparing clocks (this will take 70 sec)...done. Adjusting system time by -14.5741 sec/day to agree with CMOS clock...done.`. After it finished, `sudo service ntp stop`, `sudo ntpdate 192.168.178.20`, `sudo service ntp start` has solved it! – j9dy Oct 04 '16 at 07:41

I was able to get acceptable time diff following the below-listed steps:

Steps

  1. Install chrony on both devices

    sudo apt install chrony
    
  2. Assuming the server's IP address is 192.168.1.87, the client configuration (/etc/chrony/chrony.conf) is as follows:

    server 192.168.1.87 iburst

    keyfile /etc/chrony/chrony.keys

    driftfile /var/lib/chrony/chrony.drift

    log tracking measurements statistics

    logdir /var/log/chrony

  3. Server configuration (/etc/chrony/chrony.conf), assuming your client's IP is 192.168.1.14:

    keyfile /etc/chrony/chrony.keys

    driftfile /var/lib/chrony/chrony.drift

    log tracking measurements statistics

    logdir /var/log/chrony

    local stratum 8

    manual

    allow 192.0.0.0/24

    allow 192.168.1.14

  4. Restart chrony on both computers

    sudo systemctl stop chrony

    sudo systemctl start chrony

5.1 Checking on the client side:

sudo systemctl status chrony

Output:

            Jun 24 13:26:42 op-desktop systemd[1]: Starting chrony, an NTP client/server...

            Jun 24 13:26:42 op-desktop chronyd[9420]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYNCDNS +IPV6 -DEBUG)

            Jun 24 13:26:42 op-desktop chronyd[9420]: Frequency -6.446 +/- 1.678 ppm read from /var/lib/chrony/chrony.drift

            Jun 24 13:26:43 op-desktop systemd[1]: Started chrony, an NTP client/server.

            Jun 24 13:26:49 op-desktop chronyd[9420]: Selected source 192.168.1.87

5.2 chronyc tracking output:

        Reference ID    : C0A80157 (192.168.1.87)
        Stratum         : 9
        Ref time (UTC)  : Thu Jun 24 10:50:34 2021
        System time     : 0.000002018 seconds slow of NTP time
        Last offset     : -0.000000115 seconds
        RMS offset      : 0.017948076 seconds
        Frequency       : 5.491 ppm slow
        Residual freq   : +0.000 ppm
        Skew            : 0.726 ppm
        Root delay      : 0.002031475 seconds
        Root dispersion : 0.000664742 seconds
        Update interval : 65.2 seconds
        Leap status     : Normal
GPrathap

You may ditch NTP completely, set time manually on the "server" and issue this command:

ssh root@192.168.178.xxx "date -s \"$(date "+%F %T")\""

Loop it through all your "client" IPs and you are done!

Explanation: local time will be "copied" to remote machine via SSH.
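The loop could be sketched roughly as follows. The function name `push_time` and the client addresses in the usage note are placeholders, and the `SSH` override is an addition that allows a dry run with `SSH=echo`. Keep in mind that this sets the time once; nothing keeps the clocks together afterwards.

```shell
#!/bin/sh
# push_time HOST...: copy this machine's current time to each host over SSH.
# Set SSH=echo to print what would be sent instead of connecting.
push_time() {
    for host in "$@"; do
        # $(date ...) is expanded locally, so every client receives the
        # sender's clock as of the moment its command is built.
        ${SSH:-ssh} "root@$host" "date -s \"$(date '+%F %T')\""
    done
}
```

A call such as `push_time 192.168.178.21 192.168.178.22` would then set both clients in turn.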