
This is on RHEL 5.5.

First, ntpdate to the remote host works:

$ ntpdate XXX.YYY.4.21
24 Oct 16:01:17 ntpdate[5276]: adjust time server XXX.YYY.4.21 offset 0.027291 sec

Second, here are the server lines in my /etc/ntp.conf. All restrict lines have been commented out for troubleshooting.

server 127.127.1.0
server XXX.YYY.4.21

I execute service ntpd start and check with ntpq:

$ ntpq
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l   36   64  377    0.000    0.000   0.001
 timeserver.doma .LOCL.           1 u   39  128  377    0.489   51.261  58.975

ntpq> opeer
 remote           local          st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        127.0.0.1        5 l   40   64  377    0.000    0.000   0.001
 timeserver.doma XXX.YYY.22.169   1 u   43  128  377    0.489   51.261  58.975

XXX.YYY.22.169 is the address of the host I'm working on. A reverse lookup on the IP address in my ntp.conf file confirms that ntpq is naming the remote server correctly. However, as you can see, ntpd appears to just fall back to my .LOCL. time server. Also, ntptrace with no argument returns only the local time server, and ntptrace XXX.YYY.4.21 times out.

$ ntptrace
localhost.localdomain: stratum 6, offset 0.000000, synch distance 0.948181

$ ntptrace XXX.YYY.4.21
XXX.YYY.4.21: timed out, nothing received
***Request timed out

This looks like my ntp daemon is just querying itself.

I am thinking about the possibility that the router-I-don't-control between my test network timeserver and the corporate network timeserver is filtering on source port. (I think ntpdate sends from source port 123, which gets it past that filter and is also why I can't use it while ntpd is running.) I have email in to the network folks to check that.

Finally, telnet XXX.YYY.4.21 123 never times out or completes a connection.

The questions:

What am I missing here?

What else can I check to try to figure out where this connection is failing?

Would strace ntptrace XXX.YYY.4.21 show me the source port ntptrace is sending from? I can deconstruct most strace calls, but I can't figure out the location of that datum.

If I can't directly examine the gateway router between my test network and the timeserver, how might I build evidence that it's responsible for these disconnections? Alternatively, how might I rule it out?
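One way to approach the source-port and router questions is a small hand-rolled client whose source port you control. The sketch below is a minimal SNTP query in Python 3, assuming IPv4; `build_request`, `parse_reply`, `decode_refid`, and `probe` are hypothetical helpers, not part of any NTP package. It binds the socket explicitly so `getsockname()` reports the source port actually used. Comparing a run from an ephemeral port with a run from port 123 (which needs root and ntpd stopped, since ntpd holds that port) is one way to see whether replies depend on the source port.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch)


def build_request(version=3):
    """Return a 48-byte NTP client request: LI=0, VN=version, Mode=3 (client)."""
    return bytes([(version << 3) | 3]) + bytes(47)


def decode_refid(stratum, raw):
    """Stratum 0/1 refids are 4-character ASCII codes (e.g. LOCL); higher strata
    carry the upstream server's address (IPv4 assumed in this sketch)."""
    if stratum <= 1:
        return raw.rstrip(b"\x00").decode("ascii", "replace")
    return socket.inet_ntoa(raw)


def parse_reply(data):
    """Extract a few header fields from a server reply."""
    if len(data) < 48:
        raise ValueError("short NTP reply: %d bytes" % len(data))
    stratum = data[1]
    tx_secs, tx_frac = struct.unpack("!II", data[40:48])  # transmit timestamp
    return {
        "version": (data[0] >> 3) & 0x7,
        "mode": data[0] & 0x7,
        "stratum": stratum,
        "refid": decode_refid(stratum, data[12:16]),
        "transmit_unix": tx_secs - NTP_EPOCH_OFFSET + tx_frac / 2**32,
    }


def probe(server, source_port=0, timeout=5.0):
    """Send one client request to server:123 from source_port (0 lets the kernel
    pick an ephemeral port). Returns (local_port, reply_fields); raises
    socket.timeout if nothing comes back within `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("", source_port))
        sock.settimeout(timeout)
        local_port = sock.getsockname()[1]
        sock.sendto(build_request(), (server, 123))
        data, _ = sock.recvfrom(512)
        return local_port, parse_reply(data)
    finally:
        sock.close()
```

For example, `probe("XXX.YYY.4.21")` either returns the local port plus the server's stratum and refid, or raises `socket.timeout`. If the ephemeral-port run times out while a run with `source_port=123` (as root, ntpd stopped) succeeds, that is evidence of source-port filtering somewhere on the path; if both succeed, the router is less likely to be the culprit.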

Martin Schröder
dafydd
  • I think everything's fine. You just need to wait a while for the server to synch. Because your clock isn't stable yet, other people's clocks don't seem stable to you, because you're measuring them against yours. It takes a while (hours at least) for this to settle down. – David Schwartz Oct 25 '12 at 03:15
  • Hi, David. I haven't seen a change in 24 hours. I'll try the proposed answers tomorrow and report back. – dafydd Oct 26 '12 at 02:03

5 Answers


The 377 in the reach column means that connectivity is ok; telnet won't connect because NTP is UDP.
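For reference, ntpq prints the reach register in octal. It is an 8-bit shift register that shifts left on every poll and sets its low bit when a valid reply arrives, so 377 means the last eight polls all got answers. The hypothetical helper below is a minimal Python sketch that decodes the value into per-poll results, oldest first.

```python
from typing import List


def reach_history(reach: str) -> List[bool]:
    """Decode ntpq's octal `reach` column into the last eight polls,
    oldest first (True = a reply arrived for that poll)."""
    value = int(reach, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must be an octal value from 0 to 377")
    # Bit 7 is the oldest poll, bit 0 the most recent.
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]
```

So `reach_history("377")` is eight `True` values, while `reach_history("17")` shows replies only for the four most recent polls. Note that a full reach only proves packets are getting through; a reachable peer can still be rejected for other reasons.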

Try removing server 127.127.1.0 from your config. The * in front of LOCAL(0) tells us that the local clock at stratum 5 is being used for sync, in preference to the remote server at stratum 1; its delay and offset both being 0.000 likely has a lot to do with that.
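The first character of each `ntpq -p` row is the tally code. Per the ntpq documentation, `*` is the system peer and a blank means the peer was rejected (which covers unreachable peers, sync loops, and excessive synchronization distance). A minimal Python sketch of reading those codes follows; `TALLY` and `peer_status` are hypothetical names.

```python
# Tally codes as listed in the ntpq documentation.
TALLY = {
    " ": "reject",
    "x": "falsetick",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "backup",
    "*": "sys.peer",
    "o": "pps.peer",
}


def peer_status(ntpq_output):
    """Return [(remote, status)] for each data row of `ntpq -p` output."""
    rows = []
    for line in ntpq_output.splitlines():
        body = line[1:].split()  # everything after the one-character tally column
        if not body or line.startswith("=") or body[0] == "remote":
            continue  # skip blank lines, the ==== rule, and the header
        rows.append((body[0], TALLY.get(line[0], "unknown")))
    return rows
```

Fed the peers output from the question, it reports `LOCAL(0)` as `sys.peer` and `timeserver.doma` as `reject`, which matches the behaviour described in the comments.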

Shane Madden
  • I would believe the link was okay except for `ntptrace XXX.YYY.4.21` failing. I think I should be seeing something there. I'll try removing the local server tomorrow and report on what I get. Thanks! – dafydd Oct 26 '12 at 02:04
  • When I have the `127.127.1.0` server available, the syslog will show me `synchronized to LOCAL(0), stratum 5`. After removing that line and waiting 20 minutes, I still don't get any synchronization to the remote server. – dafydd Oct 26 '12 at 17:08
  • @dafydd Did you restart the `ntpd` service after changing the config? What's the output from `ntpq -pn` now? – Shane Madden Oct 26 '12 at 17:14
  • Yes, every time. And, I watch the syslog for a `synchronized` message before I start querying `ntpq`. – dafydd Oct 26 '12 at 18:21
  • Right, but what is it showing when queried when the sync message hasn't occurred? – Shane Madden Oct 26 '12 at 18:22
  • I'll get responses for both servers, but the `REFID` is still `.LOCL.` for both hosts, and both hosts have spaces for their `ntpq` tally codes. Usually, the LOCAL(0) host will tally as `sys.peer`, while the remote host will always tally as `reject`. Also, just to make sure, have you seen my comments back to BillThor? – dafydd Oct 26 '12 at 18:34
  • Both servers? If you only have the one remote server in your config, what other server is the query showing? And that refid is indeed interesting - what's that remote server syncing off of? – Shane Madden Oct 26 '12 at 18:36
  • Sorry, the `LOCAL(0)` server shows a `refid` of `.LOCL.`, which makes sense. The remote server still shows the same `refid` of `.LOCL.` and never synchronizes. Either my `ntpd` isn't hitting the remote server at all, which I suspect, or it's in a sync loop. The latter is possible, given the `restrict` behavior I've seen below. – dafydd Oct 26 '12 at 18:57
  • @dafydd Right - what's the remote server set to sync to? The `reach` incrementing indicates connectivity. A sync loop appears to be your problem. – Shane Madden Oct 26 '12 at 19:00
  • Unfortunately, I have no way to correct that. I'll ask the administrators of that time server what their settings are, and see if they can restrict from my subnet, at least. – dafydd Oct 26 '12 at 19:13
  • @dafydd I'm not sure what you mean - what would they need to restrict? They're either syncing from you or they aren't - as Bill mentioned, you can block them from syncing from your end with a `restrict noquery`. Why not test syncing to a real time server out on the internet? – Shane Madden Oct 26 '12 at 19:16
  • I'll reset `restrict noquery` with no other options, and let you know how it goes. – dafydd Oct 26 '12 at 21:04
  • @dafydd You should already have a restrict line for that system, if it's your server. Just add `noquery` to it. And keep in mind, you're cutting off their time sync to you if you do this; I'd make sure they have a working secondary time source. – Shane Madden Oct 26 '12 at 21:05

If you are going to include the local clock, fudge its stratum up a fair bit. It looks like you have it set to 5. I generally set it to at least 8 (fudge 127.127.1.0 stratum 8). If you don't fudge it, your host can appear to other hosts on your network like an atomic clock. On one network I scanned, I found a lot of low-stratum servers announcing times that were usually incorrect by hours or days.
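The advice to fudge the local clock to at least stratum 8 is easy to check against a config file. The sketch below is a hypothetical Python lint, assuming the local clock is declared as `server 127.127.1.N` and fudged on a `fudge 127.127.1.N ... stratum M` line; a local clock with no fudge line is flagged, since it then runs at the driver default (which appears as stratum 5 in the question's output).

```python
import re

LOCAL_CLOCK = re.compile(r"^\s*server\s+(127\.127\.1\.\d+)\b")
FUDGE = re.compile(r"^\s*fudge\s+(127\.127\.1\.\d+)\b.*\bstratum\s+(\d+)")


def underfudged_local_clocks(conf_text, min_stratum=8):
    """Return local-clock addresses whose fudged stratum is missing or below min_stratum."""
    servers, fudged = [], {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments, including commented-out lines
        m = LOCAL_CLOCK.match(line)
        if m:
            servers.append(m.group(1))
            continue
        m = FUDGE.match(line)
        if m:
            fudged[m.group(1)] = int(m.group(2))
    # A missing fudge line counts as "not fudged" (-1 is always below the threshold).
    return [s for s in servers if fudged.get(s, -1) < min_stratum]
```

Run against the question's two server lines, it flags `127.127.1.0`; adding `fudge 127.127.1.0 stratum 10` clears the warning.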

Shane is correct about the reach value, which indicates you have access to the server. The high offset and jitter values for your time server indicate it may not be very reliable, although they may also be high because your server is still synchronizing. The fact that the poll interval has increased to 128 indicates that your server is getting consistent results; it should gradually increase to 1024 seconds.

Try running a loop like:

while sleep 60; do
    ntpq -n -c peers
done

This will give you an idea how well ntp is working. You should see it stabilize over time.

There are a number of restrictions which can be set on ntpd to limit how much information about the server can be accessed remotely. It is possible you are restricted to only using the upstream server as a time source.

Firewall rules restricting traffic to port 123 for both source and destination are possible. This provides a working ntp setup but limits access by other tools. Some tools allow you to use port 123 as the source port if it is available. I am partial to using ntpdate in debug mode (ntpdate -d).

If you are correct about the refid of the upstream server being your IP address, it appears to be using your server as its preferred time source. Try adding restrict noquery to your configuration. It may be that your upstream server is poorly configured. Try adding your router and/or nameservers as sources; I find they can be better sources than the official corporate server.
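The sync-loop condition described here (an upstream whose refid is your own address) can be checked mechanically. The sketch below assumes numeric `ntpq -pn` output, so refids appear as IPv4 addresses rather than hostnames; `looped_peers` is a hypothetical helper, and the addresses in the usage note are documentation examples.

```python
def looped_peers(peers_output, local_addrs):
    """Return remotes from `ntpq -pn` output whose refid is one of this host's
    IPv4 addresses, i.e. peers that appear to be synchronizing from us."""
    local = set(local_addrs)
    looped = []
    for line in peers_output.splitlines():
        fields = line[1:].split()  # drop the one-character tally column
        if len(fields) >= 2 and fields[1] in local:
            looped.append(fields[0])
    return looped
```

For instance, if the host is 192.0.2.169 and a peer row reads ` 192.0.2.21  192.0.2.169  2 u ...`, then `looped_peers(output, {"192.0.2.169"})` returns `["192.0.2.21"]`. An empty result does not rule out a longer loop through a third server.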

BillThor
  • Make that loop `watch -d ntpq -n -c peers` – Martin Schröder Oct 25 '12 at 07:54
  • @BillThor If I drop the `server 127.127.1.0` line, ntpq shows the correct remote, a continuing refid of `.LOCL.` and no `synchronized` message in syslog. If I also enable `restrict noquery`, I still get no `synchronized` line in syslog and `ntpq -c peers` times out without returning anything. I think I'm seeing .LOCL. because the remote is unreachable. – dafydd Oct 26 '12 at 17:42
  • @BillThor After restoring the `server 127.127.1.0` and `restrict` lines, and adding `fudge 127.127.1.0 stratum 10`, I now get a `synchronized to LOCAL(0)` line in syslog and timeouts in ntpq for both `localhost.localdomain` and the remote server. Trying again with `stratum 8` repeats the timeout. Commenting the `fudge` repeats the timeout. Commenting the `restrict` lines restores original behavior. – dafydd Oct 26 '12 at 18:12
  • @dafydd It really looks like you are syncing with yourself. Try adding server lines for your nameservers and gateway router. – BillThor Oct 26 '12 at 23:15
  • @BillThor Thanks. That was the suggestion that got me to finding another way to that timeserver. – dafydd Oct 27 '12 at 00:19

I had the same problem:

ntpq -p was showing reach = 0

Yet:

1. ntpd was running
2. ntp.conf had servers listed
3. ntpdate worked using those servers
4. ntpdate -u worked using those servers
5. nc showed TCP port 123 was open on those servers
6. nc showed UDP port 123 was open on those servers

So basically ntpdate worked and there were no firewall issues, and yet ntpq -p showed reach = 0 for each server listed.

Turned out to be the restrict lines in ntp.conf. I just removed all the restrict lines from ntp.conf and restarted ntpd and everything worked from there.

Brad Allison

For those of you looking for a grand solution, I apologize. This is going to be cheesy.

Yes, the time server is unreachable, for a reason I couldn't ever determine. The good news is that one of the external DNS servers to which I have access turns out to be serving NTP packets itself, and it is connecting to that external time server for its ticks. It's a workaround, not a fix. But, I'll take what I can get.

So, in the end, I only lose one stratum of service.

As a side note, I did register with the NTP bug database so I could write enhancement bug 2297, asking for formal documentation for the peer refids .INIT., .LOCL., and LOCAL(0).

dafydd

Just wanted to add my two cents to the top Google search result. I ran into this problem on a host where NTP was not binding to the host's external interface. If you have this problem, look at the output of netstat -tulpn.

This instance of ntpd will not be able to sync to a known good time source.

$ sudo netstat -tulpn | grep ntp
udp        0      0 127.0.0.1:123               0.0.0.0:*                                  31316/ntpd          
udp        0      0 0.0.0.0:123                 0.0.0.0:*                               31316/ntpd          

This will.

$ sudo netstat -tulpn | grep ntp
udp        0      0 192.168.1.15:123            0.0.0.0:*                               32294/ntpd          
udp        0      0 127.0.0.1:123               0.0.0.0:*                               32294/ntpd          
udp        0      0 0.0.0.0:123                 0.0.0.0:*                               32294/ntpd   

This was caused by the configuration file trying to restrict binding to IPv4 interfaces only by using the 0.0.0.0 wildcard.

$ grep ^interface /etc/ntp.conf 
interface listen 0.0.0.0

The correct (desired) configuration was as follows.

$ grep ^interface /etc/ntp.conf 
interface listen ipv4
interface ignore ipv6

(Alternatively, remove both interface configuration lines entirely.)

You can also check /etc/sysconfig/ntp or equivalent for any configuration that might restrict interface binding.
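The netstat symptom above can be turned into a quick scripted check. This is a minimal Python sketch, assuming `netstat -tulpn` output where UDP rows have the local address in the fourth column and the program name in the last; `ntpd_addresses` and `has_specific_binding` are hypothetical helpers. It treats "ntpd bound to at least one address that is neither a wildcard nor loopback" as the healthy case, following this answer; it is a heuristic, not a guarantee that sync will work.

```python
import ipaddress


def ntpd_addresses(netstat_output):
    """Return the set of local addresses ntpd has bound on UDP port 123."""
    addrs = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("udp") or "ntpd" not in line:
            continue
        addr, _, port = fields[3].rpartition(":")
        if port == "123":
            addrs.add(addr)
    return addrs


def has_specific_binding(netstat_output):
    """True if ntpd is bound to some address that is neither wildcard nor loopback."""
    for addr in ntpd_addresses(netstat_output):
        ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 zone suffix
        if not (ip.is_unspecified or ip.is_loopback):
            return True
    return False
```

With the first listing above (only 127.0.0.1 and 0.0.0.0), `has_specific_binding` returns False; with the second listing, the 192.168.1.15 binding makes it return True.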

Aaron Copley