NTP Stratum Inaccuracy


We're gearing up to have an argument regarding NTP stratum as an indicator of time accuracy. The statement that started the whole thing was:

Stratum 5 can be four minutes off.

My understanding is that NTP tries as hard as it can to put forward the correct time, regardless of how many hops (stratum) you are away from an authoritative clock. I understand that a higher stratum number means a greater chance of a time server gone bad or a flaky network causing incorrect calculations. I understand that more than just stratum (jitter, latency, etc.) should be examined to determine how accurate a clock is. I also understand that there should be 3 or 4 (or more?) upstream time servers for redundancy and statistical reliability.

Internally, several production systems are stratum 5. I cannot reach out from my stratum 5 test system to a stratum 2 to get an offset.

ntpdate -q 1.debian.pool.ntp.org
server, stratum 0, offset 0.000000, delay 0.00000
 6 Jan 15:47:46 ntpdate[]: no server suitable for synchronization found

But when I query a few of my internal stratum 3 servers, the offset is about -0.007 seconds. (Or even less!)

I'm looking for arguments I can give to non-technical managers to soothe their fears. Right now I'm leaning towards something like this.

Stratum is only a measurement of the number of hops from an authoritative clock. Our internal NTP servers receive time from stratum 2 servers. This is pretty standard across the Internet. (Otherwise the stratum 1 servers would become overloaded. Overloaded time servers report incorrect time.) The difference between our internal stratum 3 servers and the stratum 5 production systems is roughly 7 thousandths of a second. The stratum 3, 4, and 5 time servers are all owned by us and communicate over our network. Unless our internal stratum 3 time servers (used as the source of time for the entire company) are wildly inaccurate, we shouldn't worry about stratum as an indicator of system time accuracy.

I realize I need to get management to state what an acceptable inaccuracy is. (We are not involved in life-and-death decisions, we do not provide time services to customers, nor do we trade stock where seconds of inaccuracy expose us to large monetary liabilities. I do understand from conversations that 4 minutes does matter to some business departments. Heck, four minutes would probably make NFS go crazy!)

Can anyone point out where my reasoning and process are wrong? Are there better arguments? Are there sites/links describing (in)accuracy of time as the stratum number increases that I can use as research?


Posted 2015-01-06T22:25:55.577

Reputation: 142

Question was closed 2015-01-08T09:05:48.827

I have a really good answer on the validity of my arguments and have accepted it. I'd be really interested if others can help me learn and complete this portion of the question:

Are there sites/links describing (in)accuracy of time as the strata number increase that I can use as research? – IAmJeff – 2015-01-08T19:03:26.227

I disagree that this question would solicit responses almost entirely based on opinion. I expected responses in the form of "You are (in)correct based on these criteria." The answer below is based on data and the questioner learned from the answerer. – IAmJeff – 2015-01-08T19:12:21.737



As you stated, stratum only measures the number of hops from a server that claims to be reliable. If you are using reliable servers with good connectivity, you are unlikely to be far off standard time. Your conclusions are correct. The accuracy of your time service hinges on your lowest-stratum servers. I would go with your statement; it sums things up well.

Sum the delay plus offset for all strata to get a worst-case error. This assumes maximally asymmetrical network transfer times. The total should be well under a second at stratum 5. Internally, you only need to consider the offset from your stratum 3 servers (which should be peered). This appears to be extremely low in your network.
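The rule of thumb above can be sketched as a few lines of Python. This is only an illustration: the `Hop` type, the function name, and the per-hop figures are made up for the example (the Internet hop loosely echoes the 70 ms delay and 4 ms offset mentioned in this answer, the internal hops echo the 7 ms offset from the question), and real values would come from `ntpq -p` on each server in the chain.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """One upstream link in the chain; values in milliseconds (hypothetical)."""
    delay_ms: float
    offset_ms: float


def worst_case_error_ms(hops):
    """Sum delay plus absolute offset over every hop in the chain."""
    return sum(h.delay_ms + abs(h.offset_ms) for h in hops)


# Assumed example figures, not measurements from the poster's network.
chain = [
    Hop(delay_ms=70.0, offset_ms=4.0),   # stratum 3 server to its stratum 2 source
    Hop(delay_ms=1.0, offset_ms=-7.0),   # stratum 4 to stratum 3, internal
    Hop(delay_ms=1.0, offset_ms=-7.0),   # stratum 5 to stratum 4, internal
]

print(worst_case_error_ms(chain))  # 90.0 ms, well under a second
```

Even with deliberately pessimistic inputs, the total stays far below the four minutes in the original claim.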

Your stratum 3 servers should be able to report the data for their stratum 2 servers. I connect to time servers over an IPv6 tunnel and have delays of 35 to 70 ms. Offsets are under 4 ms. Poll times are 1024 seconds (about 17 minutes).

Within a corporate network, I expect servers using NTP to be synchronized to within a few hundredths of a second. It appears your organization has achieved this. I have experienced offsets of minutes, but those occurred on servers that weren't synchronizing. There are a number of programs which can monitor NTP servers and report if there are issues.

Flags that there is an issue to investigate include:

  • A high offset (over a few milliseconds).
  • A low poll time on a server. (This is normal shortly after it is started, but should rise quickly to 1024.)
  • A high jitter (although it can be somewhat higher than the offset).
  • A high delay (depends on distance, but normally a few hundredths of a second).
  • Reachability values other than 377 on a server that has been running for over 10 minutes.

I drop servers that show more than one or two of these flags.
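As a rough sketch, the checklist above could be automated like this. The `Peer` type, the function name, and every threshold are assumptions chosen for illustration, not values taken from ntpd; the fields mirror the offset, jitter, delay, poll, and reach columns that `ntpq -p` prints. The reach column is an octal bitmask of the last eight polls, so 377 means all eight succeeded.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """One peer's figures, as one might copy them from `ntpq -p` output."""
    offset_ms: float
    jitter_ms: float
    delay_ms: float
    poll_s: int
    reach: str  # octal string, e.g. "377"


def warning_flags(peer, max_offset_ms=5.0, max_jitter_ms=10.0,
                  max_delay_ms=100.0, min_poll_s=1024):
    """Return the list of warning flags raised by a peer.

    All thresholds are illustrative assumptions; tune them for your network.
    """
    flags = []
    if abs(peer.offset_ms) > max_offset_ms:
        flags.append("high offset")
    if peer.poll_s < min_poll_s:
        flags.append("low poll interval")
    if peer.jitter_ms > max_jitter_ms:
        flags.append("high jitter")
    if peer.delay_ms > max_delay_ms:
        flags.append("high delay")
    if int(peer.reach, 8) != 0o377:
        flags.append("incomplete reach")
    return flags


# Hypothetical peers for demonstration.
healthy = Peer(offset_ms=-0.7, jitter_ms=1.2, delay_ms=35.0, poll_s=1024, reach="377")
suspect = Peer(offset_ms=250.0, jitter_ms=0.5, delay_ms=40.0, poll_s=64, reach="17")

print(warning_flags(healthy))
print(warning_flags(suspect))
```

A server that has only just started will trip the poll and reach checks, so a check like this is only meaningful once the server has been running for a while.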

Inside a network, all these values should be very low, and stratum count shouldn't be a significant factor. As long as the stratum stays below the stratum assigned to the local clock, it should not lead to significant time differences.

I have surveyed systems with stratum 1 servers that were reporting times days off the correct time. These were likely using the local clock without a fudge factor. (I use 10, but consider any level over 8 as suspect.) Fortunately, you get to pick your time servers.


Posted 2015-01-06T22:25:55.577

Reputation: 9 384

Thanks for the really complete answer! I was confident my reasoning was proper, but I always like to verify it. I'm arrogant enough to believe I know more than 85% of the people using and configuring NTP. But I'm always learning a bit more from those people who know more than I! Thanks again! – IAmJeff – 2015-01-07T16:20:25.340