35

I have read multiple times (although I can't find it right now) that data centers go to great lengths to make sure that all servers have exactly the same time, including, but not limited to, worrying about leap seconds.

Why is it so important that servers have the same time? And what are the actual tolerances?

Peter Mortensen
Jens Schauder

6 Answers

53

Security

In general, timestamps are used in various authentication protocols to help prevent replay attacks, where an attacker can reuse an authentication token he was able to steal (e.g. by sniffing the network).

Kerberos authentication does exactly this, for instance. In the version of Kerberos used in Windows, the default tolerance is 5 minutes.

This is also used by various one-time password protocols used for two-factor authentication such as Google Authenticator, RSA SecurID, etc. In these cases the tolerance is usually around 30-60 seconds.

Without the time being in sync between client and server, it would not be possible to complete authentication. (This restriction is removed in the newest versions of MIT Kerberos, by having the requester and KDC determine the offset between their clocks during authentication, but these changes occurred after Windows Server 2012 R2 and it will be a while before you see it in a Windows version. But some implementations of 2FA will probably always need synchronized clocks.)
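
To make the 2FA case concrete, here is a minimal Python sketch of RFC 6238 TOTP verification (the secret is the RFC test vector; the 30-second step and the ±1-step acceptance window are typical defaults, not any particular product's settings):

```python
# Minimal RFC 6238 TOTP sketch: why client and server clocks must agree
# to within roughly one time step for the code to validate.
import hmac, hashlib, struct, time

def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    counter = int(unix_time // step)                     # time-based counter
    msg = struct.pack(">Q", counter)                     # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                           # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify(secret: bytes, code: str, server_time: float, skew_steps: int = 1) -> bool:
    # Accept codes from +/- skew_steps time steps to tolerate small clock offsets.
    return any(totp(secret, server_time + i * 30) == code
               for i in range(-skew_steps, skew_steps + 1))

secret = b"12345678901234567890"              # RFC 6238 test secret
now = time.time()
print(verify(secret, totp(secret, now + 20), now))   # True: ~20 s skew, inside window
print(verify(secret, totp(secret, now + 120), now))  # False: 2 min skew, rejected
```

Widening the acceptance window tolerates more clock skew, but it also extends how long a stolen code stays replayable, which is why these tolerances are kept tight.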

Administration

Having clocks in sync makes it easier to work with disparate systems. For instance, correlating log entries from multiple servers is much easier if all systems have the same time. In these cases you can usually work with a tolerance of 1 second, which NTP will provide, but ideally you want the times to be as closely synchronized as you can afford. PTP, which provides much tighter tolerances, can be much more expensive to implement.
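
As a rough way to see how far a machine has drifted, here is a minimal SNTP (RFC 4330) query sketch using only the Python standard library. The pool.ntp.org hostname and the symmetric-network-delay assumption are illustrative; a production fleet would run ntpd/chrony (or PTP) against internal time servers instead:

```python
# Minimal SNTP client: query an NTP server and estimate the local clock offset.
import socket, struct, time

NTP_EPOCH_OFFSET = 2208988800            # seconds between 1900-01-01 and 1970-01-01

def sntp_offset(server: str = "pool.ntp.org", timeout: float = 2.0) -> float:
    packet = b"\x1b" + 47 * b"\x00"      # LI=0, VN=3, Mode=3 (client request)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t_send = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t_recv = time.time()
    # Transmit timestamp: 32-bit seconds + 32-bit fraction at bytes 40-47.
    secs, frac = struct.unpack("!II", data[40:48])
    server_time = secs - NTP_EPOCH_OFFSET + frac / 2**32
    # Rough offset estimate, assuming the network delay is symmetric.
    return server_time - (t_send + t_recv) / 2

print(f"local clock offset: {sntp_offset():+.3f} s")
```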

Michael Hampton
  • 16
    +1 Troubleshooting error conditions on distributed systems with logging timestamps out-of-sync: *never again* – Mathias R. Jessen Apr 26 '15 at 12:51
  • 3
    Servers running an NTP daemon are typically synchronized to within 0.01 seconds (a few milliseconds). The native Windows NTP synchronization only checks a few times a day, and is not as accurate. There are NTP clients available for Windows that provide good synchronization. – BillThor Apr 26 '15 at 17:24
  • +1 for this answer, because I just checked my system time and I was 5 seconds late, but now I'm using NTP to synchronize. It would be good to add a link to ntp.org to the question so people can check their systems – Freedo Apr 26 '15 at 23:29
  • 2
    `make` also gets confused by clock skews between client/server NFS. – Sobrique Apr 27 '15 at 09:17
  • @BillThor your information about the Windows Time service is about a decade behind. It has been a full NTP client since the release of 2003 R2, and syncs adaptively when part of an AD domain. We see <16 ms typical offsets on 2008 R2, the limit of the 64 Hz system timer tick. We see even better timekeeping on 2012+ boxes, which can use smaller tick intervals. – rmalayter Apr 27 '15 at 11:56
  • @rmalayter Glad to hear Microsoft fixed their implementation. I generally work in mixed (often dated) environments where the old implementation is still in place. I still run into servers which should be in the AD domain that are minutes off the current time. – BillThor Apr 27 '15 at 14:28
  • Data integrity can also be impacted. In our environment, we use ETL jobs to copy data from an OLTP database to a reporting database. The ETL logic says "Find me whatever has changed since 2pm." If the 2 servers don't agree on what 2pm is, then data can be either missed or duplicated. – Brandon Apr 27 '15 at 14:49
  • Clock sync can also be very important for synchronized files, including DFS and roaming profiles for Windows systems. – Todd Wilcox Apr 27 '15 at 16:05
17

Mainly, it's so that you can correlate incidents from logs on different devices. Suppose you have a security incident where someone accesses your database through your web server -- you want the timestamps on your firewall, your load balancer, your web server and your database server to all match up so that you can find the logs on each device that relate to the incident. Ideally, you'd like everything to be within a few milliseconds. And it needs to be in sync with the actual external time, so that you can also correlate your logs with third-party logs if that should become necessary.
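
As a sketch of what that correlation looks like, the snippet below merges log lines from several servers into one timeline ordered by timestamp. The hostnames and the ISO 8601-timestamp-first log format are assumptions, and the resulting order is only truthful if the clocks that stamped each line agreed:

```python
# Merge per-host log streams into a single timeline, sorted by timestamp.
from datetime import datetime
import heapq

def parse(line: str, host: str):
    # Assumes each line starts with an ISO 8601 timestamp.
    ts = datetime.fromisoformat(line.split(" ", 1)[0])
    return (ts, host, line.rstrip())

def merged_timeline(logs: dict):
    """logs maps hostname -> log lines; yields (timestamp, host, line) in order."""
    streams = ([parse(line, host) for line in lines] for host, lines in logs.items())
    return heapq.merge(*streams)

logs = {
    "firewall":  ["2015-04-26T12:00:01 ACCEPT tcp 203.0.113.9:443"],
    "webserver": ["2015-04-26T12:00:02 GET /admin 200"],
    "database":  ["2015-04-26T12:00:03 SELECT * FROM users"],
}
for ts, host, line in merged_timeline(logs):
    print(f"{ts} [{host}] {line}")
```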

Mike Scott
  • 2
    Also, some security solutions such as LDAP and/or Kerberos will fail if time is not synced. As will some HA solutions. – Jenny D Apr 26 '15 at 05:43
7

Not only is it important from an administration perspective, but having clocks in sync may be important for application-level correlation too. This depends on how the solution is designed and how the running applications get their timestamps for any transactions they work with. I have seen transaction validation fail because an application was running on a server with too much offset (it was about 20 seconds in the future) compared to the others it was interacting with.
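
A hypothetical sketch of that kind of check: the server rejects any transaction whose timestamp falls outside a tolerance window, so a client running 20 seconds fast is turned away (the 10-second tolerance is an invented value):

```python
# Timestamp-freshness validation that breaks when clocks drift too far apart.
import time

MAX_SKEW = 10.0  # seconds of client/server clock offset we tolerate (illustrative)

def validate_transaction(client_timestamp: float) -> bool:
    """Reject transactions stamped too far in the past or future."""
    return abs(time.time() - client_timestamp) <= MAX_SKEW

print(validate_transaction(time.time() + 2))   # True: small skew is tolerated
print(validate_transaction(time.time() + 20))  # False: 20 s in the future is rejected
```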

Also, if virtualizing on, for example, a VMware ESXi server, and the time of the VM is not in sync with that of the hypervisor, then an action such as vMotion may re-sync the VM clock with the hypervisor's, and this in turn can lead to unpredictable results if the time difference is big enough.

I do not know what the actual tolerances are, because I think it depends a lot on what type of systems are involved, but I think it is generally achievable to keep the servers in a data center within less than one second's offset of one another.

Petter H
6

Since you mentioned leap seconds it should be noted that they require particularly difficult handling.

They're usually added by injecting a second as 23:59:60, which is problematic if you're validating timestamps with 0-59 as the valid range for the minutes and seconds fields. The alternative of repeating 23:59:59 to make it 2 seconds long isn't much better, since that will mess with anything that is timing-sensitive down to the per-second level.

Google actually came up with a good solution a while back that does not seem to have been widely adopted yet. Their solution was to apply a leap "smear", spreading the change over a period of time, with the whole process managed by an NTP server. They published a blog post about it back in 2011; it makes for interesting reading and seems relevant to this question.
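
A toy sketch of the idea, assuming a simple linear smear (Google's 2011 post actually described a cosine-shaped one, and the 20-hour window here is an illustrative choice, not their actual value):

```python
def smear_offset(seconds_until_leap: float, window: float = 20 * 3600) -> float:
    """Fraction of the leap second already absorbed at this point in the window."""
    if seconds_until_leap >= window:
        return 0.0                                # before the window: no adjustment
    if seconds_until_leap <= 0:
        return 1.0                                # after the leap: full second absorbed
    return 1.0 - seconds_until_leap / window      # linear ramp across the window

# The smeared clock reads raw time minus this offset, so it slews smoothly
# and never displays 23:59:60 or jumps backwards:
print(smear_offset(20 * 3600))  # 0.0 -- window just starting
print(smear_offset(10 * 3600))  # 0.5 -- halfway through, half a second absorbed
print(smear_offset(0))          # 1.0 -- leap fully absorbed, no 23:59:60 needed
```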

Kaithar
  • Sounds a lot like [some NTP clients' *slew* behavior](http://serverfault.com/a/449228/58408), which has been implemented like forever. – user Apr 27 '15 at 11:10
  • @MichaelKjörling kind of. The difference is that slew is intended to avoid large jumps after the clock is determined to be wrong, by making a correction over time. What Google did was to intentionally add an offset to their NTP server, slewing in advance and thus never having the leap second. – Kaithar Apr 28 '15 at 08:31
5

Whenever timestamps are involved, de-synchronized devices can create logical inconsistencies, like: A sends a query to B, and B's reply comes back with a timestamp earlier than that of the query, possibly causing A to ignore it.

Harry Cover
4

I agree with all the points above, but want to throw in one more thought: some databases, such as Cassandra, rely heavily on timestamps; that is how they deal with concurrency.

Out-of-sync timestamps will completely mess up the database.
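
A toy illustration of the failure mode, assuming last-write-wins conflict resolution, which is how Cassandra reconciles concurrent writes to the same cell: a node with a fast clock stamps its write "later" than a write that actually happens afterwards, and the genuinely newer value is silently discarded.

```python
# Last-write-wins: the replica keeps whichever write carries the higher timestamp.
import time

def last_write_wins(stored: tuple, incoming: tuple) -> tuple:
    """Each write is (timestamp, value); the higher timestamp wins."""
    return incoming if incoming[0] > stored[0] else stored

clock_skew = 20.0                               # node A's clock runs 20 s fast
write_a = (time.time() + clock_skew, "from A")  # happens first, stamped in the future
write_b = (time.time() + 1, "from B")           # happens a second later, stamped correctly

cell = last_write_wins(write_a, write_b)
print(cell[1])  # "from A" -- B's genuinely newer write is silently discarded
```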

  • 2
    This is better posted as a comment – Dave M Apr 27 '15 at 20:09
  • So I upgraded glibc on a bunch of CentOS 5.2 machines, and the update accidentally set half of them to the MST time zone - we immediately had problems with users randomly getting logged out due to "inactivity." All sorts of little widgets and doodads expect the time to be more or less correct. Also, if your time is wrong and gets autocorrected way back, you wind up with file and log timestamps that are really confusing and come from the future... – Some Linux Nerd Apr 27 '15 at 20:21
  • I would expect this to be true of a lot of distributed databases. A difference in timestamps of only a few seconds might be enough to reverse the order of two operations. – Kaithar Apr 30 '15 at 02:56