22

So, it recently dawned on me that since I have 3 GPS clocks in my network, I could, technically, give back a little and serve time to the rest of the world. So far I've not seen any downsides with this idea, but I have the following questions:

  1. Can I virtualize this? I'm not going to spend money and time on standing up hardware for this, so virtualization is a must. Since the servers will have access to three stratum 1 sources, I can't see how this can be a problem, provided the ntpd config is correct.

  2. What kind of traffic does a public NTP server (part of pool.ntp.org) normally see? And how big do the VMs need to be for this? ntpd shouldn't be too resource intensive as far as I can gather, but I'd rather know beforehand.

  3. What security aspects are there to this? I'm thinking of just installing ntpd on two VMs in the DMZ, allowing only NTP in through the FW, and only NTP out from the DMZ to the internal NTP servers. There also seem to be some ntpd settings that are recommended according to the NTP pool page, but are they sufficient? https://www.ntppool.org/join/configuration.html

  4. They recommend not having the LOCAL clock driver configured; is this equivalent to removing the LOCAL time source configuration from the config files?

  5. Anything else to consider?

Stuggi

5 Answers

22

Firstly, good for you; it's a helpful and public-spirited thing to do. That said, and given your clarification that you're planning on creating one or more DMZ VMs which will sync to and make publicly-available the time from your three Meinberg GPS-enabled stratum-1 (internal) servers:

  1. Edit: Virtualisation comes up for discussion on the pool list from time to time; a recent one was in July 2015, which can be followed starting from this email. Ask Bjørn Hansen, the project lead, did post to the thread, and did not speak out against virtualisation. Clearly a number of pool server operators are virtualising right now, so I don't think anyone will shoot you for it, and as one poster makes clear, if your server(s) are unreliable the pool monitoring system will simply remove them from the pool. KVM seems to be the preferred virtualisation technology; I didn't find anyone specifically using VMware, so cannot comment on how "honest" a virtualisation that is. Perhaps the best summary on the subject said:

    My pool servers are virtualized with KVM on my very own KVM hosts. Monitoring says, the server is pretty accurate and provides stable time for the last 2-3 years. But I wouldn't setup a pool server on a leased virtual server from another provider.

  2. This is the daily average number of distinct clients per second I see on my pool server (which is in the UK, European and global zones) over the past year:

    [Graph: ntp client count]

    This imposes nearly no detectable system load (ntpd seems to use between 1% and 2% of a CPU, most of the time). Note that, at some point during the year, load briefly peaked at about 850 clients per second (Max: 849.27); I do monitor for excessive load, and the alarms didn't all go off, so I can only note that even that level of load didn't cause problems, albeit briefly.

  3. The project-recommended configurations are best-practice, and work for me. I also use iptables to rate-limit clients to two inbound packets in a rolling ten-second window (it's amazing how many rude clients there are out there, who think that they should be free to burst in order to set their own clocks quickly).

  4. Or remove any lines referring to server addresses starting with 127.127.

  5. The best practice guidelines also recommend more than three clocks, so you might want to pick a couple of other public servers, or specific pool servers, in addition to your three stratum-1 servers.

    I'd also note that if you're planning to put both these VMs on the same host hardware, you should probably just run the one, but double the bandwidth declared to the pool (ie, accept twice as many queries as you otherwise would).
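To make the "two inbound packets in a rolling ten-second window" idea from point 3 concrete, here is a minimal Python sketch of the same per-client logic. It is only an illustration, not the actual iptables rules used on any pool server: the class name `SlidingWindowLimiter` and its parameters are made up for this example, and it assumes that only accepted packets count against the window (a firewall rule may well count dropped packets too).

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` packets per client in any `window` seconds.

    Hypothetical helper for illustration; only accepted packets are
    remembered, which is an assumption of this sketch.
    """

    def __init__(self, limit=2, window=10.0):
        self.limit = limit
        self.window = window
        # One queue of accepted-packet timestamps per client address.
        self._seen = defaultdict(deque)

    def allow(self, client, now):
        """Return True if a packet from `client` at time `now` is accepted."""
        stamps = self._seen[client]
        # Forget timestamps that have aged out of the rolling window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False  # client is bursting; drop this packet
        stamps.append(now)
        return True
```

A well-behaved client polling every 64 seconds or slower never hits the limit; a client sending a burst of several packets within a second gets its first two through and the rest dropped until the window rolls on.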

MadHatter
  • 1
    Many Linux distributions set `iburst` by default... – Michael Hampton Nov 01 '16 at 21:59
  • 4
    `iburst` I don't mind so much, as it only applies when the server is *un*reachable. Setting `burst`, however, is downright antisocial. – MadHatter Nov 01 '16 at 22:04
  • 1
    Thanks mate, exactly what I wanted to know! To clarify, I'm running VMware under these, and it's a distributed cluster. My internal clocks are Meinberg appliances, and they speak NTP natively. The load seems quite reasonable, my internal clocks see about twice that (but then again they are there so that my devices can be as antisocial as they like). – Stuggi Nov 02 '16 at 05:22
  • @Stuggi I've tried to clarify the virtualisation question by searching the pool operators' list, hopefully that's some help. Do feel free to accept my answer if you think it's dealt with all your questions! And thanks again for running a pool server. – MadHatter Nov 02 '16 at 07:41
  • 1
    @MadHatter Cheers mate, that cleared it up some. I've had to deal with a lot of time issues on VMware before, and do know how to deal with those issues, I was just worried that even after tweaking everything the VM would still be too bad at timekeeping for NTP. VMware is a bare metal hypervisor (i.e. the hypervisor is the OS), while KVM (if I remember correctly) runs on top of a "normal" OS, so it should be just fine to run it in VMware. I'll give it a try and see if I get thrown out of the pool! :) – Stuggi Nov 11 '16 at 11:16
12

Firstly, congrats on an NTP question that is non-facepalm material. :-) I've included some graphs at the bottom of this post to give you a feel for things. The VM in question is set to 100 Mbps in the pool control panel, and is in the UK, Europe, and global pools.

  1. I think MadHatter covered this well - virtualisation should be fine. Like you say, if they're feeding from your GPS-connected stratum 1s, they should be reasonably solid. In my experience, VMs tend to be a little more jumpy than bare metal in terms of frequency (see graph below), but that's what you'd expect - they're dealing with a clock emulation layer (hopefully pretty efficient) and potentially noisy neighbours. If you'd rather not see that sort of jumpiness, maybe use older servers or unused desktops as your DMZ stratum 2s instead.

  2. This VM is 1 core, 2 GB RAM, running Ubuntu 16.04 LTS, virtualised in OpenStack (KVM hypervisor). As you can see, the RAM is a little over the top.

  3. The recommended settings - including not having the local driver configured - are the default in Ubuntu 16.04. I'm running very close to the stock configuration, other than the peer list.

  4. (see above)

  5. I'd probably start bandwidth on the low side and ramp up the bandwidth after you've monitored it for a bit. If your VMs are all nearby each other and near your stratum 1s in terms of network latency, I'd probably have all the VMs talking to all the stratum 1s, and probably peer them with each other and turn on orphan mode as well.
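For question 4, one quick way to check whether a config still has the LOCAL clock driver in it is to scan for the pseudo-address that ntpd uses for it. The sketch below assumes the classic ntpd convention where reference clocks are written as `127.127.t.u` and type 1 is the undisciplined local clock; the function name `local_clock_lines` is made up for this example, and it takes the config text as a string rather than assuming any particular file path.

```python
import re

# 127.127.t.u is ntpd's pseudo-address for reference clock drivers;
# type 1 (127.127.1.u) is the undisciplined LOCAL clock.
# Commented-out lines do not match because the keyword must come first.
LOCAL_CLOCK = re.compile(r"^\s*(server|fudge)\s+127\.127\.1\.\d+\b")


def local_clock_lines(config_text):
    """Return (line_number, line) for each active line using the LOCAL driver."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if LOCAL_CLOCK.match(line):
            hits.append((number, line.strip()))
    return hits
```

If this returns an empty list for your ntp.conf, there is nothing to remove. Other `127.127.` addresses (a GPS or PPS refclock, say) are deliberately not flagged here, since they are not the LOCAL driver.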

Here are the graphs - they all cover the same period of roughly 3 weeks, except for the network one, which had a couple of spikes due to backups. When the network spikes were there I couldn't even see the normal NTP traffic, so I zoomed in a little to show the usual background.

[Graphs: CPU, Memory, Network, Frequency, System Offset]

Paul Gear
  • Ooooh, nice answer - +1 from me! – MadHatter Nov 04 '16 at 08:05
  • 1
    Thanks mate, that's really helpful. I'm running less than 3 ms latency between the VMs and the physical NTP appliances, which are distributed geographically within 50 miles of the VM infrastructure, so I think I'll be ok! – Stuggi Nov 14 '16 at 13:31
  • how did you plot the system offset? what utility? – Marc Compere Dec 01 '20 at 16:16
  • @MarcCompere That's using my NTPmon script with LibreNMS. https://github.com/paulgear/ntpmon https://www.librenms.org – Paul Gear Dec 01 '20 at 21:37
3

Some things to consider with NTP

There are already some good answers here. I am just adding a few thoughts for completeness' sake, based on my own experiences.

I would suggest enabling NTP logging and graphing clock skews and corrections on bare metal vs. VM, if that discussion is a concern for you. I don't believe this can be generalized easily, as hardware and configuration vary between implementations. It might be best to get your own numbers on that one.

I have always suggested that folks pick servers or network devices whose roles give them fairly constant CPU load, and that are not running tickless kernels or power saving modes. Especially avoid daemons like cpuspeed, speed governors, or advanced power saving on NTP servers, even if they are only stratum 2 in your farm. Some stability can be gained by never going deeper than C-state 1, but your power consumption will increase.

I also try to ensure that folks pick a handful of stratum 1 servers that are under 40ms away from the edge of their network, then divide them up across their edge NTP servers so that no two servers behind the same SNAT are talking to the same stratum 1 server. Along the same lines as burst, it is unwise to have multiple servers behind the same SNAT using the same upstream servers, as it will appear to the upstream servers that you have enabled burst even when you have not.

You should always honor the KoD (Kiss-o'-Death) packet from the upstream server, and have monitoring tools checking time offsets and reachability of the upstream servers.
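As a rough idea of what such a monitoring check looks at, here is a small Python sketch that parses a raw 48-byte NTP reply, flags a kiss code, and computes offset and round-trip delay with the standard four-timestamp formulas. The function name `parse_reply` and the returned dict layout are made up for this example, and actually sending the query over UDP port 123 is left out; in practice you would more likely graph the output of your NTP daemon's own tools.

```python
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def _ntp_to_unix(seconds, fraction):
    """Convert a 32.32 fixed-point NTP timestamp to Unix seconds."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def parse_reply(packet, t1, t4):
    """Parse an NTP server reply (hypothetical helper for illustration).

    t1 is the client's send time and t4 its receive time, in Unix seconds.
    Returns stratum, kiss code (or None), clock offset and round-trip delay.
    """
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    (_flags, stratum, _poll, _precision, _root_delay, _root_disp, ref_id,
     _ref_s, _ref_f, _org_s, _org_f, rx_s, rx_f, tx_s, tx_f) = struct.unpack(
        "!BBbbII4s8I", packet[:48])
    # Stratum 0 marks a Kiss-o'-Death packet; the reference ID then holds
    # an ASCII kiss code such as "RATE" (slow down) or "DENY".
    kiss = None
    if stratum == 0:
        kiss = ref_id.decode("ascii", "replace").rstrip("\x00")
    t2 = _ntp_to_unix(rx_s, rx_f)  # server receive time
    t3 = _ntp_to_unix(tx_s, tx_f)  # server transmit time
    return {
        "stratum": stratum,
        "kiss": kiss,
        "offset": ((t2 - t1) + (t3 - t4)) / 2,
        "delay": (t4 - t1) - (t3 - t2),
    }
```

A check built on this would alert when `kiss` is set (and back off polling), when the offset exceeds whatever threshold you consider acceptable, or when no reply arrives at all.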

You may want to consider having your own accurate time sources in a few of your datacenters to peer with or fall back on in the unlikely case that GPS Selective Availability (SA) is enabled by the military. There are cost-effective appliances specifically for this. Even if you are in a "cage" environment and don't have your own datacenter, some hosting facilities may accommodate this.

Aaron
  • Stuggi already mentioned that the network in question has 3 GPS clocks. – Paul Gear Dec 06 '16 at 22:59
  • Yes. I am specifically talking about using local cesium clocks that will not drift in the unlikely event that GPS is disabled. That should only occur during a large scale military event, but you never know. – Aaron Dec 07 '16 at 05:46
2

See the VMware timekeeping document at http://www.vmware.com/pdf/vmware_timekeeping.pdf

Running an NTP daemon in a VM is probably not a good idea, particularly if you need reliable time.

  • 3
    While not a precise answer, this raises a valid concern à la "TL;DR: yes, there are issues to deal with regarding virtualization". – rackandboneman Nov 02 '16 at 12:04
  • I'm aware of those issues that need to be dealt with, I'm more thinking if it's at all possible. – Stuggi Nov 02 '16 at 17:22
  • 1
    Re: "Running a NTP daemon in a VM is probably not a good idea, particularly if you need reliable time." - I don't think this is true for any modern hypervisor. The document you linked to specifically says that using NTP in a VM is an option. The graphs I included in my response show that a VM can keep good time on KVM, and I'd expect newer ESXi systems to do the same. – Paul Gear Nov 15 '16 at 02:55
0

Here's a good KB article from VMware with actual configuration parameters for different distributions of Linux:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

Stuggi