7

Last week there were a fair few comments on a Slashdot article about whether Unix (or Linux) machines ever need to be rebooted. More than a few of the commenters mentioned having machines with uptimes of several years.

As I understand it, Linux boxes need to be rebooted fairly often to apply kernel patches, especially security-related ones (such as the fix for the ac1db1tch3z exploit). Running uname -r after a 'yum update kernel' seems to show that the new kernel isn't loaded until a reboot.
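
For example, on an RPM-based box like this one, the running kernel can be compared against the newest installed kernel package roughly like this (standard uname/rpm commands, nothing distro-specific beyond that):

    uname -r                        # kernel currently loaded in memory
    rpm -q --last kernel | head -1  # newest kernel package installed on disk
    # If the two versions differ, the new kernel only takes effect after a reboot.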

My question is: given this, how are these boxes achieving multi-year uptimes? A few possible explanations I've thought of:

  1. The machines aren't production and/or exposed to users so the security patches aren't as much of a concern.
  2. All of the posters are using live patching services such as Ksplice
  3. The kernel security patches can be applied by reloading modules rather than the entire kernel.
  4. uname -r is reflecting incorrect information after a kernel patch, and the updated kernel is loaded after all.

Are any of these explanations reasonable, or is there something I'm missing in my understanding? Is there another way to minimize the two dozen or so reboots that would otherwise have been necessary over the last two years?

Beerey
  • 252
  • 1
  • 4
  • 10
  • The question should never be 'how', but 'why'. Reboots should not be scary in Linux environments. How does one know that a server with a 5-year uptime will survive a restart (caused by a power failure, etc.)? Regular reboots should generally be viewed as a good thing – Daniel Widrick Oct 20 '16 at 17:17

7 Answers

9

One solution is to use Ksplice.

If you use Ubuntu or CentOS kernels you can subscribe to the ksplice.com service, where for a small fee they will provide you with special kernel updates that can be applied to a running kernel. Reboots are not required for most updates. It is pretty easy to set up and use.

If you are particularly skilled, you can use the Ksplice tools to build your own Ksplice-enabled kernels without subscribing to the service, including for non-standard kernels.
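
For reference, the hosted service works through a small client; the day-to-day workflow is roughly the following (command names are from the Uptrack client as I recall them, so check the current docs):

    uptrack-upgrade   # fetch and apply all available rebootless updates
    uptrack-show      # list the updates currently applied to the running kernel
    uname -r          # note: this still reports the version booted at startup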

mfarver
  • 2,576
  • 13
  • 16
  • Free for Ubuntu and Fedora, quite nice and useful ... – zillion Mar 04 '11 at 08:50
  • Ksplice seems like a terrific service, I've been following their blog and site for a while. Unfortunately they don't support as many distro/kernels as we would need - for example, the RHEL5.4 EUS kernel is currently unsupported by Ksplice. – Beerey Mar 09 '11 at 01:10
  • Yes we are comparing the value of adding grsec to our kernels vs using the default CentOS kernels and having access to ksplice. – mfarver Mar 10 '11 at 03:27
8

I have had servers with 1+ year of uptime. That is not the best practice from a security perspective, but some of these servers were database masters and we couldn't afford the downtime.

I think security should be the prime concern, but there are some real-world limitations. If you have the luxury, patch it and reboot if needed. Don't worry about uptime; better safe than sorry.

I would suggest always rebooting a server after a major upgrade to ensure it comes back up; you don't want to discover a boot problem later, after an unexpected reboot.

Sameer
  • 4,070
  • 2
  • 16
  • 11
3

Our shop has a pretty good policy about patching/rebooting. The importance of staying secure outweighs the uptime statistic. We have a regular patching routine that ensures we don't get caught in a Bad Things Happen situation.

Our move to cluster computing has helped ensure the important things stay up, and the work to get it set up was definitely worth it.

If uptime matters for maintaining service to clients, then you should be looking at load balancing and clustering. You can maintain a secure and redundant environment as well as service uptime.

If you are sacrificing security for bragging rights, you are likely doing a disservice to your clients.

Mike
  • 792
  • 3
  • 5
2

I think the only time one needs to reboot a Linux machine is to replace the kernel. I have several machines that have been running for more than 2 years, but I maintain them on the "If it ain't broke, don't fix it" principle, and that is how I achieve the uptime. Of course, if your servers are exposed to external threats you will need to apply security fixes periodically, and some of them will require a new kernel. I'm not aware of any way to do that reliably without rebooting the machine. There may be some tricks, but there is a good chance you will compromise stability in the process and will need to take the machine into single-user mode. You would technically preserve the uptime, but the machine would not be available to end users during that time, so what's the point?

If uptime is really critical for you, you may be interested in some form of HA/clustering solution where you can reboot one node of the cluster without affecting the availability of the entire system. Otherwise, just reboot.

dtoubelis
  • 4,579
  • 1
  • 28
  • 31
2

Minimizing downtime is more important than minimizing reboots. Like Sameer said, not keeping up with your kernel patches is A Bad Thing™. I have the luxury of having load balancers (mainly because a lot of the stuff my employer does is in the cloud), so we do rolling updates—which lets me update AppServer-1, pull it out of the load balancer, reboot, make sure everything is OK, tell the LB, "OK dude AS-1 is back up!", then continue with the rest of the machines.
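
A minimal sketch of that rolling pattern, assuming hypothetical lb-disable/lb-enable helpers for your particular load balancer (placeholders, not a real API) and SSH access to each app server:

    #!/bin/sh
    # Rolling kernel update across app servers behind a load balancer (sketch).
    for host in as-1 as-2 as-3; do
        lb-disable "$host"                         # hypothetical: drain the node from the LB
        ssh "$host" 'yum -y update kernel && reboot'
        until ssh "$host" true 2>/dev/null; do     # wait for the node to come back up
            sleep 10
        done
        ssh "$host" uname -r                       # confirm the new kernel is running
        lb-enable "$host"                          # hypothetical: put the node back in rotation
    done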

Tom Norris
  • 21
  • 2
1

The less stuff you have installed, the less likely you are to need something patched. Minimizing your install (or, as I like to think of it, your attack surface) can go a long way. This applies not only to packages but also to kernel configuration. These days most distros ship kernels compiled with every module possible, which is far from optimal. Custom kernels can be a pain to maintain, but they can also pay off: you know exactly what's in there, which further reduces the likelihood of needing a patch.
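
If you do go the custom-kernel route, one way to start a trimmed-down config is the upstream localmodconfig target, which disables modules that aren't currently loaded (run it while the machine's normal workload is active so nothing you need gets dropped):

    # In the kernel source tree, on the target machine:
    lsmod > /tmp/loaded-modules
    make LSMOD=/tmp/loaded-modules localmodconfig   # base the .config on the loaded modules
    make -j"$(nproc)" && make modules_install install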

Marcin
  • 2,281
  • 1
  • 16
  • 14
1

(Disclosure: I work for Canonical)

For Ubuntu specifically, Canonical now delivers live kernel patching on 16.04.

This uses the live-patching technology that has been in the upstream Linux kernel since the 4.0 release.
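
On 16.04 the setup looks roughly like this (the token is a placeholder you get from the Livepatch signup page; see the Canonical docs for the current steps):

    sudo snap install canonical-livepatch        # install the livepatch client
    sudo canonical-livepatch enable YOUR_TOKEN   # YOUR_TOKEN: placeholder from the signup page
    canonical-livepatch status --verbose         # show which fixes are applied to the running kernel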

dpb
  • 445
  • 5
  • 16