I have many Linux servers (SUSE 9 & 10) used to run web services that provide data to large calculation grids. Recently we have had some difficult-to-explain outages (i.e. hardware and software logs are not showing any obvious errors), and we are starting to wonder whether the long uptime (typically 200-300 days) is the issue. Given that these servers are heavily utilised, should I consider a regular reboot cycle?
11 Answers
You must reboot after a kernel update (unless you are using KSplice); anything else is optional. Personally, I reboot on a monthly cycle during a maintenance window to make sure the server and all services come back as expected. This way I can be reasonably certain that if I have to do an out-of-schedule reboot (e.g. for a critical kernel update), the system will come back up properly. Automated monitoring of servers and services (e.g. Nagios) also goes a long way towards helping this process (reboot, watch the lights go red, and then hopefully all go back to green).
P.S. If you do reboot regularly, you'll want to make sure you tune your fsck checks (i.e. the maximum mount count between checks) appropriately; otherwise a quick 2-minute reboot might take 30 minutes if the server starts fsck'ing a couple of terabytes of data. I typically set my mount count to 0 (tune2fs -c 0) and the interval between checks to 6 months or so, and then manually force an fsck every once in a while and reset the count.
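A rough sketch of the tuning described above, in case it helps. The device name /dev/sda1 is only a placeholder, and the DRY_RUN switch and helper function names are my own additions: by default the script prints the tune2fs commands instead of running them, so nothing is changed until you set DRY_RUN=0 as root.

```shell
#!/bin/sh
# Sketch only: /dev/sda1 is a placeholder device, not taken from the answer.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DEVICE="${DEVICE:-/dev/sda1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -c 0 tells e2fsck and the kernel to disregard the mount count;
# -i 6m sets the time-based check interval to six months.
tune_fsck() {
    run tune2fs -c 0 -i 6m "$1"
}

# -l lists the superblock, including the mount count and check interval.
show_fsck_settings() {
    run tune2fs -l "$1"
}

tune_fsck "$DEVICE"
show_fsck_settings "$DEVICE"
```

Run it once per ext2/ext3 filesystem, e.g. `DEVICE=/dev/sdb1 DRY_RUN=0 sh tune.sh` (the script name is hypothetical).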
- Regularly testing your DRBCP is a must, and this type of check is a *great* start in that direction. – Scott Pack May 30 '09 at 19:55
- You don't need to reboot after kernel update - http://www.ksplice.com/ – raspi Sep 04 '10 at 12:38
- KSplice is correct: with KSplice you can live-patch running software (kernel, database, etc.). However, since Oracle recently bought KSplice, it's probably not a solution for anyone not using Oracle stuff. – Kurt Aug 17 '11 at 23:07
I actually reboot my servers on a fairly regular basis, any time major configuration changes are made. It's important to know that in the event of an emergency the server software will come up without a hassle. The last thing you want is to be in a position where you are trying to recover from an outage but are having to mess with your server configuration because you didn't thoroughly test it when you set it up.
Linux servers never need to be rebooted unless you absolutely need to change the running kernel version. Most problems can be solved by changing a configuration file and restarting a service with an init script.
You need to watch out for reboots... if you changed anything "on the fly" without reflecting your changes in a service's configuration file, those changes will not be applied after a reboot.
I usually reboot after scheduled system updates, though. It's generally not necessary, but I do them when nobody's in the office, so why not? There are often kernel upgrades when I get to doing the update, anyway.
- Of course they need to reboot from time to time. When you update software that is currently running, you'll still be using the old version, because the copy of the old version is still active in RAM. You'll need to restart that piece of software (by service restart or reboot) for the update to take effect. Some applications need a reboot and can't be updated through a service restart. – BlueWizard Oct 30 '15 at 09:29
- @JonasDralle, services should automatically stop and restart when they get upgraded. Otherwise, it is a bug in the implementation of that service! – Alexis Wilke Feb 24 '16 at 23:53
Not really required; Linux memory handling is excellent. But if you're having uptimes of that length, you're probably running kernels that have known vulnerabilities - you might want to watch that.
- Linux may handle its memory ok, but individual applications may not - their heaps could become fragmented if they run for a long time. Of course things like prefork Apache (which recycles its processes) don't generally suffer from this. Other things which use a single very-long-lived process (e.g. mysql) may. Depends on your application. – MarkR May 31 '09 at 11:12
I think you should reboot if there has been a recent kernel update OR a libc update. A lot of things are linked with libc and it's not really possible to unload that lib from memory completely and replace it with the new version unless you do a reboot.
For example, even basic things like /bin/ls and other things in /bin use libc. If you are just running a console and using bash, you are using libc.
$ ldd /bin/bash
linux-gate.so.1 => (0xffffe000)
libtermcap.so.2 => /lib/libtermcap.so.2 (0xb8029000)
libdl.so.2 => /lib/libdl.so.2 (0xb8025000)
libc.so.6 => /lib/libc.so.6 (0xb7ed9000)
/lib/ld-linux.so.2 (0xb804b000)
$ ldd /bin/ls
linux-gate.so.1 => (0xffffe000)
librt.so.1 => /lib/librt.so.1 (0xb7f3a000)
libacl.so.1 => /lib/libacl.so.1 (0xb7f33000)
libc.so.6 => /lib/libc.so.6 (0xb7de7000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7dd0000)
/lib/ld-linux.so.2 (0xb7f61000)
libattr.so.1 => /lib/libattr.so.1 (0xb7dcc000)
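If you want to see which running processes are still holding on to an old copy of a library after an update, something along these lines may help. It is only a sketch: it relies on the kernel marking replaced files as "(deleted)" in /proc/<pid>/maps, the function name and the PROC_ROOT override are my own, and you need to run it as root to see other users' processes.

```shell
#!/bin/sh
# Sketch only: list processes that still map a shared library whose file
# has been replaced on disk (such mappings are marked "(deleted)").
# PROC_ROOT is a hypothetical override for trying this on sample data;
# on a real system it is /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

find_stale_libs() {
    root="${1:-$PROC_ROOT}"
    for maps in "$root"/[0-9]*/maps; do
        [ -r "$maps" ] || continue
        pid=$(basename "$(dirname "$maps")")
        # Field 6 is the mapped path; keep .so mappings flagged as deleted.
        awk -v pid="$pid" '/\.so/ && /\(deleted\)$/ { print pid, $6 }' \
            "$maps" 2>/dev/null | sort -u
    done
}

find_stale_libs
```

Each output line is a PID followed by the library path. Anything listed can be fixed by restarting that process; if the list is long or includes early boot processes, a reboot is probably less work.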
And yes, if you change files in /etc/init.d which affect startup in some way, I would recommend a reboot. You don't want to find out that you made a small mistake in a startup file when you need things up and running again quickly.
If a server has gone many days without a reboot, there is no way to be sure that it will come up again properly. Once again, this is because a lot of config files might have been changed on it, and no one has rebooted it for a long time to make sure it comes up. Also, if the server has a lot of updates due and you haven't rebooted for a long time, reboot before you apply the updates; otherwise, if there is a problem, you can't be sure whether it was caused by a configuration error made a long time ago or by the new updates you applied.
Lastly, if you reboot a critical server after a very long time, the fsck might mean you have to wait a very long time now for it to come back up. You can use tune2fs to avoid this, but it's a good idea to check it regularly I suppose. This is why you shouldn't be in a position where you are dependent on just 1 server and if that goes, your whole website is gone. You should have another one on standby.
Another thing to do when you hit this kind of unexpected downtime is to look at exactly how the memory and processor are being used, and by which programs. Running top should show which processes are the culprits for the loss of resources, so that you can manage them directly. Another idea would be to set up a cron job to shut down and restart your processes on a specific schedule.
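top is interactive, so for logging over time a non-interactive snapshot can be handier. Here is a minimal sketch that reads resident memory straight from /proc; the function name and the optional directory argument are my own, not a standard tool.

```shell
#!/bin/sh
# Sketch only: print the N processes with the largest resident memory
# as "rss_kB pid name". The first argument is a hypothetical override of
# the proc directory so the function can be tried on sample data.
top_by_rss() {
    root="${1:-/proc}"
    n="${2:-10}"
    for status in "$root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        pid=$(basename "$(dirname "$status")")
        # Kernel threads have no VmRSS line and are skipped.
        awk -v pid="$pid" '
            /^Name:/  { name = $2 }
            /^VmRSS:/ { rss = $2 }
            END { if (rss != "") print rss, pid, name }' "$status" 2>/dev/null
    done | sort -rn | head -n "$n"
}

top_by_rss /proc 10

# A scheduled restart could then be a crontab entry such as this one
# (the service name and time are only an illustration):
# 0 4 * * 0 /etc/init.d/apache2 restart
```

Appending the output to a log file from cron every few minutes gives you something to look at after the next outage.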
It's not a bad idea to reboot if it has been that long, so you can run a disk check (fsck) on the root partition. Your argument can be that this helps ensure data integrity.
A properly run Linux server should only need rebooting for kernel updates. The same can't always be said for some of the software - for instance, I sometimes have to restart apache2 or mailman.
My infrastructure has two data sites, the alpha (where operations take place on a daily basis) and the beta (the backup site, in case things go horribly wrong at alpha). Although this is not currently the case, I am pushing to have scheduled downtime at the alpha site every 6 months, so that we can run all services from beta.
This will accomplish two things. First, it will prove that our disaster recovery site is completely viable. Second, it will give me a week's worth of time to remove accumulated cruft at alpha.
As it is, I don't reboot my servers as frequently as I should. I agree with the other posters who said that it's important to know that your servers will come back up when you need them to. You don't want to "think" that they will, only to find out that you've changed something and not done it correctly, or not documented it.
You can additionally write some scripts which will check (as much as is possible) whether the current state of your machine is going to be the state of the machine post-reboot.
What I mean by this is...
/etc/init.d/*
- Check that all services currently running, are flagged to start on boot
- Check that all services not running are flagged not to start on boot
/etc/fstab
- Check that all mounted filesystems (i.e. /etc/mtab) have a corresponding entry in /etc/fstab
- Check that all filesystems specified to be mounted on boot in /etc/fstab are also currently mounted.
This is of course not a complete check by any means, but it does reduce the risk of troubles post-reboot.
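The /etc/fstab half of those checks could be sketched roughly as follows. This is an illustration under assumptions, not a finished tool: the function names and the FSTAB/MOUNTS overrides are made up for the example, and it compares mount points only.

```shell
#!/bin/sh
# Sketch only: compare the mount points in fstab with what is mounted now.
# FSTAB and MOUNTS are hypothetical overrides for trying this on sample files.
FSTAB="${FSTAB:-/etc/fstab}"
MOUNTS="${MOUNTS:-/etc/mtab}"

# Mount points fstab mounts at boot: skip comments, blanks, swap and noauto.
fstab_mount_points() {
    awk '$1 !~ /^#/ && NF >= 4 && $3 != "swap" && $4 !~ /noauto/ { print $2 }' "$1" | sort -u
}

# Mount points that are mounted right now.
current_mount_points() {
    awk 'NF >= 2 { print $2 }' "$1" | sort -u
}

check_mounts() {
    fstab_list=$(mktemp)
    cur_list=$(mktemp)
    fstab_mount_points "$1" > "$fstab_list"
    current_mount_points "$2" > "$cur_list"
    # comm -23 keeps lines only in the first file, comm -13 only in the second.
    comm -23 "$cur_list" "$fstab_list" | sed 's/^/mounted but not in fstab: /'
    comm -13 "$cur_list" "$fstab_list" | sed 's/^/in fstab but not mounted: /'
    rm -f "$fstab_list" "$cur_list"
}

check_mounts "$FSTAB" "$MOUNTS"
```

On a real machine, expect pseudo-filesystems such as /proc and /sys to be reported as "mounted but not in fstab" unless they are listed there; you would probably want to filter those out before alerting on the result.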
In addition to this, you should (imo) set a policy for server package updates, in some sensible order, say 1 group per week...
- Crash & Burn Servers
- Development Servers, Training Servers
- Test Servers
- Pre-Production Servers
- Production Servers
Also have an overall plan, such as "All servers will go through a complete OS upgrade once every 6 months".
Depends on the tasks running on the server. For some virtual servers we often use reboot instead of e.g. apachectl restart, and it just takes 5-10 seconds longer. But some heavily loaded machines are rebooted several times per year with a whole admin crew monitoring the process.