
I am managing about 20 servers, many of them virtual. Almost all of them serve different purposes, and none are clustered. I have a distributed LAMP stack, a few application servers, some build servers, and a few KVM hosts. They mostly run CentOS 6.3, with a few on Ubuntu (unfortunately). I don't have the resources to set up a staging environment where I can keep duplicates of my machines and test updates before rolling them out. I am taking file backups. What I want to know is how you approach updating your Linux systems. I assume you don't just run yum update, but then how do you choose which packages are worth updating? When (if ever) do you update the kernel? How do you test updates without a staging environment? Snapshot and hope for the best?

2 Answers


This is pretty common with servers that are pets, not livestock.

If you really can't test updates, then you:

  1. Have backups in place. Remember that you don't really have backups unless restores work.
  2. Read the description of the updates to see what they change.
  3. Do updates during off-hours. Schedule a maintenance window even if you end up not needing it.
  4. Apply the updates. Reboot if the kernel was updated. Test the affected services.
  5. Wait for the users to start yelling.
  6. If necessary, roll the updates back (using yum history undo).
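For the "reboot if the kernel was updated" part of step 4, one way to check is to compare the running kernel with the newest installed kernel package. This is a minimal sketch, not part of the original answer: the function name `kernel_reboot_needed` is made up for illustration, and it assumes an RPM-based host where the kernel package is simply called `kernel`.

```shell
#!/bin/sh
# Sketch only: kernel_reboot_needed is a hypothetical helper name.

# Succeeds (exit 0) when the running kernel differs from the newest
# installed kernel package, i.e. a reboot is needed to pick it up.
#   $1: running kernel release, as printed by `uname -r`
#   $2: newest installed kernel package name, e.g. the first field of
#       the first line of `rpm -q --last kernel`
kernel_reboot_needed() {
    running=$1
    newest=${2#kernel-}   # strip the "kernel-" package-name prefix
    [ "$running" != "$newest" ]
}

# On a real host (assumption: rpm and awk are available):
#   newest=$(rpm -q --last kernel | head -n 1 | awk '{print $1}')
#   kernel_reboot_needed "$(uname -r)" "$newest" && echo "reboot required"
```

The comparison is kept in a pure function so it can be tried with sample strings before it is pointed at a production machine.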

My guess is that you didn't know you could revert updates with a single command. Check the yum man page and read its history section to see what else you can do with it. For instance, you don't have to revert updates in the order you applied them.
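As a rough illustration of that workflow, the sketch below lists recent transactions, inspects one, and undoes it. The `run` wrapper, the `DRY_RUN` variable, and the transaction ID 42 are assumptions added for illustration; with the default `DRY_RUN=1` the commands are only printed, so nothing changes until you deliberately set `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch only: run, DRY_RUN and undo_transaction are illustrative names.

# Runs a command, or only prints it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: transaction ID taken from the `yum history list` output
undo_transaction() {
    run yum history info "$1"   # show what that transaction changed
    run yum history undo "$1"   # reverse that transaction only
}

run yum history list            # recent transactions with their IDs
undo_transaction 42             # 42 is a placeholder ID
```

Because `undo` takes a specific transaction ID, an older transaction can be reversed without touching the ones applied after it, subject to yum being able to resolve the resulting dependencies.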

And stop worrying so much. Most updates fix problems that you need to have fixed; introducing new problems is far less common (though it can and does happen).

Michael Hampton

There is the yum security plugin (yum install yum-plugin-security), which picks only security-related updates. In theory these carry less risk than updates that fix other bugs or add features. Then update your other packages as required, whenever you hit a bug or need a new feature.
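A minimal sketch of how the plugin is typically used follows. It assumes yum-plugin-security is already installed and that the configured repositories publish security errata metadata, which not every repository does; without that metadata the filter has nothing to match. The `run` wrapper and `DRY_RUN` variable are illustrative additions that print the commands instead of executing them by default.

```shell
#!/bin/sh
# Sketch only: run, DRY_RUN and security_updates are illustrative names.

# Runs a command, or only prints it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

security_updates() {
    run yum --security check-update   # list pending security updates only
    run yum --security update         # apply only those updates
}

security_updates
```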

Really though, there is no way to be certain except by using a staging environment and a good set of tests. No software is free from bugs and every developer makes mistakes; even Red Hat slips up and puts regressions into the EL codebase from time to time.

With no testing environment, it's probably not a case of "if" you hit a problem which affects your business' ability to generate income with these servers, but "when". Not necessarily from updates, just because every little thing you do is done live on prod.

What if you as an admin are asked to implement something you've never done before? How do you learn about it and make sure it works as expected before rolling it out? From what you're saying, you can't.

Make a business case for your boss. Calculate the business impact (i.e. the loss of income) of all your systems being unavailable for the time it takes you to completely rebuild the environment from scratch and restore the data from backup.

If that loss of income is greater than the cost of setting up a staging environment, then you have a good business case for building such an environment. Proper staging and testing then becomes not an expenditure or investment, but a surprisingly cheap insurance policy.
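The comparison itself is simple arithmetic: staging is justified when the estimated loss from a full outage exceeds what staging would cost. The sketch below shows it with entirely hypothetical figures (500 per hour of lost income, 48 hours to rebuild and restore, 10000 for a staging environment); substitute your own estimates. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: all figures and function names are hypothetical.

# Prints the estimated income lost during a full rebuild.
#   $1: income lost per hour of outage
#   $2: hours needed to rebuild from scratch and restore from backup
outage_cost() {
    echo $(( $1 * $2 ))
}

# Succeeds when the outage would cost more than the staging environment.
#   $3: cost of setting up staging
staging_justified() {
    [ "$(outage_cost "$1" "$2")" -gt "$3" ]
}

if staging_justified 500 48 10000; then
    echo "outage cost $(outage_cost 500 48) exceeds staging cost 10000"
fi
```

With these placeholder numbers the outage costs 24000 against 10000 for staging, so the check succeeds.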

suprjami