72

"Can we upgrade our existing production EL5 servers to EL6?"

A simple-sounding request from two customers with completely different environments prompted my usual best-practices answer of "yes, but it will require a coordinated rebuild of all of your systems"...

Both clients feel that a complete rebuild of their systems is an unacceptable option for downtime and resource reasons... When asked why it was necessary to fully reinstall the systems, I didn't have a good answer beyond, "that's the way it is..."

I'm not trying to elicit responses about configuration management ("Puppetize everything" doesn't always apply) or how the clients should have planned better. This is a real-world example of environments that have grown and thrived in a production capacity, but don't see a clean path to move to the next version of their OS.

Environment A:
Non-profit organization with 40 x Red Hat Enterprise Linux 5.4 and 5.5 web, database and mail servers, running a Java web application stack, software load balancers and Postgres databases. All systems are virtualized on two VMware vSphere clusters in different locations, each with HA, DRS, etc.

Environment B:
High-frequency financial trading firm with 200 x CentOS 5.x systems in multiple co-location facilities running production trading operations, supporting in-house development and back-office functions. The trading servers are running on bare-metal commodity server hardware. They have numerous sysctl.conf, rtctl, interrupt binding and driver tweaks in place to lower messaging latency. Some have custom and/or realtime kernels. The developer workstations are also running similar versions of CentOS.


In both cases, the environments are running well as-is. The desire to upgrade comes from a need for a newer application or feature available in EL6.

  • For the non-profit firm, it's tied to Apache, the kernel and some things that will make the developers happy.
  • In the trading firm, it's about some enhancements in the kernel, networking stack and GLIBC, which will make the developers happy.

Both are things that can't be easily packaged or updated without drastically altering the operating system.

As a systems engineer, I appreciate that Red Hat recommends full rebuilds when moving between major version releases. A clean start forces you to refactor and pay attention to configs along the way.

Being sensitive to business needs of clients, I wonder why this needs to be such an onerous task. The RPM packaging system is more than capable of handling in-place upgrades, but it's the little details that get you: /boot requiring more space, new default filesystems, RPM possibly breaking mid-upgrade, deprecated and defunct packages...

What's the answer here? Other distributions (.deb-based, Arch and Gentoo) seem to have this ability or a better path. Let's say we find the downtime to accomplish this task the right way:

  • What should these clients do to avoid the same problem when EL7 is released and stabilizes?
  • Or is this a case where people need to resign themselves to full rebuilds every few years?
  • This seems to have gotten worse as Enterprise Linux has evolved... Or am I just imagining that?
  • Has this dissuaded anyone from using Red Hat and derivative operating systems?

I suppose there's the configuration management angle, but most Puppet installations I see do not translate well into environments with highly-customized application servers (Environment B could have a single server whose ifconfig output looks like this). I'd be interested in hearing suggestions on how configuration management can be used to help organizations get across the RHEL major version bump, though.

ewwhite
  • 18
    I was about to mark this for closure as "not constructive", when I saw the author's name, and rep., and out of respect I won't be doing so. I still think it's a silly question, because the answer is that "Red Hat decided it should be so". 4->5 upgrades were perfectly possible via DVD boot, and there were procedures for doing it to a live OS using `yum`, which worked for me most of the time. My only hope is that RH have taken a **huge** hit of the pain stick from their paying customers for their decision to have no supported upgrade path 5->6, and will rethink this for 6->7. – MadHatter Nov 15 '12 at 15:26
  • 1
    That said, you do know there's a working unsupported upgrade path via DVD boot from C5->C6 using the `upgradeany` boot-time parameter, yes? I've tested it twice, once on a clean C5 install where it worked fine; once on a (test copy of a) crufty old "used to be C4 and was upgraded" install where it failed dramatically. – MadHatter Nov 15 '12 at 15:28
  • 2
    I'm well aware of the `upgradeany` options, and have definitely forced the installations using the live RPM approach (changing repo, `*-release` files and all). But the questions from customers this week made me think more about how entrenched an environment can become with a specific version, and have no path out. – ewwhite Nov 15 '12 at 15:32

2 Answers

42

(Author's Note: This answer refers to RHEL 6 and prior versions. RHEL 7 now has a fully supported upgrade path from RHEL 6, the details of which are at the end.)


To start, I should note that there are two ways to do the in-place upgrade:

  1. Drop in the installation DVD (or use the DVD image via iLO/iDRAC), boot from it and choose Upgrade, e.g. `linux upgradeany`.
  2. Update the `redhat-release` RPM manually, run `yum distro-sync` (this is oversimplified a bit) and reboot.

Method 1 is merely unsupported. Method 2 is for Real Cowboys. In addition to the recommended fresh installs, I have done both of these...


Do I need support?

Support has two complementary meanings in our world. The first is that a product has a given feature (e.g. "Postfix supports SMTP"). The second is that the vendor will talk to you about it. Which definition is meant is not always clear from context.

To accomplish a task, you obviously need support in the first sense. Where vendor support comes in is to assist you in resolving issues and giving the vendor feedback as to what features need to exist or be improved. Many sites pay a fortune for vendor support when they have the in-house expertise to resolve any issues that may arise, faster and even cheaper than the vendor could. Whether to buy vendor support is ultimately a business decision you will have to make (or advise management on).


Why not do an in-place upgrade?

This is what Red Hat says about it:

Red Hat does not support in-place upgrades between any major versions of Red Hat Enterprise Linux. A major version is denoted by a whole number version change. For example, Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6 are both major versions of Red Hat Enterprise Linux.

In-place upgrades across major releases do not preserve all system settings, services or custom configurations. Consequently, Red Hat strongly recommends fresh installations when upgrading from one major version to another.

They further warn:

However, note the following limitations before you choose to upgrade your system:

  • Individual package configuration files may or may not work after performing an upgrade due to changes in various configuration file formats or layouts.
  • If you have one of Red Hat's layered products (such as the Cluster Suite) installed, it may need to be manually upgraded after the Red Hat Enterprise Linux upgrade has been completed.
  • Third party or ISV applications may not work correctly following the upgrade.

Of course, they then describe how to do an in-place upgrade via method 1, just in case you really want to do it. The feature exists and Red Hat puts development time into it, so it is supported in that the feature exists. But if something goes wrong, Red Hat will tell you to install fresh; they will not provide vendor support for things that break as a result of the upgrade.

For the record, I've never actually had a problem with an in-place upgrade of a RHEL/CentOS or Fedora system that I couldn't resolve myself. The typical problems come from renamed packages, third party repositories and the occasional version mismatch between the i386 and x86_64 architectures of a package. The installer is a bit better at handling these than yum, I think.
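The i386/x86_64 version mismatch in particular can be spotted before the upgrade starts. Below is a minimal sketch in Python, assuming the package list was captured with `rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n'`. The function name and the sample versions are made up for illustration.

```python
from collections import defaultdict


def find_arch_mismatches(rpm_lines):
    """Return {package: {arch: version}} for packages whose arches disagree.

    Each input line is expected to look like "name version-release arch",
    for example the output of:
        rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n'
    """
    versions = defaultdict(dict)
    for line in rpm_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, version, arch = parts
        if arch == "noarch":
            continue  # noarch packages are never installed per-architecture
        versions[name][arch] = version

    # Keep only packages installed for several arches at differing versions.
    return {
        name: by_arch
        for name, by_arch in versions.items()
        if len(set(by_arch.values())) > 1
    }


# Hypothetical sample data; the version strings are invented.
sample = [
    "glibc 2.5-2 x86_64",
    "glibc 2.5-1 i686",
    "zlib 1.2.3-1 x86_64",
    "zlib 1.2.3-1 i386",
    "bash 3.2-1 x86_64",
]
mismatches = find_arch_mismatches(sample)
print(mismatches)
```

Packages flagged this way are worth bringing to a matching version (or removing the unused architecture) before attempting the upgrade. The sketch keeps only one version per architecture, so packages that are legitimately installed in several versions, such as kernels, would need separate handling.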


How should I upgrade?

I generally warn people that they should plan on a maintenance window every 3-4 years to update RHEL systems from one major version to the next. While upgrades generally go smoothly, the unexpected can always happen.

For both of your environments, I expect an in-place upgrade would work, though I strongly recommend testing it thoroughly first. P2V a representative sample of the servers and run through the in-place upgrade on the virtual systems to see what problems you're going to run into. You can then plan the actual production upgrade based on better knowledge of what will happen.
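What counts as a "representative sample" depends on the site. One possible reading is one host per distinct combination of role and OS release. The Python sketch below works under that assumption; the inventory layout, host names and function name are hypothetical.

```python
def representative_sample(inventory):
    """Return one host name per distinct (role, release) profile.

    `inventory` is an iterable of dicts with "name", "role" and "release"
    keys; this layout is a hypothetical stand-in for a real inventory.
    """
    sample = {}
    for host in inventory:
        profile = (host["role"], host["release"])
        # Keep the first host seen for each profile.
        sample.setdefault(profile, host["name"])
    return sample


# Hypothetical inventory, loosely modelled on Environment A.
inventory = [
    {"name": "web01", "role": "web", "release": "5.4"},
    {"name": "web02", "role": "web", "release": "5.5"},
    {"name": "web03", "role": "web", "release": "5.5"},
    {"name": "db01", "role": "database", "release": "5.4"},
    {"name": "mail01", "role": "mail", "release": "5.5"},
]
to_clone = sorted(representative_sample(inventory).values())
print(to_clone)
```

For an environment like B, the profile would probably also need to include things such as kernel flavour and NIC hardware, since those are where the customizations live.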

For a large deployment such as you have here, consider using Limoncelli's "one-some-many" approach. Upgrade one machine, see what problems occur, solve them, then use lessons learned when upgrading a small batch of machines, repeat the lessons learned thing, then when you believe you have all the kinks worked out, upgrade large batches of them.
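The batching itself is simple enough to sketch. The Python below is one way to split a host list into "one", "some" and "many" groups. The batch sizes and host names are arbitrary assumptions, and the pause to apply lessons learned between batches is left to the operator.

```python
def one_some_many(hosts, some=5, many=20):
    """Yield upgrade batches: one host, then a small batch, then large ones."""
    hosts = list(hosts)
    if not hosts:
        return
    yield hosts[:1]  # "one": the first canary machine
    small = hosts[1:1 + some]
    if small:
        yield small  # "some": a small batch once the canary is healthy
    rest = hosts[1 + some:]
    for start in range(0, len(rest), many):
        yield rest[start:start + many]  # "many": the remainder, in large batches


# Hypothetical host names; 40 hosts, as in Environment A.
hosts = ["host%02d" % n for n in range(1, 41)]
batches = list(one_some_many(hosts))
for number, batch in enumerate(batches, start=1):
    print(number, len(batch), batch[0], "...", batch[-1])
```

Ordering the input list so that the least critical machines come first keeps the early, riskier batches away from the systems that matter most.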

At a time like this, I also recommend taking a long hard look at your application deployment process. If it isn't sufficiently automated that you can kick it off with a single command and be reasonably sure that the app will be deployed correctly, then perhaps the developers need to get to work on that. Having such a deployment process would make it much easier to do a fresh installation of the newer version of EL and then deploy onto it.


Will switching distributions help?

Debian-based distributions do have a supported in-place upgrade method, and it mostly works, but it is not immune to problems. Lots of things broke for people upgrading from Ubuntu 10.04 LTS to 12.04 LTS via the supported method, for instance. It's not clear that Debian or Canonical are putting a sufficient amount of development time into "supporting" this feature, i.e., making sure it works. And you still actually have to buy vendor support for this distribution if you want someone to hold your hand. So I doubt you will gain much from switching to such a distribution.

You may gain by switching to a rolling-release distribution such as Gentoo or Arch. However, this also doesn't make you immune to problems; it just means you have to deal with the upgrade problems continuously over the life of the server (e.g. whenever you or the developers decide to update something on the system), rather than all at once at a well-planned distribution upgrade time. You also have no vendor to provide support.


What does the future hold?

The Fedora Project is working on a tool to improve in-place upgrades. Its earlier tool, `preupgrade`, was abandoned and replaced with a new tool called `fedup` beginning with Fedora 18. This was added to RHEL 7, and in-place upgrades now have full support, at least from RHEL 6 to RHEL 7. From my own experience I can say that while `fedup` still has some kinks, it is shaping up to be a very useful tool.

CentOS is also experimenting with a rolling-release type of repository, but it only applies between minor versions (e.g. 6.3 to 6.4).

Michael Hampton
  • 1
    The new Fedora upgrade tool is called [fedup](https://fedoraproject.org/wiki/QA%3AFedora_18_Upgrade_Testing). Three to four years sounds aggressive to me for major upgrades too; most installs I see last much closer to the 10+ year lifecycle of RHEL, so I'd encourage more regular minor upgrades. – Dominic Cleal Nov 15 '12 at 18:38
  • 3
    For people who need new features on an ongoing basis, 3-4 years is almost too long. – Michael Hampton Nov 15 '12 at 18:39
  • 3
    Simple things like PHP, Apache, kernel revisions and GLIBC... People tend to want those changes more frequently. – ewwhite Nov 15 '12 at 19:13
  • @ewwhite I don't see it as often with glibc; yours is definitely a special case in that regard. But with public facing web sites, there is definitely a high demand for more frequent updates of the various components comprising the web stack (web server, app server, database). That the OS has a 10 year lifecycle does not necessarily mean you should run it for 10 years! – Michael Hampton Nov 20 '12 at 04:55
  • 2
    Debian/Ubuntu's upgrade process is not perfect, but the fact that it's the preferred upgrade mechanism and Red Hat have no officially supported upgrade mechanism speaks volumes to me. – Paul Gear Nov 20 '12 at 20:52
  • OpenSUSE also has in-place upgrades. We'll see about SLES when SLES 12 is released. – Martin Schröder Nov 21 '12 at 18:58
  • 1
    It's not as much whether in-place upgrades exist, as they obviously do, but whether the respective vendors provide support for them. – Michael Hampton Nov 21 '12 at 19:27
  • This is a great question and answer. It would be great to get another update. What has changed since early 2017? – MountainX Dec 07 '19 at 03:06
  • 1
    @MountainX There's a [RHEL 7 to 8 upgrade path](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/upgrading_to_rhel_8/) but I haven't personally upgraded anything yet as EPEL is still way behind on packaging stuff I need. – Michael Hampton Dec 07 '19 at 05:19
  • The more reading I do, the more I think Ubuntu is the better choice for in-place major version upgrade. But I can't help wondering if Arch Linux isn't also a viable choice. – MountainX Dec 07 '19 at 06:23
  • 1
    @MountainX Ha, I've seen Ubuntu in place upgrades blow up spectacularly, too. These days, with everything in a VM, going containerized, etc., I lean toward fresh installs. – Michael Hampton Dec 07 '19 at 19:03
  • If I were ready to go fully containerized for the apps, I would almost certainly run Arch Linux as the server OS (because I'm partial to Arch). That seems like the best of everything to me, at least in theory... of course I have yet to actually try it. Have you heard of anyone doing this? – MountainX Dec 08 '19 at 00:04
  • @MountainX You might be interested in reading [What is the problem with using Fedora for servers?](https://serverfault.com/q/40408/126632) as many of the same issues apply to Arch and other rolling release distros, only more so. – Michael Hampton Dec 08 '19 at 00:48
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/101936/discussion-between-mountainx-and-michael-hampton). – MountainX Dec 08 '19 at 01:05
7

My take on your last paragraph:

I suppose there's the configuration management angle, but most Puppet installations I see do not translate well into environments with highly-customized application servers (Environment B could have a single server whose ifconfig output looks like this). I'd be interested in hearing suggestions on how configuration management can be used to help organizations get across the RHEL major version bump, though.

I think the real value of configuration management systems, especially in the context of Environment B, is that they provide the tools to construct a service independently of the servers which run it. If a CMS wasn't used to create the existing services, then it probably won't help very much in recreating the services.

I know this doesn't solve your immediate problem, but to me it stems from the organisation thinking in terms of servers rather than services. In service-focused thinking, the personality of individual servers need not be maintained as long as the service continues to function. If a CMS is used in a disciplined manner to build the entire service, then moving that service to another system should be relatively straightforward, because all of the machine's personality will be built by the CMS.
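To make the server/service distinction concrete, here is a rough Python sketch of the idea (not Puppet syntax). The role carries the defaults that define the service, and a host contributes only the small set of overrides that are genuinely unique to it. All names and values are hypothetical.

```python
def build_host_config(role_defaults, host_overrides):
    """Merge per-host overrides onto role defaults, section by section."""
    # Copy the role data so the shared defaults are never modified.
    config = {section: dict(values) for section, values in role_defaults.items()}
    for section, values in host_overrides.items():
        config.setdefault(section, {}).update(values)
    return config


# Hypothetical role: everything that defines the service lives here.
trading_role = {
    "sysctl": {"net.core.rmem_max": "16777216", "net.core.wmem_max": "16777216"},
    "packages": {"kernel": "realtime"},
}

# Hypothetical one-off host: only what is unique to this machine.
host_overrides = {
    "sysctl": {"net.core.rmem_max": "33554432"},
    "irq_affinity": {"eth2": "cpu4"},
}

config = build_host_config(trading_role, host_overrides)
print(config)
```

The smaller the override dictionary for each host, the less "personality" there is to lose when the service is rebuilt on a fresh install of the next major release.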

P.S. I'm not exactly sure what's significant about the ifconfig output in this context - it's produced by a configuration file and some scripts (otherwise it wouldn't be there on boot), and those can be managed by a CMS, if needed.

Paul Gear
  • 1
    You're right about services versus servers in the general sense. Environment B has some specialized server hardware (10GbE NICs, offload libraries) that interfaces with upstream providers. It's something that can't be load-balanced or moved easily without downtime. A non-finance example would be something like a server attached as a controller for some involved production machinery. Special case, maybe with dedicated PCIe interface cards. Very much a one-off setup unique to the server. In Puppet, would you just say, "Here's the config for this one host/role" and live with it? – ewwhite Nov 20 '12 at 22:51
  • 2
    Agreed, some things aren't easy to fit into general cases, especially if you have an environment with specific hardware requirements. With Puppet, pushing as much into the role as possible makes good sense. But in the end it has to work, so if something not quite elegant makes it work, then I just live with it being inelegant. Much of the time, we have to live with things being inelegant simply because we don't have the time to make them "right". – Paul Gear Nov 21 '12 at 23:52