"Can we upgrade our existing production EL5 servers to EL6?"
A simple-sounding request from two customers with completely different environments prompted my usual best-practices answer of "yes, but it will require a coordinated rebuild of all of your systems"...
Both clients feel that a complete rebuild of their systems is an unacceptable option for downtime and resource reasons... When asked why it was necessary to fully reinstall the systems, I didn't have a good answer beyond, "that's the way it is..."
I'm not trying to elicit responses about configuration management ("Puppetize everything" doesn't always apply) or how the clients should have planned better. This is a real-world example of environments that have grown and thrived in a production capacity, but don't see a clean path to move to the next version of their OS.
Environment A:
Non-profit organization with 40 Red Hat Enterprise Linux 5.4 and 5.5 web, database and mail servers running a Java web application stack, software load balancers and Postgres databases. All systems are virtualized on two VMware vSphere clusters in different locations, each with HA, DRS, etc.
Environment B:
High-frequency financial trading firm with 200 CentOS 5.x systems in multiple co-location facilities running production trading operations, supporting in-house development and back-office functions. The trading servers run on bare-metal commodity server hardware. They have numerous sysctl.conf, rtctl, interrupt-binding and driver tweaks in place to lower messaging latency. Some have custom and/or realtime kernels. The developer workstations also run similar versions of CentOS.
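One way to keep tunings like these from being lost in a rebuild is to capture them as data first. The sketch below assumes Python 3.9+ on an admin workstation (the stock EL5 interpreter is much older) and parses sysctl.conf-style text into a dictionary. `parse_sysctl_conf` and `load_sysctl_conf` are hypothetical helper names, and the sketch covers only the sysctl portion; rtctl, interrupt-binding and driver settings would need their own capture.

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse sysctl.conf-style text into {key: value}.

    Blank lines and lines starting with '#' or ';' are skipped as comments.
    Lines without '=' are skipped rather than guessed at. A key that appears
    twice keeps its last value, as applying the file in order would.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        settings[key.strip()] = value.strip()
    return settings


def load_sysctl_conf(path: str = "/etc/sysctl.conf") -> dict[str, str]:
    """Read and parse a sysctl.conf file from disk."""
    return parse_sysctl_conf(Path(path).read_text())


if __name__ == "__main__":
    # Illustrative values only, not taken from either environment.
    sample = "# latency tweaks\nnet.ipv4.tcp_low_latency = 1\nvm.swappiness=10\n"
    print(parse_sysctl_conf(sample))
    # prints {'net.ipv4.tcp_low_latency': '1', 'vm.swappiness': '10'}
```

The resulting dictionary can be stored alongside the host's other records and compared against whatever a rebuilt EL6 system ends up with.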
In both cases, the environments are running well as-is. The desire to upgrade comes from a need for a newer application or feature available in EL6.
- For the non-profit organization, it's tied to Apache, the kernel and some things that will make the developers happy.
- In the trading firm, it's about some enhancements in the kernel, networking stack and GLIBC, which will make the developers happy.
Both are things that can't be easily packaged or updated without drastically altering the operating system.
As a systems engineer, I appreciate that Red Hat recommends full rebuilds when moving between major version releases. A clean start forces you to refactor and pay attention to configs along the way.
Being sensitive to the business needs of clients, I wonder why this needs to be such an onerous task. The RPM packaging system is more than capable of handling in-place upgrades, but it's the little details that get you: /boot requiring more space, new default filesystems, RPM possibly breaking mid-upgrade, deprecated and defunct packages...
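Some of these details can at least be checked before anyone commits to an in-place attempt. This is a minimal pre-flight sketch, not a supported upgrade tool: it reports whether /boot has a given amount of free space and which installed package names have no same-named package in the target release. The required /boot size is an assumption you would supply, the package sets are inputs you would gather yourself (for example, installed names from `rpm -qa --queryformat '%{NAME}\n'` and a name listing of the EL6 repositories), and `preflight` and its helpers are hypothetical names.

```python
import os
import shutil


def free_bytes(path: str) -> int:
    """Free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def packages_without_target(installed: set[str], target_available: set[str]) -> list[str]:
    """Installed package names with no same-named package in the target release."""
    return sorted(installed - target_available)


def preflight(boot_path: str, required_boot_bytes: int,
              installed: set[str], target_available: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means these checks found none."""
    problems: list[str] = []
    free = free_bytes(boot_path)
    if free < required_boot_bytes:
        problems.append(f"{boot_path}: {free} bytes free, {required_boot_bytes} required")
    for name in packages_without_target(installed, target_available):
        problems.append(f"no target package named {name!r}")
    return problems


if __name__ == "__main__":
    # Assumed threshold and hypothetical package sets, for illustration only.
    needed = 200 * 1024 * 1024
    installed = {"bash", "httpd", "legacy-tool"}
    target = {"bash", "httpd"}
    # Fall back to / on machines that have no /boot directory at all.
    boot = "/boot" if os.path.isdir("/boot") else "/"
    for problem in preflight(boot, needed, installed, target):
        print(problem)
```

Matching by name alone will flag renamed or split packages as false positives, so the output is a list of things to review rather than a verdict.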
What's the answer here? Other distributions (.deb-based, Arch and Gentoo) seem to have this ability or a better path. Let's say we find the downtime to accomplish this task the right way:
- What should these clients do to avoid the same problem when EL7 is released and stabilizes?
- Or is this a case where people need to resign themselves to full rebuilds every few years?
- This seems to have gotten worse as Enterprise Linux has evolved... Or am I just imagining that?
- Has this dissuaded anyone from using Red Hat and derivative operating systems?
I suppose there's the configuration management angle, but most Puppet installations I see do not translate well into environments with highly-customized application servers (Environment B could have a single server whose ifconfig output looks like this). I'd be interested in hearing suggestions on how configuration management can be used to help organizations get across the RHEL major version bump, though.
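On the configuration-management side, one modest use is verification rather than full templating: record the tunings that matter on an EL5 reference host, then check a freshly built EL6 host against them. The sketch below assumes both sides have already been captured as key/value dictionaries (sysctl settings, for instance) and reports keys that are missing, changed, or present only on the new host. `diff_settings` and `SettingsDiff` are hypothetical names; whitespace is normalized because multi-value sysctl settings may be written with tabs or spaces.

```python
from dataclasses import dataclass, field


def _norm(value: str) -> str:
    """Collapse runs of whitespace so '4096\t87380' equals '4096 87380'."""
    return " ".join(value.split())


@dataclass
class SettingsDiff:
    missing: dict[str, str] = field(default_factory=dict)  # on baseline, absent on candidate
    changed: dict[str, tuple[str, str]] = field(default_factory=dict)  # key -> (baseline, candidate)
    extra: dict[str, str] = field(default_factory=dict)  # only on candidate

    def is_clean(self) -> bool:
        """True when every baseline setting is present with an equivalent value."""
        return not (self.missing or self.changed)


def diff_settings(baseline: dict[str, str], candidate: dict[str, str]) -> SettingsDiff:
    """Compare captured settings from a reference host with those of a rebuilt host."""
    result = SettingsDiff()
    for key, value in baseline.items():
        if key not in candidate:
            result.missing[key] = value
        elif _norm(candidate[key]) != _norm(value):
            result.changed[key] = (value, candidate[key])
    for key, value in candidate.items():
        if key not in baseline:
            result.extra[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical captures from an EL5 reference host and an EL6 rebuild.
    el5 = {"net.core.rmem_max": "16777216", "vm.swappiness": "10"}
    el6 = {"net.core.rmem_max": "16777216", "vm.swappiness": "60"}
    d = diff_settings(el5, el6)
    print(d.changed)  # {'vm.swappiness': ('10', '60')}
    print(d.is_clean())  # False
```

A configuration-management run or a simple CI job could fail whenever `is_clean()` returns False. Keys that were renamed or removed between kernel versions will surface as `missing`, which is arguably useful, since each one needs a human decision during the version bump.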