How to maintain 40 copies of the same computer installation?

Well, PXE doesn't have to be a case where you download the image to each server as they boot. In fact, the more traditional use for it (at least when dealing with UNIX systems) was to provide diskless boot for systems which had their root filesystem on NFS (or these days possibly some other network filesystem). I'm not quite sure how well that might work for you (it trades the time issues of PXE for a single point of failure in the NFS server), but it might be worth looking at. You can also do similar things with iSCSI or NBD, though those are a bit more complicated to set up.
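To make the diskless NFS-root idea a bit more concrete, here is a minimal sketch that renders a PXELINUX boot entry pointing the kernel at an NFS root. The server address, export path, and kernel/initrd file names are placeholders I made up, not values from your setup. The `root=/dev/nfs`, `nfsroot=` and `ip=dhcp` parameters are the generic Linux kernel ones, so your distribution's initramfs may expect something slightly different.

```python
def pxelinux_config_name(mac):
    """File name PXELINUX looks up for one host: '01-' plus the MAC, lowercase, dash-separated."""
    return "01-" + mac.lower().replace(":", "-")


def render_nfsroot_entry(nfs_server, export_path, kernel="vmlinuz", initrd="initrd.img"):
    """Render a PXELINUX config that boots a kernel with its root filesystem on NFS."""
    # Generic kernel parameters for an NFS root; 'ro' mounts it read-only,
    # which is usually what you want when many clients share one image.
    append = (
        f"initrd={initrd} root=/dev/nfs "
        f"nfsroot={nfs_server}:{export_path} ip=dhcp ro"
    )
    lines = [
        "DEFAULT nfsroot",
        "LABEL nfsroot",
        f"  KERNEL {kernel}",
        f"  APPEND {append}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: 192.0.2.10 is a documentation-only address.
    print(pxelinux_config_name("AA:BB:CC:00:11:22"))
    print(render_nfsroot_entry("192.0.2.10", "/srv/nfsroot"))
```

The rendered text would go into the `pxelinux.cfg/` directory on the TFTP server, either as `default` or under the per-host name the first function returns.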

You might also look into chain-loading, similar to what SystemRescueCD does. When netbooting, it only needs to load syslinux, the kernel, and the initial RAM disk over TFTP, and can then load the actual system image over another protocol such as HTTP. TFTP is a horribly inefficient protocol (each block has to be acknowledged separately before the next one can be sent, and the default block size is very small), so doing this can significantly speed up the process. Where I work, we have the network set up to netboot SystemRescueCD and load the system image over HTTP instead of TFTP, which cut the boot time from almost 15 minutes down to about 3 on the systems I tested when I set it up.
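As a rough back-of-the-envelope illustration of why lock-step acknowledgement hurts, the sketch below estimates a lower bound on TFTP transfer time as "number of blocks times round-trip time". The 400 MiB image size and 1 ms round-trip time are assumptions picked for illustration, not measurements from any real network, and the model ignores the time spent actually putting bytes on the wire.

```python
def tftp_block_count(size_bytes, block_size):
    """Number of DATA packets TFTP sends for a file of the given size.

    The transfer ends with a packet shorter than block_size, so a file that
    is an exact multiple of block_size needs one extra, empty packet.
    """
    return size_bytes // block_size + 1


def lockstep_transfer_seconds(size_bytes, block_size, rtt_seconds):
    """Lower bound on transfer time when every block waits for its own ACK."""
    return tftp_block_count(size_bytes, block_size) * rtt_seconds


if __name__ == "__main__":
    image_size = 400 * 1024 * 1024  # assumed 400 MiB system image
    rtt = 0.001                     # assumed 1 ms round trip on a LAN

    # 512 bytes is the TFTP default; 1468 is the largest block that fits a
    # 1500-byte Ethernet MTU (1500 - 20 IP - 8 UDP - 4 TFTP header bytes).
    for block_size in (512, 1468):
        seconds = lockstep_transfer_seconds(image_size, block_size, rtt)
        print(f"block size {block_size:5d}: at least {seconds / 60:.1f} minutes")
```

With the default block size, the round trips alone add up to many minutes under these assumptions, which is why moving the bulk of the download to a streaming protocol like HTTP pays off.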

Given that you're running something based on Ubuntu, you might look at using a combination of MAAS and Juju, as that's the standard stack for doing this type of thing with Ubuntu.

Beyond all that though, if you can safely assume that mass outages like what you saw are rare (and therefore you aren't likely to need to reinstall all 40 systems at once again), you might just look at an automated management tool. It wouldn't help with installing systems, but it would greatly simplify deploying changes to configuration or packages on the systems. I'm particularly fond of Ansible for this type of thing, largely because of how dead simple it is to set up (you literally just need passwordless SSH login and a single specific Python package installed on the systems you intend to manage) and the fact that it uses a stateful (mostly) declarative language to handle tasks, which is really easy to learn. Puppet, Chef, and Salt are the other three popular options for this type of thing, but I've never had any personal experience with them beyond a cursory evaluation, so I can't really give any advice on which one might be best for your usage.
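To give a feel for how little setup Ansible needs, here is a small sketch that generates an INI-style inventory file for 40 machines. The group name and the `nodeNN.example.com` host names are invented for the example, so substitute whatever your machines are actually called.

```python
def render_inventory(group, hosts):
    """Render an Ansible INI inventory: a [group] header, then one host per line."""
    lines = [f"[{group}]"]
    lines.extend(hosts)
    return "\n".join(lines) + "\n"


# Hypothetical naming scheme: node01.example.com through node40.example.com.
hosts = [f"node{i:02d}.example.com" for i in range(1, 41)]

if __name__ == "__main__":
    print(render_inventory("workstations", hosts), end="")
```

If you save the output as `inventory.ini`, something like `ansible -i inventory.ini workstations -m ping` should then confirm that all 40 hosts are reachable over SSH before you start writing playbooks.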

Austin Hemmelgarn

Posted 2018-03-02T17:41:43.723

Reputation: 4 345

Thanks a lot, that was a very detailed answer! I fell for the HTTP thingie and have to ask: so you basically boot into SystemRescueCD with TFTP, and from there on, how would/could I continue with HTTP? My TFTP server is Windows, the clients are Ubuntu (I could easily go with diskless options as well). – chr_lt_ney – 2018-03-04T09:03:15.840

@chr_lt_ney The best advice I can give is to check the official documentation for SystemRescueCD on this, located here: http://www.system-rescue-cd.org/manual/PXE_network_booting/. I've never tried setting this up with a Windows TFTP server, so beyond that, there's probably not much advice I can give. – Austin Hemmelgarn – 2018-03-05T15:33:40.760
