I have a remote container filled with many PXE-booted computers connected to an on-site TFTP/DHCP server.

If there is a power outage, the server machine takes ages to boot, so the PXE-booted machines time out, drop back into the BIOS, and never boot. This is bad because I don't have physical access to the machines.

I have a few ideas for solutions to this:

  • Find a way to make booting take longer on the client machines (haven't figured out how to do this)
  • Increase timeout on network boot on client machines (haven't figured out how to do this)
  • Get the MAC addresses from the DHCP lease file and send them Wake-on-LAN magic packets once the server machine is ready (this seems like the least brittle option, but unnecessarily complex)
  • Replace server machine with one that boots faster (...)
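For the lease-file idea, a minimal Go sketch of pulling the MAC addresses out of the lease data. It assumes the server writes ISC dhcpd-style leases containing `hardware ethernet <mac>;` lines, which the question does not confirm; `parseLeaseMACs` is a hypothetical helper name and the lease file path is whatever you pass on the command line.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// parseLeaseMACs returns the unique MAC addresses found in lease text,
// in first-seen order. Assumption: ISC dhcpd-style lines of the form
// "hardware ethernet 00:11:22:33:44:55;". Other DHCP servers use
// different lease formats and would need a different parser.
func parseLeaseMACs(text string) []string {
	seen := make(map[string]bool)
	var macs []string
	for _, line := range strings.Split(text, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 3 || fields[0] != "hardware" || fields[1] != "ethernet" {
			continue
		}
		hw, err := net.ParseMAC(strings.TrimSuffix(fields[2], ";"))
		if err != nil {
			continue // skip malformed entries instead of aborting
		}
		mac := hw.String() // normalized to lowercase, colon-separated
		if !seen[mac] {
			seen[mac] = true
			macs = append(macs, mac)
		}
	}
	return macs
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: leasemacs <path-to-lease-file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "read lease file:", err)
		os.Exit(1)
	}
	for _, mac := range parseLeaseMACs(string(data)) {
		fmt.Println(mac)
	}
}
```

A lease file usually lists the same client many times, so the helper deduplicates before anything is sent.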

I know there has to be a simple solution that I haven't considered.

What should I do?

mraaroncruz

2 Answers

Alternatively,

  1. Get the server machine a UPS.
  2. Use a PDU to boot the PXE clients with a delay.
  3. Set PXE clients to use PXE exclusively and retry indefinitely (if possible).
  4. Optimize the server so it boots faster (SSD, more RAM, ...).

Edit:

  1. If 3. doesn't work, put a USB stick into each computer with just a PXE client or a reboot setup on it.
  2. Leave the computers off while the server boots and wake them by WoL when it's up. [...] Saw your script below doing the exact same. ;-)
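For the WoL route, a minimal Go sketch of building and sending a magic packet: 6 bytes of 0xFF followed by the target MAC repeated 16 times. The helper names `magicPacket` and `sendWake`, the broadcast address `255.255.255.255`, and UDP port 9 are assumptions for illustration; the clients also need WoL enabled in firmware, and your network may need a subnet-directed broadcast address instead.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// magicPacket builds a Wake-on-LAN payload for a 6-byte MAC address:
// 6 x 0xFF, then the MAC repeated 16 times (102 bytes in total).
func magicPacket(mac string) ([]byte, error) {
	hw, err := net.ParseMAC(mac)
	if err != nil {
		return nil, err
	}
	if len(hw) != 6 {
		return nil, fmt.Errorf("expected a 6-byte MAC, got %d bytes", len(hw))
	}
	pkt := make([]byte, 0, 102)
	for i := 0; i < 6; i++ {
		pkt = append(pkt, 0xFF)
	}
	for i := 0; i < 16; i++ {
		pkt = append(pkt, hw...)
	}
	return pkt, nil
}

// sendWake sends the magic packet for mac as a UDP datagram to addr,
// e.g. "255.255.255.255:9" (address and port are assumptions).
func sendWake(mac, addr string) error {
	pkt, err := magicPacket(mac)
	if err != nil {
		return err
	}
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write(pkt)
	return err
}

func main() {
	// Each command-line argument is treated as one MAC address to wake.
	for _, mac := range os.Args[1:] {
		if err := sendWake(mac, "255.255.255.255:9"); err != nil {
			fmt.Fprintf(os.Stderr, "wake %s: %v\n", mac, err)
		}
	}
}
```

UDP gives no delivery guarantee, so sending each packet a few times is a common precaution.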
Zac67
  • 1. A UPS could run out of power (reasonably priced ones only last an hour). 2. I like this, but I still don't really trust it. 3. I looked at every setting and couldn't find this; I would love this. 4. Unpredictable. I wouldn't trust this. But more than deserving of an upvote! – mraaroncruz Feb 01 '18 at 15:42
  • Any UPS eventually runs out of battery. I usually size them for 30 minutes; longer outages are rare, and if they do happen they're much longer (hereabouts). I guess that's a mobile setup? A power outage leads to various recovery procedures (resync RAID, check filesystem), so a UPS is definitely a plus. Option 4 has its perks as well. If budget and weight allow, I'd go for these options. Option 3 would be free, but if it doesn't work, it doesn't work. – Zac67 Feb 01 '18 at 20:17

I went with a quick-and-dirty Go script, run from the crontab, and set the PXE-booted machines to not power on automatically when power is restored.

Here is the script, in case you have a similar problem: https://gist.github.com/mraaroncruz/f103b8af4d81f59a54a5f2af6dc238b6
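One piece a script like this needs is a way to hold off until the server side is actually up. A minimal sketch of such a wait loop, not taken from the gist: `waitUntil` is a hypothetical helper, and the readiness check used here (the file given on the command line exists) is only a stand-in for whatever signals "ready" on your server.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitUntil calls check up to attempts times, sleeping interval between
// failed tries. It returns true as soon as check succeeds and false if
// every attempt fails.
func waitUntil(check func() bool, attempts int, interval time.Duration) bool {
	for i := 0; i < attempts; i++ {
		if check() {
			return true
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: waitready <path-that-exists-when-ready>")
		return
	}
	// Assumption: the path's existence is an acceptable stand-in for
	// "the DHCP/TFTP services are up". Replace with a real check.
	ready := func() bool {
		_, err := os.Stat(os.Args[1])
		return err == nil
	}
	if !waitUntil(ready, 60, 5*time.Second) {
		fmt.Fprintln(os.Stderr, "server not ready, giving up")
		os.Exit(1)
	}
	fmt.Println("server looks ready, safe to send the wake-up packets")
}
```

With a bounded number of attempts, a cron-started run cannot hang forever if the server never comes up.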

mraaroncruz