Every now and then, one of our remote Linux servers crashes: it becomes unavailable on the network (sometimes still responding to ping, but not to ssh/http) and doesn't respond to mouse or keyboard input.
The servers are high-quality consumer-grade hardware running Ubuntu 20.04.3 LTS.
Since these crashes happen infrequently, I'm collecting the common reasons a server might crash like this. That way I can set up monitoring (munin) so that I have all the information I need when it happens, and implement countermeasures (e.g. periodic restarts?).
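To illustrate the kind of information that could be tracked alongside munin, here is a minimal sketch that appends load and memory figures to a local file and syncs each sample to disk, so the last samples before a hang are more likely to survive. It assumes Python 3 on the servers and the standard Linux `/proc/loadavg` and `/proc/meminfo` files. The function names, the log path, and the 60-second interval are hypothetical choices for illustration, not part of munin.

```python
import json
import os
import time


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return {"load1": float(one), "load5": float(five), "load15": float(fifteen)}


def parse_meminfo(text, keys=("MemAvailable", "SwapFree")):
    """Return the selected /proc/meminfo fields (values are in kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            values[name] = int(rest.split()[0])
    return values


def snapshot():
    """Collect one timestamped sample from /proc (Linux only)."""
    sample = {"time": time.strftime("%Y-%m-%dT%H:%M:%S%z")}
    with open("/proc/loadavg") as f:
        sample.update(parse_loadavg(f.read()))
    with open("/proc/meminfo") as f:
        sample.update(parse_meminfo(f.read()))
    return sample


def append_sample(path, sample):
    """Append one JSON line and ask the OS to write it to disk right away."""
    with open(path, "a") as f:
        f.write(json.dumps(sample) + "\n")
        f.flush()
        os.fsync(f.fileno())


def log_forever(path="/var/log/crash-snapshots.jsonl", interval=60):
    """Hypothetical driver loop: path and interval are assumptions."""
    while True:
        append_sample(path, snapshot())
        time.sleep(interval)
```

Running `log_forever()` from a service that starts at boot would leave a trail of samples to inspect after a crash. Which fields are worth recording depends on the failure causes collected in the answers.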
Question:
What are the reasons for a Linux computer to become unresponsive, what info can I track to diagnose these issues, and what can I do to fix them?
I believe this question and its answers will be most useful if there's one answer per cause of failure, and I'll be posting answers myself as I find such causes.