I have a server that runs Celery tasks. It runs a couple of workers with celery multi start 2, configured using systemd. Sometimes it gets overworked and hits 100% CPU. When this happens, everything locks up entirely: I can't ssh into the machine, and the tasks themselves stop running (from another machine, for example, I can see that records are no longer being created in the database).
The workers are run with Nice=2 in my systemd config.
Why is the CPU issue causing the machine to lock up? (I'm pretty sure it's not a memory issue, because when the machine runs low on memory it just kills the workers.) Is there any way to stop this from happening? I'd rather not kill tasks, but instead pause a worker until the machine has adequate resources again.
The instance type is t2.medium.
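
To illustrate the "pause until resources are available" behaviour I have in mind, here is a minimal Python sketch. It is not something I have running, and it is not a Celery feature: the names should_pause and wait_for_capacity, the 0.9 threshold, and the 5-second poll interval are all hypothetical, and using the 1-minute load average per core as the signal is just an assumption.

```python
import os
import time


def should_pause(load_1min, cpu_count, threshold=0.9):
    """Return True when the 1-minute load average per core exceeds threshold."""
    return load_1min / cpu_count > threshold


def wait_for_capacity(poll_seconds=5.0, threshold=0.9,
                      get_load=os.getloadavg, cpu_count=None,
                      sleep=time.sleep):
    """Block until the machine looks idle enough to take on more work.

    get_load, cpu_count and sleep are injectable so the loop can be
    exercised without touching the real system. Returns the number of
    times it had to sleep before capacity was available.
    """
    cores = cpu_count or os.cpu_count() or 1
    waited = 0
    # get_load() returns a (1min, 5min, 15min) tuple; only the first is used.
    while should_pause(get_load()[0], cores, threshold):
        sleep(poll_seconds)
        waited += 1
    return waited
```

The idea would be for a task to call wait_for_capacity() before starting heavy work, so that it backs off instead of being killed.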