
I have a server that runs celery tasks. It runs a couple of worker processes (started with `celery multi start 2`), configured via systemd. Sometimes it gets overworked and hits 100% CPU. When this happens, everything locks up entirely: I can't ssh into the machine, and the tasks themselves stop working (I can see from another machine that records are no longer being created in the database, for example).

The workers run with `Nice=2` in my systemd unit.
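For reference, the relevant part of the unit looks roughly like this (the unit name, user, app name, and paths are placeholders, not my actual config):

```ini
# /etc/systemd/system/celery.service (illustrative sketch)
[Service]
Type=forking
User=celery
Nice=2
ExecStart=/usr/local/bin/celery multi start 2 -A myapp \
    --pidfile=/run/celery/%n.pid --logfile=/var/log/celery/%n.log
```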

Why is the CPU issue causing the machine to lock up? (I'm pretty sure it's not a memory issue: when the machine runs low on memory, it just kills the workers.) Is there any way to stop this from happening? I'd rather not kill tasks, but instead pause a worker until the machine has adequate resources again.

The instance type is t2.medium.

  • I'm seeing this exact thing happen, but with a sleepy mysql server. It's suddenly begun happening daily. Nothing interesting in the logs. An AWS reboot won't bring it back. But a Stop and Start will. (?) – Dogweather May 17 '19 at 21:54
  • We had the exact same issue. In our setup, every EC2 instance ran an Apache server in the foreground and Celery in the background. Whenever Celery was using up all the cores, the Apache requests were failing. As a workaround, we limited Celery's `concurrency` to 1 less than the available cores. This ensured the average CPU usage was never 100% and Apache could continue to serve the API requests. Not an efficient solution, though, since that single core is not fully utilised. – Shiva Aug 01 '19 at 09:19
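The workaround from the last comment, leaving one core free for sshd and other services, can be sketched like this (the app name in the command is a placeholder; the helper function is illustrative, not part of Celery's API):

```python
import os

def safe_concurrency() -> int:
    """Pool size that leaves one core free for the rest of the system.

    On a 2-vCPU t2.medium this returns 1; never goes below 1.
    """
    return max(1, (os.cpu_count() or 1) - 1)

# Pass the result to the workers via the -c/--concurrency flag, e.g.:
#   celery multi start 2 -A myapp -c 1
print(safe_concurrency())
```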

0 Answers