
I have an EC2 micro instance running a CakePHP API. This instance is used for quality validation before we deploy to production.

On this instance, we have 5 cron jobs that each run once a minute. 98% of the time, each job just runs a MySQL query and exits because there's nothing to do. So, basically, 98% of the time the only work done each minute is 5 MySQL queries.

These cron jobs are configured under CloudWatch Events > Rules, which run the 5 jobs as SSM documents. Here's an example of one of our documents:

    {
      "schemaVersion": "1.2",
      "description": "CronjobNumberOne",
      "parameters": {},
      "runtimeConfig": {
        "aws:runShellScript": {
          "properties": [
            {
              "id": "0.aws:runShellScript",
              "runCommand": [
                " . /opt/elasticbeanstalk/support/envvars && /var/app/current/bin/cake cronjob_number_one > /var/log/cronjobs_php 2>&1"
              ]
            }
          ]
        }
      }
    }

Every time the cron job rule is enabled, the CPU usage of our EC2 instance increases and keeps rising until the instance dies. Here's a graph of what happens:

[Graph: CPU increase over the past week, until today]

I've installed sar to check CPU usage second by second over two minutes, and here's what is happening:

[Screenshot: output of sar -u ALL 1 120]

As soon as I disable the cron job event rule, CPU usage drops back to normal.

I've checked the log folder and there are no errors or anything unusual.

Has this happened to anyone else? Any idea how I can fix this? Thanks for your help!

PS: We have another product where, instead of command-line cron jobs, our 'cron jobs' make an HTTP request to an endpoint. We have over 30 of these in production, and CPU usage is nowhere near this level.

Cafn
  • This is probably obvious, but one of the things cron runs is consuming CPU and not finishing. What happens when you run the scripts manually? What happens when you run them all manually at the same time? – Tim Oct 30 '18 at 18:32

1 Answer


My guess: because they all start at the same time, they may be causing a race condition or a lock on the database that prevents some or all of them from completing. I'd say it's probably just two of them deadlocked and unable to finish.

And because a new set of jobs is started every minute, there are more and more processes contending for the resource (presumably MySQL), none of them able to finish because of the locks. Resource usage on the instance keeps climbing until the instance eventually dies.

That's my guess.

What to do: when this happens, SSH into the instance and run ps -faxu and/or top to find out which cron jobs are still running. You'll be able to tell from the process name.
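To make that check a little more targeted, here is a minimal bash sketch, assuming the procps pgrep and ps found on Amazon Linux, that lists matching processes running longer than a threshold, oldest first. The function name long_running and the bin/cake pattern in the usage line are hypothetical; adjust the pattern to whatever the job processes are actually called on the instance. If the pile-up theory is right, you would expect several copies of the same job with steadily increasing elapsed times.

```shell
#!/usr/bin/env bash
# long_running PATTERN [MIN_SECONDS]
# Hypothetical helper: print PID, elapsed seconds, %CPU and full command line
# of every process whose command line matches PATTERN (a regex, as used by
# pgrep -f) and that has been running for at least MIN_SECONDS (default 60).
# The oldest processes are printed first.
long_running() {
  local pattern="$1" min_seconds="${2:-60}"
  local pids
  # -d, joins the PIDs with commas, the list format that ps -p accepts.
  # pgrep never matches itself; if nothing matches, print nothing.
  pids=$(pgrep -d, -f -- "$pattern") || return 0
  # -ww: never truncate the command line; a trailing '=' suppresses headers.
  ps -ww -o pid=,etimes=,pcpu=,args= -p "$pids" |
    awk -v min="$min_seconds" '$2 >= min' |
    sort -k2,2nr
}
```

For example, long_running 'bin/cake' 60 would show every process whose command line contains bin/cake and that has been alive for more than a minute, which for a job that normally finishes in a second or two is a strong hint that it is stuck.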

The next step is to make sure the offending cron job can only run one instance at a time.

You've got a couple of options:

  • The simple, but probably not very reliable, option is to spread the cron jobs out across the minute, e.g. by prepending sleep 10, sleep 20, ... to the commands:

    sleep 10; . /opt/elasticbeanstalk/support/envvars && /var/app/current/bin/cake cronjob_number_one > /var/log/cronjobs_php 2>&1
    
  • A better, if slightly more complex, option is to use a semaphore-style lock, for example with flock(1). Essentially, it works like this:

    1. you start the cron job
    2. it calls flock to try to acquire a lock on a lock file
    3. if it succeeds -> run the actual job
    4. if not (because the previous run still holds the lock, since it hasn't finished yet) -> exit
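The four steps above can be sketched in bash with util-linux flock(1). This is a minimal illustration, not a drop-in solution: the wrapper name run_exclusive and the exit code 75 for "lock already held" are arbitrary choices made for this sketch.

```shell
#!/usr/bin/env bash
# run_exclusive LOCKFILE COMMAND [ARGS...]
# Hypothetical wrapper around flock(1): run COMMAND only if no earlier run that
# uses the same LOCKFILE is still going. Returns COMMAND's exit status, or 75
# (an arbitrary "busy" code chosen for this sketch) when the lock is taken.
run_exclusive() {
  local lockfile="$1"
  shift
  (
    # Open (creating it if needed) the lock file on file descriptor 9,
    # for this subshell only.
    exec 9>"$lockfile"
    # -n: give up immediately instead of waiting for the lock.
    if ! flock -n 9; then
      echo "$(date '+%F %T') $lockfile is locked by a previous run; skipping" >&2
      exit 75
    fi
    # COMMAND inherits fd 9, so the lock is held while it runs. It is released
    # when the last process holding fd 9 exits (normally the subshell and
    # COMMAND), even if COMMAND crashes.
    "$@"
  )
}
```

In the SSM document itself no helper function is needed; flock can run the command directly. Using one lock file per job means a slow job only blocks its own next run (the lock path below is hypothetical):

    flock -n /tmp/cronjob_number_one.lock sh -c '. /opt/elasticbeanstalk/support/envvars && /var/app/current/bin/cake cronjob_number_one > /var/log/cronjobs_php 2>&1'

Note that a lock only stops copies of the same job from piling up; if a single run hangs, it keeps holding the lock and the cause of the hang still has to be found.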

Hope that helps :)

MLu
  • Hi @MLu. Thanks for your answer. I'm testing the flock approach (it seems pretty easy to implement in PHP) and I'll see where it leads. I'll reply again when I have results :) – Cafn Nov 05 '18 at 12:04
  • Hi @MLu. I've tested this, and sadly it's not working. Here's a screenshot from the server: https://i.gyazo.com/cdbc276a40bd211f8fdaf05ebe72b115.png . It starts on the 25th because that's when I restarted the instance. I implemented the flock method in PHP code. Any other ideas? – Cafn Nov 26 '18 at 15:10