I have the following setup: one server running application X, with a cron job that runs once an hour. The cron job connects to the database, runs some heavy calculations, exports the data to a special data file, and restarts the application. The export takes anywhere from 10 to 40 minutes.

I want to:

  • Move that server to AWS and use an auto-scaling group.

  • Have the cron job run on only one server, which does the calculation, exports the data, and somehow syncs it to all other live servers.

  • All servers should automatically detect the new data and restart themselves safely (not while syncing for example).

  • New servers started by the auto-scaling group should automatically fetch the data files on startup, before starting the actual application.

I don't have a "simple" idea for how to do this, and I don't know of an AWS-specific solution.

This is my idea:

  • Run one server outside the auto-scaling group. Execute the cron job only on that server. All data files will be uploaded to S3.

  • All auto-scaling servers will have a cron job that runs every minute and checks for a unique marker file named "please_download_me_TIMESTAMP".

  • Once the files are downloaded, the script will restart the service.

  • If a new server is started, on startup it will automatically fetch all files from S3.
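
The polling step described in this list could be sketched roughly as below. This is only a minimal illustration, not a tested deployment script: `list_keys`, `download_all` and `restart` are hypothetical callables you would supply yourself (for example thin wrappers around boto3's `list_objects_v2` and `download_file`, plus whatever command restarts the service), and `data_dir` and `state_path` are made-up local paths. The one name taken from the question is the `please_download_me_TIMESTAMP` marker.

```python
import os
import re
import shutil
import tempfile

# Marker name from the question; TIMESTAMP is assumed to be an integer.
MARKER_RE = re.compile(r"^please_download_me_(\d+)$")


def newest_marker(keys):
    """Return (timestamp, key) for the newest marker in keys, or None."""
    found = []
    for key in keys:
        match = MARKER_RE.match(key)
        if match:
            found.append((int(match.group(1)), key))
    return max(found) if found else None


def read_last_seen(state_path):
    """Timestamp of the last export this server applied (0 if none yet)."""
    try:
        with open(state_path) as fh:
            return int(fh.read().strip() or 0)
    except FileNotFoundError:
        return 0


def sync_once(list_keys, download_all, restart, data_dir, state_path):
    """One polling pass. Returns True if a new export was fetched and applied."""
    marker = newest_marker(list_keys())
    if marker is None or marker[0] <= read_last_seen(state_path):
        return False  # nothing newer than what this server already has

    # Download into a staging directory next to data_dir, so the live
    # files are never left half-written while the sync is in progress.
    parent = os.path.dirname(os.path.abspath(data_dir))
    staging = tempfile.mkdtemp(prefix="export_", dir=parent)
    download_all(staging)

    # Swap the directories only after the download has finished.
    old = data_dir + ".old"
    if os.path.isdir(old):
        shutil.rmtree(old)
    if os.path.isdir(data_dir):
        os.rename(data_dir, old)
    os.rename(staging, data_dir)

    restart()  # restart only once the complete data set is in place
    with open(state_path, "w") as fh:
        fh.write(str(marker[0]))
    return True
```

The same function could be called once from a boot script before the application starts: with no state file present, any existing marker counts as new, so a freshly launched server would fetch everything. Running it from cron every minute would still need a lock (for example `flock`) so that two passes cannot overlap during a slow download.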

Do you think this would work?


2 Answers


This is a common issue in AWS EC2 and has been solved. See https://gist.github.com/kixorz/5209217 for an example with implementation.

Joe Friday

I realize this is an old question from 2015, but since Joe bumped it I might as well answer it.

If the job runs once an hour and takes 10 to 40 minutes, then with instances billed by the hour you're paying for a whole hour anyway. There's no point starting and shutting down servers; just leave a server running.

If it ran less often, you could have a timed event that puts a message onto an SQS queue. This could be done by a t2.nano, or maybe there's a cheaper way to do it with Amazon services, such as Lambda. Then auto scale based on queue size: when there's data to be processed, a server is created, processes the data, moves it wherever it needs to go, then shuts itself down.
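
As a rough sketch of the worker side of that idea: the loop below takes a message, processes it, deletes it, and returns once the queue has stayed empty, at which point the instance could shut itself down. The `sqs` argument is assumed to be a boto3 SQS client (only `receive_message` and `delete_message` are used), while `queue_url`, `handle` and `max_empty_polls` are hypothetical names chosen for illustration.

```python
def drain_queue(sqs, queue_url, handle, max_empty_polls=3):
    """Process messages until the queue stays empty; return how many were handled."""
    handled = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        # Long polling: wait up to 20 seconds for a message to arrive.
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,
        )
        messages = resp.get("Messages", [])
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        for msg in messages:
            handle(msg["Body"])  # run the calculation and export
            # Delete only after the work succeeded, so a failed job
            # becomes visible on the queue again and can be retried.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=msg["ReceiptHandle"],
            )
            handled += 1
    return handled
```

After `drain_queue` returns, the worker would stop or terminate itself by whatever means suits the setup; that part is left out here because it depends on how the scaling policy is configured.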

Another way to do it would be time-based scaling, but again only if the job ran less often than hourly.
