
I'm working on an image hosting website tailored to a particular niche. The website is made with Django. I'm currently planning to run it on Linode.

So far so good. The problem is: I will need to perform very CPU-intensive tasks on high resolution images. We're talking about scientific grade computation that can take up to 15 minutes on Linode's 4 Xeon CPUs.

I'm not sure if EC2 works like this, but is the following scenario something that rings a bell?

  1. User uploads an image on the website, which is hosted on Linode
  2. The application (somehow?) requests that EC2 run the CPU-intensive task.
  3. EC2 boots a new instance and runs the software with the data provided
  4. The data is somehow returned to the web application

Obviously I have lots of gaps in the way this thing would work. Can somebody please help me fill them?

EDIT: I forgot to mention that I use Celery for the tasks, with RabbitMQ as the message broker. I wonder if it's possible to create Celery tasks on my web server, but then actually run them on EC2 instances created on demand. Ideally, this would also take care of the communication between the parties involved (since the task arguments would be pickled on the webserver's side).
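To make the idea concrete, here is a rough sketch of what I have in mind. The module name, broker URL, and queue name are all placeholders, and `process_image` is hypothetical -- the point is only that the webserver enqueues onto a dedicated queue that nothing but the EC2 workers consume:

```python
# tasks.py -- sketch only; module name, broker URL, and queue name
# are placeholders, not a working config.
from celery import Celery

# The broker (RabbitMQ) stays on the Linode box; EC2 workers connect to it.
app = Celery("imagehost", broker="amqp://guest@my-linode-host//")

@app.task
def process_image(image_id):
    # Runs on whichever worker consumes the queue -- i.e. on EC2,
    # if only the EC2 workers subscribe to it.
    ...

# Webserver side: enqueue onto a dedicated queue.
#   process_image.apply_async(args=[image_id], queue="heavy")
#
# EC2 worker side (shell): consume only that queue.
#   celery -A tasks worker -Q heavy
```

This is a configuration sketch rather than runnable code; whether the on-demand EC2 instances boot fast enough to be worth it is exactly my question.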

2 Answers


Yes, EC2 seems like a good fit for what you're trying to do. As far as how to do it exactly, I'm not familiar with Celery and RabbitMQ, but I assume it's just a matter of writing some code that processes the jobs in Celery as required -- this might involve retrieving the data from your webserver (out of the database, via a web services API) to do the job, and sending the results back (again, via a web services API you define).
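As a sketch of that round trip on the worker side -- the endpoint paths and field names below are invented for illustration; the real API is whatever you define on your webserver:

```python
# Hypothetical worker-side helpers for the web-services round trip.
# API_BASE, the /jobs/... paths, and the payload fields are all
# assumptions, not an existing API.
import json
import urllib.request

API_BASE = "https://example.com/api"  # placeholder

def result_payload(job_id, data):
    """Build the JSON body POSTed back to the web app."""
    return {"job_id": job_id, "status": "done", "result": data}

def fetch_job(job_id):
    """GET the uploaded image data for one job from the webserver."""
    with urllib.request.urlopen(f"{API_BASE}/jobs/{job_id}") as resp:
        return resp.read()

def post_result(job_id, data):
    """POST the computed result back via the same API."""
    body = json.dumps(result_payload(job_id, data)).encode()
    req = urllib.request.Request(
        f"{API_BASE}/jobs/{job_id}/result",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

With Celery in the picture, the broker may carry the job data for you, in which case the explicit fetch step goes away and only the results callback remains.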

womble
  • Thanks. I was reading up and I have some follow-up questions: 1) does EC2 make sense for me also if I only typically need one instance? 2) How long does an instance take to boot? 3) If I run a 10-minute task every 20 minutes, will I be charged also for the time the instance is not really doing anything? Or do I need to turn it off and then back on on demand? Thanks! – Salvatore Iovene Jul 16 '11 at 12:24
  • Questions go in questions, not in comments. – womble Jul 16 '11 at 12:30
  • Sorry, I'm new to StackExchange. I'll create a new question then, thanks! – Salvatore Iovene Jul 16 '11 at 12:45

If you are doing image processing, you may want to look into whether your back-end processing can be turned into a MapReduce problem and run on Amazon Elastic MapReduce. It is cheaper per hour than a full EC2 instance ($0.015/hour vs. $0.085/hour) as it does not give you a full VM; it runs the Hadoop framework.

There are many tutorials online that explain how to use Hadoop; here is one from Yahoo (the world's largest Hadoop user, according to Wikipedia) that also goes over the basics of how Hadoop works.

Of course, this is all contingent on being able to port your processing code from Celery to Hadoop.
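If you do go that route, Hadoop Streaming lets you keep the processing in a plain script: it reads one record per line on stdin and emits tab-separated key/value pairs on stdout. A minimal sketch, where `analyze()` is a stand-in for your real computation and the line format (one image URL per line) is assumed:

```python
# Hypothetical Hadoop Streaming mapper. One image URL per input line
# is an assumed record format; analyze() is a placeholder for the
# real scientific computation.
import sys

def analyze(url):
    # placeholder result -- the real code would fetch and process the image
    return len(url)

def map_line(line):
    """Turn one input record into a tab-separated key/value pair."""
    url = line.strip()
    return f"{url}\t{analyze(url)}"

if __name__ == "__main__":
    for line in sys.stdin:
        print(map_line(line))
```

Hadoop only pays off here if you have large batches of images to process in parallel, which matches the caveat in the comment below about dataset size.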

Michael Lowman
Scott Chamberlain
    You should look more into Hadoop; this isn't accurate. Elastic MapReduce uses EC2, so the $0.015/hr is in addition to, not in place of, the EC2 charges. Secondly, Hadoop is only efficient or ideal for processing very large datasets that benefit from massive parallelization. If you had a large set of images it could work (see the Washington Post case study), but this still isn't optimal. OTOH, Celery is pretty much specifically tailored for this kind of task. Best to stick with it. – Michael Lowman Jul 20 '11 at 19:26