
I'm trying to figure out how to deploy my applications on AWS. I have very limited DevOps experience and I'm not sure if my design is good.

I have two applications: a web application that handles file uploads, and a processing application that does processing on the files.

I was planning to use AWS Elastic Beanstalk for the web application, but for the processing application I'm not sure which strategy to use. I was thinking about using a queue (SQS) to dispatch the processing jobs and putting the processing application on an EC2 Auto Scaling group.

The files are big (a few GBs) and the processing is very I/O bound, so I will copy the files from S3 to a local SSD on the processing machine before doing the processing.
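For context, the copy step I have in mind would look roughly like this (a boto3 sketch; the bucket name and SSD mount point are placeholders):

```python
import os

# Placeholder names -- the real bucket and mount point would differ
BUCKET = "my-upload-bucket"
LOCAL_DIR = "/mnt/ssd/work"

def local_path_for(key: str, local_dir: str = LOCAL_DIR) -> str:
    """Where on the local SSD a given S3 object should land."""
    return os.path.join(local_dir, os.path.basename(key))

def fetch_to_local_ssd(key: str) -> str:
    """Copy one uploaded file from S3 to the local SSD before processing."""
    import boto3  # lazy import: only needed on the processing instance
    os.makedirs(LOCAL_DIR, exist_ok=True)
    dest = local_path_for(key)
    # download_file streams the object to disk, which suits multi-GB files
    boto3.client("s3").download_file(BUCKET, key, dest)
    return dest
```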

Other considerations:

  • Both applications need to access the same database (do I need some sort of VPC?)
  • I might have different kinds of processing in the future that will require dispatching to other applications on other instance types (maybe some machines with more memory).
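To make that future dispatching concrete, I imagine routing jobs by type to per-type queues, something like this (a sketch; the job types and queue URLs are made up):

```python
import json

# Hypothetical mapping from job type to the SQS queue its workers poll
QUEUES = {
    "default": "https://sqs.us-east-1.amazonaws.com/111111111111/jobs-default",
    "high-mem": "https://sqs.us-east-1.amazonaws.com/111111111111/jobs-high-mem",
}

def queue_for(job_type: str) -> str:
    """Pick the queue for a job type, falling back to the default workers."""
    return QUEUES.get(job_type, QUEUES["default"])

def dispatch(job_type: str, s3_key: str) -> None:
    """Send one job to the queue that the matching instance type polls."""
    import boto3  # lazy import: only needed where jobs are submitted
    boto3.client("sqs").send_message(
        QueueUrl=queue_for(job_type),
        MessageBody=json.dumps({"type": job_type, "key": s3_key}),
    )
```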

My question is: is this a good architecture? Are there any important details I'm missing? Any tips on how to get started on AWS?

cmar
    Your requirements are a bit vague, so the answer might be "definitely maybe." The nice thing about AWS is that you can set this up very quickly and test it -- if it's not right, it's easy to change. – Ron Trunk Apr 02 '19 at 16:52

1 Answer


Here are two methods I can think of that should work for you without a DB table of jobs, but that still allow DB access if you need it.

Highly scalable serverless

Use API Gateway and set up a Lambda function to do the heavy lifting. This means that when you're not processing anything, you're not wasting money running an idle system.

You can read up about it here: https://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-with-lambda-integration.html

This option allows multiple uploads to be processed simultaneously, but could become costly if you have a lot of frequent uploads.

Your Lambda could then process the upload as needed.
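A minimal sketch of such a handler, assuming the Lambda is triggered by an S3 upload event (shown in Python for illustration, though it could equally be Node.js; the processing step itself is a placeholder):

```python
import os

def extract_s3_object(event: dict):
    """Pull the bucket and key of the uploaded file out of an S3 trigger event."""
    record = event["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]

def handler(event, context):
    bucket, key = extract_s3_object(event)
    import boto3  # lazy import: available in the Lambda runtime
    local = os.path.join("/tmp", os.path.basename(key))  # Lambda's scratch space
    boto3.client("s3").download_file(bucket, key, local)
    # ... run the actual processing on the local copy here ...
    return {"status": "processed", "object": f"s3://{bucket}/{key}"}
```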

Cost-effective for constant processing, a few jobs at a time

Set up an EC2 box that reads from an SQS queue and processes jobs one after another, and set up your app so that once a file is uploaded to S3, a message is pushed to the SQS queue.
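The polling loop on the EC2 box can be sketched like this (assuming boto3 and a JSON message body with bucket/key fields; the queue URL is a placeholder):

```python
import json

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111111111111/processing-jobs"  # placeholder

def parse_job(body: str) -> dict:
    """Decode one SQS message body into a job description."""
    job = json.loads(body)
    return {"bucket": job["bucket"], "key": job["key"]}

def run_worker() -> None:
    import boto3  # lazy import: only needed on the EC2 worker
    sqs = boto3.client("sqs")
    while True:
        # Long polling (20 s) avoids hammering the API on an empty queue
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            job = parse_job(msg["Body"])
            # ... copy s3://{bucket}/{key} to the local SSD and process it ...
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```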

If you start noticing a backlog in your queue with this method, you can scale up by adding more EC2 processing instances.

This does mean that if uploads are infrequent, you might be running an EC2 instance doing nothing. Depending on your instance size, you could run your processing application multiple times: e.g. if it was PHP and you went for a 4-core EC2 instance, you could run the PHP script 4 times in parallel on that box.
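The "one worker per core" idea can be sketched with Python's multiprocessing instead of PHP (process_job here is a stand-in for the real per-file work):

```python
import multiprocessing as mp

def process_job(key: str) -> str:
    """Stand-in for the real per-file processing."""
    return f"processed:{key}"

def run_parallel(keys, workers=None):
    """Fan jobs out across processes, one per core by default."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(process_job, keys)

if __name__ == "__main__":
    # On a 4-core box this uses 4 worker processes
    print(run_parallel(["a.bin", "b.bin", "c.bin", "d.bin"]))
```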

Database Question

The first option removes the need for a database of jobs to process, because jobs come in via API Gateway; but your Lambda can still connect to an RDS instance if you need it to (via the Node.js Lambda libraries, for example). The second option also removes the need for the upload system to be connected to a database, because it can use an SQS queue; but if you require it, you can connect to an RDS instance the same as any other application.

If you do wish to use a database for handling all this instead of an SQS queue, or you have another reason for using a database, you can set up an RDS instance in the default VPC and use security groups for access permissions; this (IP-address-based) approach is not scalable.

You can use the second option and just not use an SQS Queue.

If you need your RDS instance to be available across multiple AZs in the same region, you can do this by enabling Multi-AZ, which keeps a standby replica in another Availability Zone; any EC2 instance in the same region can then access it (this gives you high availability).
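Enabling Multi-AZ is just a flag at creation time; a boto3 sketch (the identifier, engine, size, and credentials below are placeholder choices):

```python
def rds_params(identifier: str) -> dict:
    """Creation parameters for the instance; MultiAZ=True is the key setting."""
    return {
        "DBInstanceIdentifier": identifier,
        "DBInstanceClass": "db.t3.medium",  # placeholder size
        "Engine": "postgres",               # placeholder engine
        "AllocatedStorage": 100,            # GiB
        "MultiAZ": True,                    # keep a standby in another AZ
    }

def create_multi_az_rds(identifier: str):
    import boto3  # lazy import: only needed when actually provisioning
    return boto3.client("rds").create_db_instance(
        MasterUsername="admin",             # placeholder credentials
        MasterUserPassword="change-me",
        **rds_params(identifier),
    )
```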

If you want Multi-Region access with no public access to the DB, you will have to create a set of VPCs in every region you want to operate from, make sure they are all peered to each other, and set up the route tables to allow communication via the peering before you create your RDS instance, because it will need to be created in the new VPC.

Notes

If you want Multi-Region, Multi-AZ for ultra-high availability, I highly recommend that before creating your RDS instance you use t2.nano instances running Amazon Linux in EC2 to test pinging across the regions via the peering. It can get quite complex setting up all the peerings and making sure they are working: get the process for 2 regions correct, then add the other regions one after another, making sure all existing regions can ping each new one.

Martin Barker