
I am attempting to architect a service which would run user specific java processes within Docker containers in AWS (ECS most likely). One java process per container per user. The only exception to this might be when spinning up a replacement container for one that is misbehaving. The java process is a packaged piece of software that cannot be modified to fit my needs. I am essentially looking to build a SaaS service around this particular software, and I am aware that there will be licensing details to work out with the developer of the software.

The java process has its own web server built in, which uses non-standard ports (e.g. 30000-30004) for accessing its WebUI. It can support HTTP or HTTPS depending on configuration. I plan on running as many of these containers as possible per EC2 instance to make it more cost effective; this would mean having different ports on the EC2 instances mapping to the internal ports (30000-30004) of the containers.

I would like to wrap the built in UI of the java process with a few other elements to allow for control of the java process (stop, start, etc.) and serve my UI up via another web server or possibly API Gateway via Lambda. This would allow the user to start or stop their instance as necessary and the overall system would use those requests to start or stop the user's specific Docker container.
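To make the control-plane idea concrete, here is a hypothetical sketch of such a start/stop handler, assuming ECS behind API Gateway. The cluster name, task definition family, and region are placeholders, not real resources:

```python
# Hypothetical sketch of the start/stop control plane described above, assuming
# ECS. The cluster name, task definition, and region are placeholders.
import json

CLUSTER = "user-sessions"       # assumed ECS cluster name
TASK_DEF = "packaged-java-app"  # assumed task definition family


def handler(event, context):
    """API Gateway -> Lambda entry point: start or stop one user's container."""
    import boto3  # imported here so the module loads without AWS access

    ecs = boto3.client("ecs", region_name="us-east-1")  # assumed region
    action, user_id = event.get("action"), event.get("user_id")

    if action == "start":
        resp = ecs.run_task(
            cluster=CLUSTER,
            taskDefinition=TASK_DEF,
            count=1,
            startedBy=f"user-{user_id}",  # tag the task so "stop" can find it
        )
        return respond(200, {"task": resp["tasks"][0]["taskArn"]})

    if action == "stop":
        # Find the task(s) previously started for this user and stop them.
        arns = ecs.list_tasks(cluster=CLUSTER, startedBy=f"user-{user_id}")["taskArns"]
        for arn in arns:
            ecs.stop_task(cluster=CLUSTER, task=arn, reason="user requested stop")
        return respond(200, {"stopped": arns})

    return respond(400, {"error": "unknown action"})


def respond(status, body):
    """Shape the result the way API Gateway proxy integration expects."""
    return {"statusCode": status, "body": json.dumps(body)}
```

In practice the handler would also need to check that the caller is authenticated as `user_id` before touching that user's task.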

To give an idea of scale: there could theoretically be tens of thousands of concurrent users. The projected market size for this project is approximately 100,000 users, with a possible ceiling of around 2 million users per month; realistically, it will likely be approximately 10,000 total users, with approximately 7,500 of those active within any given month. This potential scale likely rules out an Application Load Balancer with an individual target and target group per user's container, since ALB limits allow only 1,000 targets and 100 rules per load balancer.

To give an idea of rate of change, an individual container may need to be running for several hours or possibly a day or more while the user is interacting with it. But it also might only be used for a few minutes on occasion. An individual user's container may only be needed once every week or two, or it may be used daily.

Now to the question, what are some of the best solutions to handle routing traffic to the Docker containers mentioned? I am hoping to host all of this from my own domain and use path based routing to get requests to the proper containers.

For example:

  • container1 is for user1 running on host ec2-app1 on ports 30000-30004
  • container2 is for user2 running on host ec2-app1 on ports 31000-31004
  • container3 is for user3 running on host ec2-app2 on ports 30000-30004
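The per-host port layout in that example could be computed with a small helper like this. The block size and stride are assumptions inferred from the example (five internal ports 30000-30004, successive containers offset by 1000):

```python
# Sketch of the host-port allocation shown above: each container slot on an
# EC2 instance gets a contiguous block of five host ports mapped onto the
# java process's fixed internal ports 30000-30004. Stride is an assumption.
BASE_PORT = 30000
PORTS_PER_CONTAINER = 5
SLOT_STRIDE = 1000  # gap between slots, matching 30000-30004, 31000-31004, ...


def host_ports_for_slot(slot):
    """Host ports for the Nth container slot on an instance (slot 0, 1, ...)."""
    start = BASE_PORT + slot * SLOT_STRIDE
    return list(range(start, start + PORTS_PER_CONTAINER))


def port_mapping(slot):
    """docker-run style mapping: host port -> fixed internal port 30000-30004."""
    internal = range(BASE_PORT, BASE_PORT + PORTS_PER_CONTAINER)
    return dict(zip(host_ports_for_slot(slot), internal))
```

So container2 (slot 1 on ec2-app1) would publish host ports 31000-31004, each forwarding to the corresponding internal port.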

Users should not be able to access any container except for the one that is specifically started for their session. There is some built in authentication within the java process that can be leveraged to help with this.

I am essentially building a multi-tenant service out of a single user piece of software. I know that I will have to build quite a bit of custom tooling and I am not averse to doing so. I would prefer solutions that work well with Python when custom code is needed, but other languages are welcome if necessary.

I suspect that I will need to include some sort of agent or startup script within the containers or on the EC2 instances executing them that will register the container with the routing system once the container starts. I would then need a similar shutdown script for removing the registration of the container, as well as a regular system for cleaning up in the event that a container or EC2 instance fails without proper shutdown. I have looked into HAProxy and Nginx already, and I am willing to hear ideas regarding these, but other options are also very welcome.
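For what it's worth, the register/deregister/cleanup bookkeeping described above can be sketched as follows. This is plain in-process Python to show the logic; in practice the store would be DynamoDB, Redis, or similar rather than a dict, and the names here are invented for illustration:

```python
# Minimal sketch of the registration/cleanup bookkeeping the question
# describes: startup scripts register, shutdown scripts deregister, and a
# heartbeat TTL lets a reaper clean up after crashed containers or hosts.
import time

HEARTBEAT_TTL = 60  # seconds without a heartbeat before a container is presumed dead


class ContainerRegistry:
    def __init__(self):
        self._entries = {}  # user_id -> {"host": ..., "ports": ..., "last_seen": ...}

    def register(self, user_id, host, ports, now=None):
        """Called by the startup script/agent when a container comes up."""
        self._entries[user_id] = {
            "host": host,
            "ports": ports,
            "last_seen": now if now is not None else time.time(),
        }

    def heartbeat(self, user_id, now=None):
        """Agents call this periodically so dead entries can age out."""
        if user_id in self._entries:
            self._entries[user_id]["last_seen"] = now if now is not None else time.time()

    def deregister(self, user_id):
        """Called by the shutdown script on clean container exit."""
        self._entries.pop(user_id, None)

    def lookup(self, user_id):
        """The routing layer asks here where a user's container lives."""
        return self._entries.get(user_id)

    def reap_stale(self, now=None):
        """The regular cleanup pass: drop entries whose agent stopped heartbeating."""
        now = now if now is not None else time.time()
        stale = [u for u, e in self._entries.items() if now - e["last_seen"] > HEARTBEAT_TTL]
        for u in stale:
            del self._entries[u]
        return stale
```

The `reap_stale` pass would run on a schedule (e.g. a cron job or scheduled Lambda) to cover the failure-without-shutdown case.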

Thank you in advance for your suggestions!

Stimp

1 Answer


You're definitely right that you'll need to create a whole lot of custom tooling to achieve this, but I think it can be done.

The way I'd go about this is by using an Application Load Balancer (ALB) and Target Groups. An ALB can route based on path, sending traffic to specific target groups (read: servers, or in your case, Docker containers) when path rules match, for instance:

  • Rule 1: Path starts /user1, send traffic to TG1
  • Rule 2: Path starts /user2, send traffic to TG2
  • etc.
  • Rule N: Default rule, send all other traffic to TGN (your main site, 404 handler, etc.)

You will need to create tooling to:

  • Provision the container(s) necessary to serve your users. Docker containers can use dynamic ports, and the same EC2/ECS instance can live in multiple TGs on different ports. Once the container is active, figure out what dynamic port(s) it's listening on.
  • Provision a TG to host these containers, and register the appropriate targets on the appropriate ports.
  • Update your ALB to add a new rule for your path, routing to the TG you just created.
  • Redirect the user to their container's path.
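
A rough boto3 sketch of those steps might look like this. Every ARN, name, and ID here is a placeholder, and error handling plus waiting for target health are omitted:

```python
# Rough sketch of per-user TG + ALB rule provisioning with boto3.
# All names, ARNs, and IDs are placeholders.
def path_pattern(user_id):
    """ALB path-pattern condition for one user's container, e.g. /user1/*."""
    return f"/{user_id}/*"


def tg_name(user_id):
    """Target group names are capped at 32 characters."""
    return f"tg-{user_id}"[:32]


def provision_user_routing(user_id, instance_id, host_port, vpc_id, listener_arn, priority):
    import boto3  # imported here so the pure helpers above work without AWS access

    elbv2 = boto3.client("elbv2", region_name="us-east-1")  # assumed region

    # A dedicated target group for this user's container...
    tg = elbv2.create_target_group(
        Name=tg_name(user_id),
        Protocol="HTTP",
        Port=host_port,
        VpcId=vpc_id,
        TargetType="instance",
    )
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

    # ...registering the EC2 instance on the container's dynamic host port.
    elbv2.register_targets(
        TargetGroupArn=tg_arn,
        Targets=[{"Id": instance_id, "Port": host_port}],
    )

    # A listener rule routing /<user>/* to the new target group.
    elbv2.create_rule(
        ListenerArn=listener_arn,
        Priority=priority,  # must be unique per listener
        Conditions=[{"Field": "path-pattern", "Values": [path_pattern(user_id)]}],
        Actions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
    return tg_arn
```

Teardown on session end would mirror this: delete the rule, deregister the target, and delete the target group.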

You can probably do all of this within the confines of a Lambda function. I don't see any reason why it would ever take more than the max lifetime of 5 minutes to complete these steps; the longest part is going to be waiting for the containers to become active, and even that's not all that long.

dannosaur
  • Thank you very much for the response. How would you go about getting around the limits of ALB? As I understand it, ALB can only route to a target group, not to specific targets within that group based on path; this would mean configuring a maximum of 1 target per target group. Since there is the distinct possibility of needing more than 1,000 concurrent Docker instances (each user needs their own, and my potential user base is quite large), this would exceed the limits of an ALB's capabilities. – Stimp Apr 25 '18 at 13:08
  • Hey, were you able to find a suitable solution architecture for this problem? I have a very similar use case. – Rohin Gopalakrishnan Nov 11 '19 at 19:40