I am attempting to architect a service that would run user-specific Java processes within Docker containers in AWS (most likely ECS), with one Java process per container per user. The only exception to this might be when spinning up a replacement container for one that is misbehaving. The Java process is a packaged piece of software that cannot be modified to fit my needs. I am essentially looking to build a SaaS offering around this particular software, and I am aware that there will be licensing details to work out with the developer of the software.
The Java process has its own built-in web server, which uses non-standard ports (e.g. 30000-30004) for accessing its web UI. It can support HTTP or HTTPS depending on configuration. I plan on running as many of these containers as possible per EC2 instance to make it more cost effective. This would mean mapping different ports on the EC2 instances to the internal ports (30000-30004) of the containers.
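To make the port mapping idea concrete, here is a minimal sketch of how host port blocks could be assigned per container. The stride of 1000 between blocks and the notion of a numbered "slot" per container are my own assumptions for illustration, not something the software requires.

```python
from typing import Optional

CONTAINER_PORTS = range(30000, 30005)  # internal ports 30000-30004
BLOCK_STRIDE = 1000                    # assumption: host blocks are spaced 1000 apart
MAX_HOST_PORT = 65535


def host_port_block(slot: int) -> dict[int, int]:
    """Map each internal container port to a host port for the given slot.

    Slot 0 maps to 30000-30004, slot 1 to 31000-31004, and so on.
    """
    offset = slot * BLOCK_STRIDE
    mapping = {port: port + offset for port in CONTAINER_PORTS}
    if slot < 0 or max(mapping.values()) > MAX_HOST_PORT:
        raise ValueError(f"slot {slot} does not fit in the host port range")
    return mapping


def first_free_slot(used_slots: set[int]) -> Optional[int]:
    """Return the lowest slot not yet in use on a host, or None if the host is full."""
    slot = 0
    while True:
        try:
            host_port_block(slot)
        except ValueError:
            return None  # ran past the last valid port block
        if slot not in used_slots:
            return slot
        slot += 1
```

With these assumed numbers a single host tops out at 36 containers (slots 0 through 35), so a smaller stride or a different base port would be needed to pack more densely.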
I would like to wrap the built-in UI of the Java process with a few other elements that allow control of the Java process (stop, start, etc.) and serve my UI via another web server, or possibly via API Gateway and Lambda. This would allow users to start or stop their instance as necessary, and the overall system would use those requests to start or stop the user's specific Docker container.
To give an idea of scale, it is possible that there may be tens of thousands of concurrent users. The projected market size for this project is approximately 100,000 users. The possible market size is around 2 million users per month. Realistically, it will likely be approximately 10,000 total users, with approximately 7,500 of those being active within any given month. The potential scale likely rules out using an Application Load Balancer with an individual target and target group for each user's container, since the ALB limits allow only 1,000 targets and only 100 rules.
To give an idea of rate of change, an individual container may need to be running for several hours or possibly a day or more while the user is interacting with it. But it also might only be used for a few minutes on occasion. An individual user's container may only be needed once every week or two, or it may be used daily.
Now to the question: what are some of the best solutions for routing traffic to the Docker containers described above? I am hoping to host all of this from my own domain and use path-based routing to get requests to the proper containers.
For example:
- container1 is for user1 running on host ec2-app1 on ports 30000-30004
- container2 is for user2 running on host ec2-app1 on ports 31000-31004
- container3 is for user3 running on host ec2-app2 on ports 30000-30004
- The control UI would be served from https://example.com, with an iframe or something similar pointing to the following paths for the individual containers
- Requests to https://example.com/user1 should go to container1
- Requests to https://example.com/user2 should go to container2
- Requests to https://example.com/user3 should go to container3
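To illustrate the lookup I have in mind, here is a minimal sketch of path-based resolution using the example containers above. The `Backend` type, the `ROUTES` table, and the `resolve` function are hypothetical names, and I am assuming the web UI listens on the first port of each block.

```python
from typing import NamedTuple, Optional


class Backend(NamedTuple):
    host: str
    port: int


# Hypothetical routing table mirroring the example above.
# Assumption: the web UI is on the first port of each container's block.
ROUTES = {
    "user1": Backend("ec2-app1", 30000),
    "user2": Backend("ec2-app1", 31000),
    "user3": Backend("ec2-app2", 30000),
}


def resolve(path: str, routes: dict[str, Backend] = ROUTES) -> Optional[tuple[Backend, str]]:
    """Return (backend, remaining path) for a request path, or None if no container matches."""
    first_segment, _, rest = path.lstrip("/").partition("/")
    backend = routes.get(first_segment)
    if backend is None:
        return None
    # Strip the user prefix so the container sees the path it expects.
    return backend, "/" + rest
```

For example, `resolve("/user2/index.html")` yields the backend on ec2-app1 port 31000 with the remaining path `/index.html`. In practice this table would have to live somewhere the proxy can read and that is updated as containers start and stop.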
Users should not be able to access any container except the one that was started specifically for their session. The Java process has some built-in authentication that can be leveraged to help with this.
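As a rough sketch of the isolation check I would expect the routing layer to perform before proxying, assuming the control UI has already authenticated the user and that the first path segment is the user's identifier (both assumptions, and `may_access` is a hypothetical name):

```python
from typing import Optional


def may_access(session_user: Optional[str], path: str) -> bool:
    """Allow a request only if its first path segment matches the logged-in user."""
    first_segment = path.lstrip("/").split("/", 1)[0]
    # Exact match on the whole segment, so "user1" cannot reach "/user10".
    return bool(session_user) and first_segment == session_user
```

This would sit in front of, not replace, the authentication built into the Java process.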
I am essentially building a multi-tenant service out of a single-user piece of software. I know that I will have to build quite a bit of custom tooling, and I am not averse to doing so. I would prefer solutions that work well with Python when custom code is needed, but other languages are welcome if necessary. I suspect that I will need some sort of agent or startup script, either within the containers or on the EC2 instances running them, that registers each container with the routing system once it starts. I would then need a similar shutdown script to remove the container's registration, as well as a regular cleanup process for cases where a container or EC2 instance fails without a proper shutdown. I have already looked into HAProxy and Nginx, and I am willing to hear ideas regarding these, but other options are also very welcome.
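To show the register, deregister, and cleanup lifecycle I am describing, here is a minimal in-memory sketch. The `ContainerRegistry` class, the heartbeat-style re-registration, and the 60-second TTL are all assumptions of mine; a real deployment would need a shared store that the proxy can read rather than a Python dict.

```python
import time
from typing import Callable, Optional


class ContainerRegistry:
    """In-memory stand-in for the routing system's registration table."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[str, int, float]] = {}

    def register(self, user: str, host: str, port: int) -> None:
        """Called by the startup script, and again on each heartbeat."""
        self._entries[user] = (host, port, self._clock())

    def deregister(self, user: str) -> None:
        """Called by the shutdown script; a no-op if the user is unknown."""
        self._entries.pop(user, None)

    def lookup(self, user: str) -> Optional[tuple[str, int]]:
        """Return (host, port) unless the entry is missing or has gone stale."""
        entry = self._entries.get(user)
        if entry is None:
            return None
        host, port, last_seen = entry
        if self._clock() - last_seen > self._ttl:
            return None
        return host, port

    def sweep(self) -> list[str]:
        """Periodic cleanup: drop entries whose heartbeat stopped, e.g. after a crash."""
        now = self._clock()
        expired = [user for user, (_, _, seen) in self._entries.items()
                   if now - seen > self._ttl]
        for user in expired:
            del self._entries[user]
        return expired
```

The TTL is what covers the failure case: if a container or its EC2 instance dies without running the shutdown script, its entry simply stops being refreshed and the next sweep removes it.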
Thank you in advance for your suggestions!