My server setup for a heavily used API

Question

I will soon be buying a bunch of servers for an application that I am about to launch but I have concerns about my setup. I appreciate any feedback I get.

I have an application which will make use of an API that I wrote. Other users/developers will also make use of this API. The API server will receive requests and relay them on to worker servers. The API will only hold a mysql db of requests for logging purposes, authentication and for rate limiting.

Each worker server does a different job and in the future to scale, I will add more worker servers to be available to take on jobs. The API config file will be edited to take note of the new worker servers. The worker servers will do some processing and some will save a path to an image to local database to be later retrieved by the API to be viewed on my application, some will return strings of the outcome of a process and save that to a local database.

Does this setup look efficient to you? Is there a better way to restructure this? What issues should I consider? Please see image below, I hope it aids understanding. enter image description here

score 17 · Accepted Answer · edited May 23 '17 at 12:41

Higher Availability

As Chris mentions, your API server is the single point of failure in your layout. What you're setting up is a message queuing infrastructure, something many people have implemented before.

Continue down the same path

You mention receiving requests on the API server and insert the job into a MySQL DB running on each server. If you want to continue on this path, I suggest removing the API server layer, and design the Workers to each accept commands directly from your API Users. You could use something as simple as round-robin DNS to distribute each API User connection directly to one of the available worker nodes (and retry if a connection isn't successful).

Use a Message Queue Server

More robust message queuing infrastructures use software designed for this purpose like ActiveMQ. You could use ActiveMQ's RESTful API to accept POST requests from API Users, and idle workers can GET the next message on the queue. However, this is probably overkill for your needs - it is designed for latency, speed, and millions of messages a second.

Use Zookeeper

As a middle ground, you may want to look at Zookeeper, even though it isn't specifically a message queue server. We use at $work for this exact purpose. We have a set of three servers (analogous to your API server) that run the Zookeeper server software, and have a web frontend for handling requests from users and applications. The web frontend, as well as the Zookeeper backend connection to the workers, have a load balancer to make sure we continue processing the queue, even if a server is down for maintenance. When the work is done, the worker tells the Zookeeper cluster that the job is complete. If a worker dies, that job will be sent to another work to complete.

Other concerns

Make sure jobs complete in the event that a worker isn't responding
How will the API know that a job is complete, and to retrieve it from the worker's database?
Try to reduce the complexity. Do you need an independent MySQL server on each worker node, or could they talk to the MySQL server (or replicated MySQL Cluster) on the API server(s)?
Security. Can anyone submit a job? Is there authentication?
Which worker should get the next job? You don't mention whether the tasks are expected to take 10ms or 1 hour. If they are fast, you should remove layers to keep latency down. If they are slow, you should be very careful to make sure shorter requests don't get stuck behind a few long running ones.

thank you very much for your excellent reply. I knew the API layer was a bottleneck but it seemed the only way I can add more worker servers without having to let the application users know manually. After having read your answer fully, I've come to realise that yes, it would be better if each worker has its own API. Although code will be duplicated as I add more workers, it is more performant for my scenario. — Abs, May 29 '11 at 22:44
@Abs - Thanks for my first up-vote! If you do decide to remove the API layer, I suggest not doing round-robin DNS and setup HAProxy (preferably a pair) as described in this [article](http://andyleonard.com/2011/02/01/haproxy-and-keepalived-example-configuration/). That way, you don't need to deal with timeouts. — Fanatic, May 30 '11 at 01:12
@abs you don't have to *remove* the API layer, but adding redundancy (CARP failover or similar) would be an important consideration to eliminate the single point of failure... — voretaq7, Jun 02 '11 at 04:36
As far as messaging goes I would suggest taking a close look at RabbitMQ before you decide: http://www.rabbitmq.com/ — Antonius Bloch, Jun 04 '11 at 16:06

Chris Ting · Answer 2 · 2011-05-29T22:09:52.573

2

The biggest issue I see is the lack of failover planning.

Your API server is a large single point of failure. If it goes down, then nothing works even if your worker servers are still functional. Additionally, if a worker server goes down, then the service that server provides is no longer available.

I suggest you look at the Linux Virtual Server project (http://www.linuxvirtualserver.org/) to get an idea of how load balancing and failover works, and to get an idea of how these can benefit your design.

There are many ways to structure your system. Which way is better is a subjective call that is best answered by you. I suggest you do some research; weigh the tradeoffs of the different methods. If you need for information about an implantation method, then submit a new question.

edited May 29 '11 at 22:09

answered May 27 '11 at 15:32

Chris Ting

899
4
5

How would you implement a fail-over mechanism in this scenario? A general overview would be great. – Abs May 27 '11 at 20:06
From your diagram, you should research Linux Virtual Server (LVS). Go to http://www.linuxvirtualserver.org/ and start learning all you can. – Chris Ting May 27 '11 at 20:14
Interesting, I will look into that and failovers in general. Any other comments on my setup? Any other dangers that I could face? – Abs May 29 '11 at 14:43
@Abs: There are many issues you could face. Your question has a lot of subjective parts to it, and I don't want to box you in to what I'd personally do. I don't have to support your setup; you do. My real answer is to learn about failover and high availability. – Chris Ting May 29 '11 at 22:15