What are the scalability concerns with pub/sub servers?

Question

I'm looking into setting up a pub/sub service with WebSockets. From what I can tell, the main scalability bottleneck will be memory, which limits how many sockets can be open at a time, so I think it would be wise to split this off from the other servers running services such as APIs. Is this correct? I would imagine memory is more expensive than compute power when it comes to hosting, so are there any best practices for optimizing this type of server for scalability and cost?

The goal is to provide the user of this web application with real-time updates as systems in the field check in with new data, without having to poll the backend periodically. But we don't want to double our server costs, or it might not be worth it. We are using AWS EC2 with load balancing and auto-scaling for our current API servers.

score 1 · Answer 1 · answered Sep 17 '21 at 15:54

The actual memory usage of a single socket isn't that much.

What does eat up memory is the state associated with which client is interested in which updates, and which client has already received a particular update.
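To make the "who is interested in what" state concrete, here is a minimal sketch of a subscription registry. The class and method names are made up for illustration and do not come from any particular pub/sub library. It keeps an index in both directions so that a disconnect can be cleaned up without scanning every topic.

```python
from collections import defaultdict


class SubscriptionRegistry:
    """Hypothetical in-memory index of which client wants which topic."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> client ids
        self._topics = defaultdict(set)       # client id -> topics

    def subscribe(self, client_id, topic):
        self._subscribers[topic].add(client_id)
        self._topics[client_id].add(topic)

    def disconnect(self, client_id):
        # Remove the client from every topic it subscribed to,
        # dropping topics that no longer have any subscribers.
        for topic in self._topics.pop(client_id, set()):
            subs = self._subscribers[topic]
            subs.discard(client_id)
            if not subs:
                del self._subscribers[topic]

    def subscribers(self, topic):
        # Return a snapshot so callers can iterate while clients come and go.
        return frozenset(self._subscribers.get(topic, ()))
```

This state grows with the number of (client, topic) pairs rather than with the number of sockets, which is why it tends to matter more than the sockets themselves.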

In a primitive implementation (i.e. one using the OS network stack), the latter state is kept in the form of outgoing buffers. If an update is sent to 10,000 clients, the data is copied 10,000 times and each copy is appended to an outgoing queue, where it is augmented with the requisite headers (which contain per-connection state). A descriptor is then built that instructs the hardware to send a packet that is the concatenation of the headers and the payload.

The per-client copy of the payload is kept in memory until it is acknowledged by the client, and that is where the memory requirements come from. This memory cannot be paged out, so it creates memory and cache pressure on other applications.
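As a rough back-of-the-envelope illustration of that fan-out cost, the sketch below multiplies out the per-client copies. The payload size and the number of unacknowledged updates are assumed figures, not measurements, and protocol headers and kernel bookkeeping are ignored.

```python
def unacked_buffer_bytes(clients: int, payload_bytes: int, unacked_updates: int = 1) -> int:
    """Estimate payload bytes held in per-connection send buffers.

    Assumes every client holds its own copy of every update that has
    not been acknowledged yet; header overhead is ignored.
    """
    return clients * payload_bytes * unacked_updates


# Assumed workload: 10,000 clients, 1,000-byte updates,
# 5 updates in flight per client.
total = unacked_buffer_bytes(clients=10_000, payload_bytes=1_000, unacked_updates=5)
print(f"{total / 1_000_000:.0f} MB")  # prints "50 MB"
```

Slow or stalled clients increase the number of unacknowledged updates, so the worst case is driven by the slowest consumers rather than the average one.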

Some implementations run parts of the network stack inside the server program itself. These can avoid the copies by reference counting or by recreating payloads on demand, which lets you get away with far less memory. Making this truly scalable involves a lot of tricky coding, though; multi-socket servers in particular pose some interesting issues that the OS network stack already knows how to work around.
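The reference-counting idea can be illustrated at the application level, without any custom networking. This is only a sketch: `SharedPayload` and `FanOut` are hypothetical names, and a real user-space stack would do this on packet buffers rather than Python objects. Each client queue holds a reference to the same payload, and the payload is released once the last client has acknowledged it.

```python
from collections import deque


class SharedPayload:
    """One copy of the update data plus a count of queues still holding it."""

    def __init__(self, data: bytes):
        self.data = data
        self.refs = 0


class FanOut:
    """Hypothetical fan-out that queues references instead of copies."""

    def __init__(self):
        self.queues = {}   # client id -> deque of SharedPayload
        self.live = set()  # payloads not yet acknowledged by everyone

    def add_client(self, client_id):
        self.queues[client_id] = deque()

    def publish(self, data: bytes) -> SharedPayload:
        payload = SharedPayload(data)
        for queue in self.queues.values():
            queue.append(payload)  # a reference, not a copy of the bytes
            payload.refs += 1
        if payload.refs:
            self.live.add(payload)
        return payload

    def ack(self, client_id):
        # The client confirmed its oldest pending update.
        payload = self.queues[client_id].popleft()
        payload.refs -= 1
        if payload.refs == 0:
            self.live.discard(payload)  # last reference gone, memory can be freed
```

With this layout the payload memory is proportional to the number of distinct updates in flight, while the per-client cost shrinks to one queue entry per pending update.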

The options you have form your escalation strategy as the service grows:

run the pub/sub service on the same server as the app
run the pub/sub service on a dedicated server with OS networking
run the pub/sub service on a dedicated server with custom networking
run the pub/sub service on multiple dedicated servers

Moving from shared to dedicated does not require much planning and can be done as needed; once that has happened, it is time to prepare the further stages.

Scaling up to multiple servers is going to introduce nondeterminism into your system, as different clients may receive the same updates in a different order. For this scaling step to be successful, your clients need to be aware of this and be able to present a consistent view -- whether that is trivial or difficult depends on your actual application.
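One common way for a client to stay consistent is "last writer wins" per item. The sketch below assumes that every update carries a key (for example the reporting system's id) and a version number that increases with each check-in; neither is given in the question, so treat both as assumptions. Updates that arrive late are simply dropped.

```python
class LatestStateView:
    """Hypothetical client-side view that tolerates out-of-order updates."""

    def __init__(self):
        self._state = {}  # key -> (version, value)

    def apply(self, key, version, value) -> bool:
        """Apply an update; return False if it is stale or a duplicate."""
        current = self._state.get(key)
        if current is not None and current[0] >= version:
            return False  # a newer (or the same) version was already applied
        self._state[key] = (version, value)
        return True

    def get(self, key):
        entry = self._state.get(key)
        return None if entry is None else entry[1]
```

This is the trivial case, where only the latest value per item matters. If the application needs every intermediate update in order, the client has to buffer and reorder instead, which is considerably more work.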

tl;dr: no need to optimize prematurely. Split out the service so the first scaling step is a simple configuration change, and start optimizing as soon as that has happened.
