16

We are running a Ruby on Rails web app under Unicorn. Our app is not strictly CPU-bound (we have a dual Xeon E5645 system with 12 cores, and peak load average is around 6). We started with 40 Unicorn workers initially, but the application's memory footprint has increased over time, so now we have to lower the number of worker processes. I thought the standard (number of CPU cores + 1) formula applied to Unicorn too, but my colleague tried to convince me that we should reserve more Unicorn instances per CPU and provided this link. Still, I am not sure why we need to spend so much memory on idle Unicorn processes.

My question is: what is the reason to have more than one Unicorn instance per CPU core? Is it due to some architectural peculiarity of Unicorn? I am aware that busy Unicorn processes can't accept new connections (we are using UNIX domain sockets to communicate with the Unicorn instances, BTW), but I thought the listen backlog was introduced exactly to address this. Is it possible to overcome this "2 to 8 Unicorn instances per CPU" rule somehow?
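
For reference, this is roughly how that backlog is configured in a Unicorn config file (a minimal sketch; the socket path, worker count, and backlog depth here are illustrative, not our production values):

    # config/unicorn.rb -- minimal sketch, illustrative values only
    worker_processes 16

    # Unicorn's listen directive takes a :backlog option, which sets the
    # listen(2) queue depth: connections beyond the number of free workers
    # wait in this kernel queue until some worker calls accept().
    listen "/tmp/unicorn.sock", :backlog => 64

    timeout 30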

Alex

2 Answers

18

Okay, I have finally found the answer. The optimal number of Unicorn workers is not directly tied to the number of CPU cores; it depends on your load and on your app's internal structure and responsiveness. Basically, we use a sampling profiler to determine the workers' state, and we try to keep workers 70% idle and 30% doing actual work. So 70% of the samples should be "waiting on the select() call to get a request from the frontend server".

Our research has shown that there are only three effective worker states: 0-30% of samples idle, 30-50% of samples idle, and 50-70% of samples idle (yes, we can get more idle samples, but there is no real point in it, because application responsiveness does not change significantly). We consider the 0-30% situation a "red zone" and the 30-50% situation a "yellow zone".
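
The answer above doesn't say which sampling profiler was used, so this is only a minimal sketch of the measurement idea, assuming Linux (/proc available) and workers discoverable by a "unicorn worker" process title; treating a poll/select-style wchan as "idle" is likewise an assumption for illustration:

    #!/usr/bin/env ruby
    # Sketch: sample Unicorn workers and report how often each one is idle.
    SAMPLES  = 200
    INTERVAL = 0.05 # seconds between samples

    def worker_pids
      # Assumes workers are titled "unicorn worker[N]" in the process table.
      `pgrep -f 'unicorn worker'`.split.map(&:to_i)
    end

    def idle?(pid)
      # /proc/<pid>/wchan names the kernel function the process sleeps in;
      # a poll/select-style wait suggests the worker is idle on the socket.
      wchan = File.read("/proc/#{pid}/wchan") rescue ""
      !!(wchan =~ /poll|select/)
    end

    idle, total = Hash.new(0), Hash.new(0)
    SAMPLES.times do
      worker_pids.each do |pid|
        total[pid] += 1
        idle[pid]  += 1 if idle?(pid)
      end
      sleep INTERVAL
    end

    total.each do |pid, n|
      pct  = 100.0 * idle[pid] / n
      zone = pct < 30 ? "red" : pct < 50 ? "yellow" : "green"
      printf("worker %d: %.0f%% idle (%s zone)\n", pid, pct, zone)
    end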

Alex
7

You're right about N+1 for CPU-bound jobs.

On the other hand, Unicorn does not use threads, so every I/O operation blocks the whole process; while one worker is blocked, another process can kick in to parse HTTP headers, concatenate strings, and do whatever CPU-intensive work it needs to serve its user (doing that earlier reduces request latency).
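
To make the blocking concrete, here is a toy Rack app (hypothetical endpoints, not from the answer): with a single Unicorn worker, a request to /slow stalls /fast for the full five seconds, because the worker is a plain single-threaded process; a second worker is the only thing that keeps /fast responsive.

    # config.ru -- toy illustration with made-up endpoints
    run lambda { |env|
      if env["PATH_INFO"] == "/slow"
        sleep 5 # stands in for any blocking I/O: DB query, HTTP call, disk...
        [200, { "Content-Type" => "text/plain" }, ["slow done\n"]]
      else
        [200, { "Content-Type" => "text/plain" }, ["fast\n"]]
      end
    }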

And you may want to have more threads/processes than cores. Imagine the following situation: request A takes ten times longer than request B, you have several concurrent A requests, and a fast B request is just enqueued, waiting for an A request to complete. So if you can predict the number of heavy requests, you can use that number as another guideline for tuning the system.
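
As a back-of-envelope version of that guideline (all numbers made up), Little's law says the average number of busy workers equals arrival rate times average service time, summed over the two request classes:

    # Hypothetical traffic mix -- every number here is illustrative.
    heavy_rps  = 2.0   # "A" requests per second
    heavy_time = 1.0   # seconds each
    fast_rps   = 40.0  # "B" requests per second
    fast_time  = 0.1   # seconds each

    # Little's law: L = lambda * W. This is the average number of workers
    # busy at any instant; provision at least this many plus headroom for
    # bursts, or fast requests will queue behind the heavy ones.
    busy = heavy_rps * heavy_time + fast_rps * fast_time
    puts "average busy workers: #{busy}"   # => 6.0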

darkk
  • Good point. Let's assume requests are distributed more or less equally and are pretty lightweight (we do have heavy requests, in fact, but they are handled by another pool of Unicorns). If all requests suddenly become heavy (e.g. in case of I/O starvation on a DB node), we will be down regardless of the number of instances per CPU, I guess. Well, probably the best way to know the truth is to perform some kind of load testing. – Alex Mar 14 '12 at 22:39
  • Yep, testing will tell you. Or, if you've already launched, you can grep the logs and look up the max number of concurrent requests. I'm pretty sure you log both request time and backend response time. Nginx will be your friend if you don't. :) – darkk Mar 15 '12 at 09:26