In this question I would like to find out the best possible configuration/hardware to deliver 40Gbps from a single server.
Situation
We have a video share proxy server that offloads peaks from the slow storage servers behind it. All traffic is HTTP only. The server acts as a reverse proxy (for files that are not cached on the server) and as a webserver (for files that are stored on local drives).
There are currently about 100TB of files on the backend storage servers, and the amount keeps growing.
The caching mechanism is implemented independently, and this question is not about caching itself, as it works very well - the server currently delivers 14Gbps while passing only 2Gbps through to the backend servers, so the cache usage is good.
Goal
Achieve 40Gbps or even more throughput from a single machine.
Hardware 1
HW: Supermicro SC825, X11SSL-F, Xeon E3-1230v5 (4C/8T@3.4GHz), 16GB DDR4 RAM, 2x Supermicro 10G STGN-i1S (LACP L3+4)
SSD: 1x 512GB Samsung, 2x 500GB Samsung, 2x 480GB Intel 535, 1x 240GB Intel S3500
System:
- irqbalance stopped
- set_irq_affinity for each interface (via the script in the ixgbe driver tarball)
- ixgbe-4.3.15
- I/O scheduler deadline
- iptables empty (unloaded modules)
- Filesystem: XFS
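For reference, a minimal sketch of how these system settings could be applied at boot; the systemd service name, device glob, driver path and interface names (eth2/eth3) are assumptions, not our exact setup:

# stop irqbalance so it does not override the manual pinning below (assumes a systemd host)
systemctl stop irqbalance

# deadline I/O scheduler for all SATA/SAS drives (device glob is an assumption)
for dev in /sys/block/sd?; do
    echo deadline > "$dev/queue/scheduler"
done

# pin each NIC's IRQs with the script shipped in the ixgbe source tarball
# (path and the 'all' core spec are assumptions - argument syntax varies between script versions)
/usr/src/ixgbe-4.3.15/scripts/set_irq_affinity all eth2
/usr/src/ixgbe-4.3.15/scripts/set_irq_affinity all eth3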
Nginx:
- sendfile off
- aio threads
- directio 1M
- tcp_nopush on
- tcp_nodelay on
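For context, assembled into an nginx config excerpt these directives sit at the http (or server) level - a minimal sketch, not our full config:

http {
    sendfile     off;
    aio          threads;   # needs nginx built with --with-threads
    directio     1M;        # files >= 1M are read with O_DIRECT, bypassing the page cache
    tcp_nopush   on;        # only takes effect while sendfile is on
    tcp_nodelay  on;
    # server/location blocks (proxy_pass to the storage backends, local cache root) omitted
}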
As seen on the graphs, we were able to push 12.5Gbps. Unfortunately, the server became unresponsive.
Two things caught my attention. The first was the high number of IRQs; unfortunately I don't have graphs from /proc/interrupts for this case. The second was high system load, which I think was caused by kswapd0 struggling to work with only 16GB of RAM.
Hardware 2
HW: Supermicro SC119TQ, X10DRW-i, 2x Xeon E5-2609v4 (8C/8T@1.70GHz), 128GB DDR4 RAM, 2x Supermicro 10G STGN-i1S
The SSDs and system configuration are the same as for hardware 1. Nginx uses sendfile on (aio vs. sendfile is compared further down).
This looks better, so now that we have a server that can handle the peaks, we can try some optimizations.
Sendfile vs aio threads
I tried to disable sendfile and use aio threads instead.
- sendfile off
- aio threads
- directio 1M (which matches all files we have)
vs
- sendfile on
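In config terms the switch is just swapping these directives; everything else stays the same as in the sketch above:

# aio variant
sendfile off;
aio threads;
directio 1M;

# sendfile variant
sendfile on;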
Then at 15:00 I switched back to sendfile and reloaded nginx (so it took a while for existing connections to finish). It is nice that the drive utilization (measured by iostat) went down. The traffic did not change (unfortunately Zabbix decided not to collect the data from bond0).
sendfile on/off
I just tried switching sendfile on/off. Nothing changed except the Rescheduling interrupts.
irqbalance as a service/cron/disabled
As @lsd mentioned, I tried to set up irqbalance to be executed via cron:
*/5 * * * * root /usr/sbin/irqbalance --oneshot --debug 3 > /dev/null
Unfortunately it didn't help in my case. One of the network cards started behaving strangely:
I couldn't tell from the graphs what was wrong, and when it happened again the next day, I logged in to the server and saw that one core was at 100% (system usage).
I tried to start irqbalance as a service; the result was still the same.
Then I decided to use the set_irq_affinity script instead, and it fixed the problem immediately - the server pushed 17Gbps again.
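For anyone who wants to reproduce the diagnosis: the hot core and the IRQ placement of a card can be checked roughly like this (eth2 is an assumed interface name; mpstat is part of the sysstat package):

# per-core utilization - the %sys / %soft columns expose the overloaded core
mpstat -P ALL 1

# which CPUs each of the NIC's IRQs is allowed to run on
for irq in $(awk -F: '/eth2/ {print $1}' /proc/interrupts); do
    printf "IRQ %s -> CPUs %s\n" "$irq" "$(cat /proc/irq/$irq/smp_affinity_list)"
done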
Hardware 3
We upgraded to new hardware: a 2U 24 (+2) drive chassis (6xSFF), 2x Xeon E5-2620v4, 64GB DDR4 RAM (4x16GB modules), 13x SSD, 2x Supermicro (with Intel chip) network cards. The new CPUs improved the performance a lot.
The current setup remains the same - sendfile, etc. The only difference is that we let a single CPU handle both network cards (via the set_irq_affinity script).
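Pinning both cards to one CPU boils down to allowing their IRQs only on that socket's cores; the script does this for us, but the manual equivalent would be roughly the following (that cores 0-7 belong to the first socket, and the eth2/eth3 names, are assumptions about the layout):

# restrict the IRQs of both NICs to the cores of one socket (core list is an assumption)
for irq in $(awk -F: '/eth2|eth3/ {print $1}' /proc/interrupts); do
    echo 0-7 > /proc/irq/$irq/smp_affinity_list
done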
The 20Gbps limit has been reached.
Next goal? 30Gbps.
Feel free to shoot ideas at me on how to improve the performance. I will be happy to test them live and share some heavy graphs here.
Any ideas on how to deal with the large amount of SoftIRQs on the CPU?
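For reference, this is how the per-CPU SoftIRQ load can be watched (nothing machine-specific here):

watch -n1 cat /proc/softirqs     # per-CPU NET_RX / NET_TX counters
cat /proc/net/softnet_stat       # 3rd column (time_squeeze) grows when the NAPI budget runs out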
This is not a question about capacity planning - I already have the hardware and the traffic. I can always split the traffic across several servers (which I will have to do in the future anyway) and fix the problem with money. This is, however, a question about system optimization and performance tuning in a real-life scenario.