
I got into an argument about the net.core.somaxconn parameter: I was told that changing it from the default of 128 will not make any difference.

I believed this might be enough proof:

"If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value" http://linux.die.net/man/2/listen

but it's not.

Does anyone know a method to verify this with two machines sitting on a Gbit network? Ideally the test would run against MySQL, LVS, apache2 (2.2), memcached.

Claes Mogren
petermolnar

1 Answer


Setting net.core.somaxconn to a higher value is only needed on heavily loaded servers where the rate of new connections is so high or bursty that having 128 not-yet-accepted connections (50% more on BSDs: 128 backlog + 64 half-open) is considered normal, or when you need to delegate the definition of "normal" to the application itself.

Some administrators use a high net.core.somaxconn to hide problems with their services, so that from the user's point of view it looks like a latency spike instead of an interrupted or timed-out connection (the behaviour on overflow is controlled by net.ipv4.tcp_abort_on_overflow in Linux).

As the listen(2) manual says, net.core.somaxconn acts only as an upper bound for the application, which is free to choose something smaller (usually set in the app's config). Some apps, though, just use listen(fd, -1), which sets the backlog to the maximum value allowed by the system.
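To see how many connections a given backlog actually admits, here is a minimal sketch that runs on a single machine over loopback. It listens with a chosen backlog, never calls accept(), and counts how many client connects complete before one times out. The helper names read_somaxconn and count_queued_connections, and the attempts/timeout values, are my own choices for illustration. The exact count is OS-dependent (Linux usually admits slightly more than the backlog), so treat the number as an observation, not a guarantee.

```python
import socket


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return the system-wide cap, or None where the file is absent (non-Linux)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_queued_connections(backlog, attempts=12, timeout=0.3):
    """Listen with `backlog`, never accept(), count connects that complete."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(backlog)
    addr = server.getsockname()

    clients = []  # keep client sockets open so the accept queue stays full
    completed = 0
    try:
        for _ in range(attempts):
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            client.settimeout(timeout)
            clients.append(client)
            try:
                client.connect(addr)
                completed += 1
            except OSError:
                # Timeout or refusal: assume the queue is full and stop.
                break
    finally:
        for client in clients:
            client.close()
        server.close()
    return completed


if __name__ == "__main__":
    print("somaxconn:", read_somaxconn())
    for backlog in (1, 4, 8):
        print("backlog", backlog, "->", count_queued_connections(backlog))
```

To check the somaxconn cap itself, request a backlog larger than the current cap and raise attempts above it; the count should stop growing near the cap rather than near the requested value.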

The real cause of a full backlog is either a low processing rate (e.g. a single-threaded blocking server) or an insufficient number of worker threads/processes (e.g. multi-process/multi-threaded blocking software like apache/tomcat).

PS. Sometimes it's preferable to fail fast and let the load balancer do its job (retry) than to make the user wait. For that purpose we set net.core.somaxconn to any value, limit the application backlog to e.g. 10, and set net.ipv4.tcp_abort_on_overflow to 1.

PPS. Old versions of the Linux kernel have a nasty bug that truncates the somaxconn value to its 16 lower bits (i.e. casts the value to uint16_t), so raising it above 65535 can even be dangerous. For more information see: http://patchwork.ozlabs.org/patch/255460/
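To make the truncation concrete, here is a small arithmetic sketch of what keeping only the 16 lower bits does to a few candidate values. The function name truncate_to_u16 is just for illustration; it models the cast, not the kernel code itself.

```python
def truncate_to_u16(value):
    """Keep only the 16 lower bits, as a cast to uint16_t would."""
    return value & 0xFFFF


if __name__ == "__main__":
    for requested in (128, 4096, 65535, 65536, 70000):
        print(requested, "->", truncate_to_u16(requested))
```

Values up to 65535 pass through unchanged, while 65536 wraps to 0 and 70000 wraps to 4464, so on an affected kernel a "bigger" setting can end up as a much smaller backlog.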

If you want to go into more detail about the backlog internals in Linux, feel free to read: How TCP backlog works in Linux.

SaveTheRbtz
    Also worth noting: [since Linux 5.4 the default was increased to 4096](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19f92a030ca6d772ab44b22ee6a01378a8cb32d4). – Hi-Angel Nov 23 '19 at 11:01
  • ... and I thought that 16384 was way too much, but my home Synology NAS (ARMv8-based) sets it to... 65535!!! – Gwyneth Llewelyn Nov 16 '20 at 01:18