
A bit of background

I am running two servers under high traffic, one with Ubuntu 12.04 (Linux 3.2.0-69-generic) and one with Ubuntu 14.04 (Linux 3.13.0-52-generic). I am now trying to secure both. They have very similar hardware (same number of CPUs), but the 12.04 machine has only 8 GB of RAM while the 14.04 machine has 16 GB.

I wanted to enable the ufw firewall, but ran into problems with the nf_conntrack table filling up, which caused packets to be dropped.

I solved this by lowering the established-connection timeout and increasing both the table size and the number of buckets:

net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_max = 196608
net.netfilter.nf_conntrack_buckets = 24576

These values are applied correctly and survive a reboot (see this blog). I also see conntrack_count rising well above the old default limit, so I'm sure the change is in effect on both servers. The count stays well under the new limit, so I believe this part is fine.
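For reference, here is a minimal Python sketch of the arithmetic behind those settings. The helper names (`parse_sysctl`, `entries_per_bucket`, `table_usage`) are made up for illustration; the sketch only parses the three values quoted above and relates a conntrack count to the configured maximum.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines into a dict of ints, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


# The three values from the question, verbatim.
CONNTRACK_SETTINGS = """
net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_max = 196608
net.netfilter.nf_conntrack_buckets = 24576
"""


def entries_per_bucket(settings):
    """Average hash chain length when the conntrack table is completely full."""
    return (settings["net.netfilter.nf_conntrack_max"]
            / settings["net.netfilter.nf_conntrack_buckets"])


def table_usage(count, settings):
    """Fraction of the table in use for a given nf_conntrack_count reading."""
    return count / settings["net.netfilter.nf_conntrack_max"]
```

With these numbers a full table averages 8 entries per bucket. Passing the current value of nf_conntrack_count to `table_usage` shows how close the table is to the limit.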

The issue

The 12.04 server works fine under high load, but the 14.04 server keeps dropping packets, causing client timeouts. At boot on 14.04, I can see this line in kern.log:

TCP established hash table entries: 131072 (order: 8, 1048576 bytes)

While on 12.04, it's:

TCP established hash table entries: 524288 (order: 11, 8388608 bytes)

I suspect this may be why my server is dropping packets: the table may be too small for the amount of traffic on 14.04.

So I looked for a way to set this size and found the parameter thash_entries (see here for an explanation). However, I cannot set it with sysctl.
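To make the comparison concrete, here is a small Python sketch that parses the two kern.log lines above and checks their arithmetic. It assumes 4 KiB pages (the usual x86 value); `parse_hash_line` is a hypothetical helper name.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages

LINE_RE = re.compile(
    r"TCP established hash table entries: (\d+) \(order: (\d+), (\d+) bytes\)"
)

# The two boot messages quoted above.
LINE_1404 = "TCP established hash table entries: 131072 (order: 8, 1048576 bytes)"
LINE_1204 = "TCP established hash table entries: 524288 (order: 11, 8388608 bytes)"


def parse_hash_line(line):
    """Extract entries, page order and size, and derive a few ratios."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not a TCP established hash table line")
    entries, order, size = (int(g) for g in match.groups())
    return {
        "entries": entries,
        "order": order,
        "bytes": size,
        "bytes_per_entry": size // entries,
        # An order-N allocation is 2**N contiguous pages.
        "order_matches_size": (PAGE_SIZE << order) == size,
    }
```

Running both lines through `parse_hash_line` shows that the 12.04 table has four times as many entries as the 14.04 one (524288 vs 131072), and that the reported sizes match 2^order pages in both cases.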

So here are my questions:

  1. Is this TCP connection table really the source of my trouble, or should I look somewhere else?
  2. If it is, how can I set its size and make the setting survive a reboot?

Thanks in advance for any help, and do not hesitate to ask if you need more information.

P.S. I am more of a developer than a system expert, so I would appreciate a detailed answer :)

1 Answer

Tuning the Linux kernel for high network throughput is an art based on balance.

Increasing the connection tracking table is fine, but it means more sockets are potentially in use, which in turn means the system needs more file descriptors, and so on.

In your case, I would start with the following kernel settings:

net.core.somaxconn

and

fs.file-max

The first sets the upper limit on the listen backlog, that is, how many fully established connections can wait in a listening socket's accept queue. The second sets the maximum number of file handles the kernel will allocate system-wide.

Then there's the SYN backlog, which can be tuned further:
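Before changing anything, it helps to know the current values. The following is a minimal Python sketch, assuming the standard /proc/sys layout where dots in a sysctl name map to directory separators. The helper names are made up for illustration, and `parse_file_nr` assumes the three-field format of fs.file-nr (allocated, unused, maximum).

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as 'net.core.somaxconn' to its /proc/sys path."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as handle:
        return handle.read().strip()


def parse_file_nr(text):
    """Split the content of fs.file-nr into its three counters."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


# Example usage on a live system (not executed here):
#   read_sysctl("net.core.somaxconn")
#   read_sysctl("fs.file-max")
#   parse_file_nr(read_sysctl("fs.file-nr"))
```

Comparing the allocated count from fs.file-nr with fs.file-max gives a rough idea of whether file handles are anywhere near exhaustion.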

net.ipv4.tcp_max_syn_backlog

This sets how many half-open connections (SYN received, final ACK from the client still pending) the kernel will remember.

net.ipv4.tcp_syncookies

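When enabled, the kernel falls back to SYN cookies once the SYN backlog overflows, so new connections can still be accepted instead of being dropped. The backlog itself works without it.

Finally, there are also tweaks such as enabling reuse of TIME_WAIT connections: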
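To check whether the listen queue is actually overflowing, the kernel exposes counters in /proc/net/netstat as pairs of header and value lines. Below is a minimal Python sketch that parses that format. `parse_netstat` is a hypothetical helper, and `SAMPLE` is a shortened, made-up excerpt (a real file has many more columns).

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    stats = {}
    lines = text.strip().splitlines()
    # Lines come in pairs: a header line with names, then a line with values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        stats[prefix] = dict(
            zip(names.split(), (int(n) for n in numbers.split()))
        )
    return stats


# Shortened sample with invented numbers, for illustration only.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed ListenOverflows ListenDrops
TcpExt: 0 0 0 12 15
IpExt: InNoRoutes InTruncatedPkts
IpExt: 0 0
"""


def listen_problems(stats):
    """Pick out the counters that point at a full accept queue or SYN backlog."""
    tcp = stats.get("TcpExt", {})
    return {
        "ListenOverflows": tcp.get("ListenOverflows", 0),
        "ListenDrops": tcp.get("ListenDrops", 0),
        "SyncookiesSent": tcp.get("SyncookiesSent", 0),
    }
```

If ListenOverflows or ListenDrops keep growing between two readings taken under load, the backlog settings above are a likely place to look.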

net.ipv4.tcp_tw_reuse

This can reduce the number of new sockets opened during a spike. Note that it only applies to outgoing connections initiated by this host.

That's just the tip of the iceberg. In my experience with high-volume Linux/Unix systems, you will keep tweaking for a couple of months before finding the right balance.
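To judge whether TIME_WAIT sockets are worth worrying about, you can count sockets per state from /proc/net/tcp, where the fourth column holds the state as a hex code (01 is ESTABLISHED, 06 is TIME_WAIT, 0A is LISTEN). A minimal Python sketch, with `count_states` as a hypothetical helper and a made-up sample in place of a real file:

```python
from collections import Counter

# Subset of the kernel's TCP state codes as shown in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from /proc/net/tcp-style text."""
    counts = Counter()
    # The first line is the column header; data rows follow.
    for line in proc_net_tcp_text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3]
        counts[TCP_STATES.get(state, state)] += 1
    return counts


# Invented sample rows, truncated after the state column's neighbours.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0050 00000000:0000 0A 00000000:00000000
   1: 0100007F:0050 0100007F:C350 01 00000000:00000000
   2: 0100007F:0050 0100007F:C351 06 00000000:00000000
   3: 0100007F:0050 0100007F:C352 06 00000000:00000000
"""
```

On a live system the same function can be fed the content of /proc/net/tcp. A very large TIME_WAIT count relative to ESTABLISHED suggests many short-lived connections.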

Make sure you look at errors in /var/log/kern.log and /var/log/messages to help further troubleshoot.

Tuning Kernel

High Throughput Computing Administration Guide

Alex