I'm using Ubuntu 11.10 and nginx. My server is currently handling about 350 requests per second (that's the incoming load). I use iptables to make sure connections on certain ports are restricted to boxes I own.

I've noticed nf_conntrack_count keeps increasing. No matter what I push nf_conntrack_max to, nf_conntrack_count matches it within a day. Further, it doesn't match what netstat -tn tells me. Here are the numbers:

$ sudo sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_count = 649715
net.netfilter.nf_conntrack_max = 650000


$ netstat -tn | awk '{n[$6]++} END { for(k in n) { print k, n[k]; }}'
CLOSING 6
ESTABLISHED 2933
FIN_WAIT1 116
FIN_WAIT2 3447
LAST_ACK 35
SYN_RECV 79
TIME_WAIT 27141


$ sudo conntrack -L | awk '{n[$4]++}; END {for(k in n) { print k, n[k]; }}'
conntrack v1.0.0 (conntrack-tools): 648611 flow entries have been shown.
CLOSE 443
CLOSE_WAIT 2210
ESTABLISHED 645529
FIN_WAIT 45
LAST_ACK 50
SYN_RECV 74
TIME_WAIT 259

I don't want to keep increasing nf_conntrack_max until I know exactly what's happening. I definitely do not have 650,000 connections to my box (single IP, so I don't have that many ports).

Any idea what's going on or what I can do to explain it? If you need more numbers, I can probably get them.
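
One way to get more numbers is to see which peers hold the tracked entries. The following is a minimal sketch, not a definitive tool: `top_sources` is a hypothetical helper name, and it assumes the usual `conntrack -L` line layout in which the first `src=` field is the original-direction source address.

```shell
# top_sources [N]: read `conntrack -L` lines on stdin and print the N
# original-direction source addresses with the most tracked entries.
# (Hypothetical helper; assumes the first src= field is the client side.)
top_sources() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # Only the first src= on a line is counted; the second one
            # belongs to the reply direction.
            if ($i ~ /^src=/) { n[substr($i, 5)]++; break }
        }
    } END { for (k in n) print n[k], k }' | sort -rn | head -n "${1:-10}"
}
```

It could be run as `sudo conntrack -L 2>/dev/null | top_sources 5`. If a handful of addresses own most of the entries, that points at specific clients; an even spread points at entries that are simply never expiring.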

Note that the majority of my connections are HTTP (the only exceptions being my ssh sessions), and the keepalive timeout in nginx is set to 15 seconds. Also, net.netfilter.nf_conntrack_tcp_timeout_time_wait = 1.

Any help appreciated.

bluesmoon
  • /Maybe/ it's possible to avoid connection tracking entirely. This box does not seem to be a router, so firewalling should be quite simple and stateless: "Port 80 is allowed, port 22 only from my networks." State information is always prone to overflow and can be flooded fairly easily in an attack. – Michuelnik Jul 17 '12 at 17:26
  • @Michuelnik I'm open to suggestions. The complication is due to the fact that nginx will hand over a connection to a new port after doing an `accept`, so only allowing port 80 will block established connections that communicate over a different port. – bluesmoon Jul 17 '12 at 20:57
  • Sorry - don't get it. "hand over"? Establish a new connection? To the client? In the back-end? – Michuelnik Jul 18 '12 at 12:53
  • @Michuelnik nope, the server does not create a new connection. It listens on port 80, and when a connection comes in on port 80, it assigns a new port number to that connection so that port 80 is free to accept more connections. – bluesmoon Jul 18 '12 at 20:00
  • Do you have any reference for this? I cannot imagine this working, since changing the port would break the socket for the client. And nothing forbids a great many connections to one single port from 2^32*2^16 different source address/port pairs in the IPv4 world... – Michuelnik Jul 19 '12 at 05:44
  • It was based on `man accept`: "The accept() system call is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections for the listening socket, sockfd, _creates a new connected socket,_ and returns a new file descriptor referring to that socket." The way I understood "new connected socket" was that it assigned a new port, but I could be mistaken since a socket is a tuple of 5 items (2 IPs, 2 ports, 1 proto). – bluesmoon Jul 19 '12 at 12:14
  • Ahhh. I had the feeling that what you said before sounded familiar: that's how I understood it, too, when I first read it. No, accept() does not change ports. Are you now able to tackle your problem? :D But it does not explain the strange behaviour.... o_O – Michuelnik Jul 19 '12 at 12:32
  • I'm gonna have to wait a few days, still have 588K connections in the conntrack table, and this is a production box, so I'd rather not mess around until there's not much I can mess up. – bluesmoon Jul 19 '12 at 14:07
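
Since `accept()` does not move a connection to another port, the stateless approach suggested in the comments is feasible. The sketch below only prints candidate rules so they can be reviewed before touching a production box. `print_notrack_rules` is a hypothetical helper, and it assumes an iptables build that still provides the `NOTRACK` target (newer releases express the same thing as `-j CT --notrack`).

```shell
# print_notrack_rules PORT: print (do not apply) raw-table rules that exempt
# traffic to and from a local TCP port from connection tracking.
# Hypothetical helper; assumes the NOTRACK target is available.
print_notrack_rules() {
    port="$1"
    # Incoming packets addressed to the local service port.
    printf 'iptables -t raw -A PREROUTING -p tcp --dport %s -j NOTRACK\n' "$port"
    # Locally generated replies leaving from the service port.
    printf 'iptables -t raw -A OUTPUT -p tcp --sport %s -j NOTRACK\n' "$port"
}
```

For example, `print_notrack_rules 80` prints the two rules for HTTP. Untracked packets no longer match ESTABLISHED-style state rules, so the filter table would also need plain stateless accept rules for that port.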

1 Answer

I may have a clue. The timeout field in the conntrack -L output has several values in the 430,000-second range. That is suspiciously close to the default value of nf_conntrack_tcp_timeout_established (432000 seconds, or 5 days). I've tuned nf_conntrack_tcp_timeout_established down to 300, and all new entries in the table now have a timeout value below 300. This suggests that entries stay in the connection tracking table until their tcp_timeout_established timer runs out.
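
To check how many entries still carry a long timeout, something like the following could be used. It is a rough sketch: `established_timeouts` is a hypothetical helper, and it assumes the third field of each `conntrack -L` TCP line is the remaining timeout in seconds and the fourth is the state.

```shell
# established_timeouts [LIMIT]: read `conntrack -L` lines on stdin and count
# ESTABLISHED TCP entries whose remaining timeout is above LIMIT seconds
# versus at or below it. LIMIT defaults to 300.
established_timeouts() {
    awk -v limit="${1:-300}" '
        $1 == "tcp" && $4 == "ESTABLISHED" {
            # Field 3 is assumed to be the remaining timeout in seconds.
            if ($3 + 0 > limit + 0) over++; else under++
        }
        END { printf "over=%d under=%d\n", over, under }
    '
}
```

Running `sudo conntrack -L 2>/dev/null | established_timeouts 300` from time to time should show the `over` count shrinking if the old entries are ageing out and new ones are created with the lower timeout.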

Will add to this answer as I get more information.

bluesmoon
  • Was just about to ask for the conntrack -L output. But cleanly closed connections should be removed from the connection tracking table, since they are no longer established. Are you messing around with the raw table? Or (unintentionally) dropping FIN packets or similar, which would prevent a clean teardown? – Michuelnik Jul 17 '12 at 17:21
  • I'm not messing around with the raw table, nor have I done anything to change FIN behaviour. The only timeout I've changed is TIME_WAIT. – bluesmoon Jul 17 '12 at 20:58
  • This is not a bad idea at all in this case, but it does not explain the erroneous behaviour... does this help? Maybe you should make a test connection and dissect it closely with tcpdump. Is there any strange FIN behaviour? Also check with conntrack -L what's happening to the connection. – Michuelnik Jul 18 '12 at 12:49