I have a small farm of web servers (HP ProLiant and IBM x, with Broadcom Corporation NetXtreme II BCM5 NICs) running Apache 2.2.15 on CentOS 6, behind a Cisco ACE load balancer, serving a PHP/JS-based web portal. The farm receives a lot of requests daily (it serves a whole small country) from clients trying to reach a splash page (and, from there, the index page).

I've been struggling with the following problem:

  • I've noticed that requests to the web servers sometimes take quite a "long" time to be answered (from the client's point of view), and sometimes they are not answered at all (timeout on the web client side). In the latter case, I don't even see the request in the Apache logs.

  • I've also noticed that netstat reports an increasing number of TCP resets being sent (`netstat -st | grep 'resets sent'`).

  • Also, dropwatch -l kas shows there are many packets being dropped:

    Initalizing kallsyms db
    dropwatch> start
    Enabling monitoring...
    Kernel monitoring activated.
    Issue Ctrl-C to stop monitoring
    53 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
    26 drops at tcp_rcv_established+926 (0xffffffff814981b6)
    3 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
    1 drops at netlink_unicast+251 (0xffffffff81471b11)
    56 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
    29 drops at tcp_rcv_established+926 (0xffffffff814981b6)
    4 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
    51 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
    32 drops at tcp_rcv_established+926 (0xffffffff814981b6)
    2 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
    1 drops at ip_rcv_finish+199 (0xffffffff8147ea49)
    1 drops at tcp_v4_destroy_sock+115 (0xffffffff814a0cf5)
    1 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
    22 drops at tcp_rcv_established+926 (0xffffffff814981b6)
    36 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
    2 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
    49 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
    29 drops at tcp_rcv_established+926 (0xffffffff814981b6)
    26 drops at tcp_rcv_established+926 (0xffffffff814981b6)
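The reset counter that netstat reports can also be sampled programmatically, which makes it easier to turn "increasing" into an actual rate. What follows is only a minimal sketch, assuming a Linux host where `/proc/net/snmp` exposes the `Tcp` counter `OutRsts` (the value netstat prints as "resets sent") and `/proc/net/netstat` exposes the `TcpExt` counter `ListenOverflows`. The function names are hypothetical, chosen for illustration.

```python
import time


def parse_proc_net_counters(text):
    """Parse the header/value line pairs used by /proc/net/snmp and /proc/net/netstat.

    Returns a nested dict such as {"Tcp": {"OutRsts": 42, ...}, ...}.
    """
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "Tcp: Name1 Name2 ..." followed by "Tcp: 1 2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            raise ValueError(f"mismatched lines: {proto!r} vs {value_proto!r}")
        counters[proto] = dict(zip(names.split(), map(int, numbers.split())))
    return counters


def read_counter(path, proto, name):
    """Read one counter from a /proc/net statistics file."""
    with open(path) as handle:
        return parse_proc_net_counters(handle.read())[proto][name]


def resets_per_second(interval=10.0, path="/proc/net/snmp"):
    """Sample Tcp/OutRsts twice and return the average rate over the interval."""
    before = read_counter(path, "Tcp", "OutRsts")
    time.sleep(interval)
    after = read_counter(path, "Tcp", "OutRsts")
    return (after - before) / interval
```

Calling `resets_per_second()` on each server gives a comparable number per host, and `read_counter("/proc/net/netstat", "TcpExt", "ListenOverflows")` reads the listen queue overflow counter the same way.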

I've been following the recommendations from RH (Red Hat Enterprise Linux Network Performance Tuning Guide), even though I haven't seen some of the symptoms described there on my servers. In short:

  • I've increased the NIC ring buffers to maximum.
  • I've fiddled with (increased or changed) several kernel parameters (tcp_syncookies, netdev_budget, tcp_timestamps, tcp_window_scaling, tcp_rmem, dev_weight, tcp_tw_reuse...)
  • I've modified the Apache config according to several "Apache optimization guides" found on the web (even though there were, and still are, idle workers in the Apache stats)
  • I've stopped or disabled every system service/daemon that isn't required (basically all that remains is sshd, httpd and snmpd)

All of the above with no luck.

All NICs are running at 1000Mb/s, CPU and disk usage are low, and neither netstat nor ethtool shows errors.

Any ideas what else can be done?

  • Are there any errors in the apache logs related to 'maxclients'? – Dti Oct 22 '16 at 04:56
  • Looks like apache is simply not performing enough. Check the tcp accept statistics, and queue overflows. Also check if apache children hit the cap mentioned above. And consider switching to nginx. – drookie Oct 22 '16 at 05:04
  • Since `netstat -ni` doesn't show any errors, this has nothing to do with ring buffers. – drookie Oct 22 '16 at 05:05
  • Finally, it's worth checking what is exactly resetting, I guess it's the tcp/80 and tcp/443. – drookie Oct 22 '16 at 05:06
  • @Dti There are no errors regarding maxclients now (I've increased them in Apache conf previously) – Dõùĝ Díäz Oct 22 '16 at 21:39
  • @drookie These are my metrics right now (server 1/6): 45 packets pruned from receive queue because of socket buffer overrun; 617 times the listen queue of a socket overflowed. Very low, I think... I started considering Nginx after some reading... – Dõùĝ Díäz Oct 22 '16 at 21:42
  • @drookie `dropwatch -l kas` shows packets being dropped continuously. That's why I increased the NIC ring buffers to maximum (although it made no difference in overall service performance) – Dõùĝ Díäz Oct 22 '16 at 21:45
  • @drookie Yes. Resets are being sent from TCP port 80 (443 is closed/not used). I just wonder why Apache is resetting connection if there are Idle workers according to its own statistic... – Dõùĝ Díäz Oct 22 '16 at 21:48
  • Well... maybe apache just isn't fast enough at handling the socket queue, so the queue overflows, hence the RSTs. Try doubling `ListenBacklog`, make sure `tcp_max_syn_backlog` is at least twice as big, and see what happens. – drookie Oct 23 '16 at 21:37

1 Answer

A TCP reset is an immediate close of a TCP connection. This allows for the resources that were allocated for the previous connection to be released and made available to the system.

Causes of RST generation

Ack, Reset

  1. Sent in response to a SYN. An ACK, Reset sent in response to a SYN frame acknowledges receipt of the frame but then lets the client know that the server cannot allow the connection on that port. Among the reasons for the ACK, Reset are:

    a. The node being connected to is not listening on the port the client node is trying to connect to.

    b. There is some reason that the server node cannot complete the connection on that port. For example, the server is out of resources and so cannot allocate the needed resources to allow the connection.
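Case (a) is easy to reproduce from a client, which can help tell a reset apart from a silent drop. Here is a minimal sketch, assuming Python 3 on the client side; `connect_result` is a hypothetical helper name, not part of any library.

```python
import socket


def connect_result(host, port, timeout=2.0):
    """Attempt a TCP connection and classify the outcome.

    "refused" means the peer answered the SYN with a reset;
    "timeout" means nothing usable came back within the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
```

Against a port with no listener this should return `"refused"`. A `"timeout"` result instead suggests the SYN was dropped or never answered rather than reset, which would be closer to the client-side timeouts described in the question.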

RST

  1. If the connection is in any non-synchronized state (LISTEN, SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges something not yet sent (the segment carries an unacceptable ACK), a reset is sent.

  2. The next reset is a TCP reset that happens when a network frame is sent six times (this would be the original frame plus five retransmits of the frame) without a response. As a result, the sending node resets the connection.

As you have already tried various kernel tuning parameters, try the kernel's TCP SYN cookies option.

Enable TCP SYN cookie protection

To edit the file /etc/sysctl.conf, run:
# vi /etc/sysctl.conf

Append the following entry:

net.ipv4.tcp_syncookies = 1

Save and close the file. To reload the change, type:
# sysctl -p 
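To confirm the value the kernel is actually using after the reload, the setting can be read back from `/proc/sys`. This is a rough sketch, assuming the usual Linux mapping of sysctl names onto `/proc/sys` paths (dots become slashes); `sysctl_path` and `read_sysctl` are hypothetical helper names.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    return sysctl_path(name, root).read_text().strip()
```

For example, `read_sysctl("net.ipv4.tcp_syncookies")` should return `"1"` once the entry above is active.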

A real solution can only be given after analyzing your logs; iptables can also help.

Arjun sharma
  • tcp_syncookies is already enabled in all servers (I think it's enabled by default on CentOS); but that makes no difference in performance. I still see packets dropped and tcp resets... – Dõùĝ Díäz Oct 22 '16 at 21:52