
Background:

I've inherited a high-volume caching nameserver environment (Red Hat Enterprise Linux 5.8, IBM System x3550) that has inconsistent RX ring buffer settings: 1020 for eth0 and 255 for eth1. eth0 is connected to switch 1 of its local datacenter, and eth1 is connected to switch 2 of the same datacenter. The servers in each cluster alternate between eth0 and eth1 as the active interface, and every cluster is located in a different region. The ring buffers obviously need to be made consistent.

Here's where things get trickier: I discovered the problem above while researching why a number of the nameservers are frequently logging "error sending response: unset" errors, which the ISC knowledgebase suggests is related to outbound congestion. Servers with the higher ring buffer setting (1020) show fewer dropped packets in ifconfig (as one would expect), but tend to log the above error with great frequency: ~20k times a day in one of my highest load groups. We'll call this Group 1. The servers with the lower ring buffer setting (255) drop significantly more inbound packets per day (again, expected), but have far fewer instances of the BIND error, typically 0-150 in that same load group.

Not a huge mystery here either. Caching DNS is a recursive service: if something isn't cached, the server has to make multiple queries on behalf of that one question until it can finally return an answer. It's a (one in)->(many out) query relationship. Fixing the RX ring buffers should cause this number to equalize to a new value across the board, and from there it would probably be a good idea to tune the kernel's outbound network queue in proc (wmem_max/wmem_default).
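
Before normalizing anything, it helps to read the current RX ring size programmatically. The following is only a minimal sketch: the function name `current_rx_ring` is made up for illustration, and it assumes the `ethtool -g` layout shown in the output further down (a "Pre-set maximums:" section followed by "Current hardware settings:").

```shell
#!/bin/bash
# Hypothetical helper: print the current RX ring size.
# Reads `ethtool -g <iface>` output on stdin.
current_rx_ring() {
    awk '
        # Only start matching once we are past the "Pre-set maximums" section.
        /^Current hardware settings:/ { in_current = 1; next }
        # "RX Mini:" and "RX Jumbo:" have $1 == "RX", so only the plain RX line matches.
        in_current && $1 == "RX:"     { print $2; exit }
    '
}

# Example usage (commented out; TARGET and the interface name are placeholders):
# TARGET=1020
# rx=$(sudo /sbin/ethtool -g eth1 | current_rx_ring)
# [ "$rx" -eq "$TARGET" ] || echo "would run: sudo /sbin/ethtool -G eth1 rx $TARGET"
```

The usage lines only print the `ethtool -G` command rather than running it, since resizing the ring on a production interface is the kind of change I'd want to apply deliberately.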


I like being able to gauge the influence of configuration changes on a performance problem, so I wrote a report to gather some data before I started making production changes. Here's an example of the output for the first two servers in Group 1:

group1-01
    RX: 7166.27/sec av.
    TX: 7432.57/sec av.
    RXDROP: 7.43/sec av.
    unset_err: 27633
group1-02
    RX: 7137.37/sec av.
    TX: 7398.50/sec av.
    RXDROP: 9.94/sec av.
    unset_err: 107

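These are the formulas. Note that this is a local script, and there is no reliance on shell scripts that have to be maintained per-server. Also note that the variable named TXDROP actually holds the count of "error sending response: unset" log lines, which the report prints as unset_err.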

    RXPACK=$(ssh $server "sar -n DEV -f /var/log/sa/sa$(date --date=yesterday '+%d') | grep \"Average: .*\$(awk '{if (\$2 == "00000000") { print \$1 }}' /proc/net/route)\" | awk '{print \$3}'" 2>/dev/null)
    TXPACK=$(ssh $server "sar -n DEV -f /var/log/sa/sa$(date --date=yesterday '+%d') | grep \"Average: .*\$(awk '{if (\$2 == "00000000") { print \$1 }}' /proc/net/route)\" | awk '{print \$4}'" 2>/dev/null)
    RXDROP=$(ssh $server "sar -n EDEV -f /var/log/sa/sa$(date --date=yesterday '+%d') | grep \"Average: .*\$(awk '{if (\$2 == "00000000") { print \$1 }}' /proc/net/route)\" | awk '{print \$6}'" 2>/dev/null)
    TXDROP=$(ssh $server "sudo grep 'error sending response: unset' /var/log/dns_named.1" 2>/dev/null | wc -l)
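
The three sar formulas differ only in the sar flag and the column number, so the parsing could be factored into one local helper. This is a sketch under assumptions, not what the report actually runs: `sar_avg_field` is a hypothetical name, and it matches the interface name exactly in the second column instead of using the grep pattern above.

```shell
#!/bin/bash
# Hypothetical helper: print one column of the "Average:" row for an interface.
# Reads `sar -n DEV` or `sar -n EDEV` output on stdin.
#   $1 = interface name (e.g. eth0)
#   $2 = awk field number (3 = rxpck/s, 4 = txpck/s for DEV; 6 = rxdrop/s for EDEV)
sar_avg_field() {
    awk -v ifc="$1" -v col="$2" '$1 == "Average:" && $2 == ifc { print $col }'
}

# Example usage (commented out; $server, $SARFILE and $GWIF are placeholders):
# RXPACK=$(ssh "$server" "sar -n DEV -f $SARFILE" 2>/dev/null | sar_avg_field "$GWIF" 3)
# RXDROP=$(ssh "$server" "sar -n EDEV -f $SARFILE" 2>/dev/null | sar_avg_field "$GWIF" 6)
```

Doing the filtering locally keeps the remote command free of nested quoting, at the cost of shipping the whole sar output over ssh.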

Once I started running this report across all of my caching DNS environments, I noticed that another group with a near-identical packet load, which we'll call Group 2, has no problems at all:

group2-01
    RX: 7066.44/sec av.
    TX: 7345.95/sec av.
    RXDROP: 0.00/sec av.
    unset_err: 0
group2-02
    RX: 7019.18/sec av.
    TX: 7312.47/sec av.
    RXDROP: 0.00/sec av.
    unset_err: 0

The question:

Why does group2 behave this way without requiring further tuning of RX ring buffers or net.core.wmem_default/net.core.wmem_max? I'm going to need to normalize the ring buffers no matter what, but I would like to understand what else is going on here before I start playing with wmem values in /proc.

The only thing I can think of is that the queue is getting emptied faster by the application, but network stack tuning is not something I have a great deal of hands-on experience with and I'd like to get second opinions. (my eyes glaze over at some of the ethtool counter names, I won't deny it)
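
To make the ethtool counters easier to compare between servers, something like the sketch below could pull out individual values and express firmware discards relative to received unicast packets. The function names are hypothetical, the input is `ethtool -S` output on stdin, and I'm assuming the counters are cumulative since the driver was loaded, so the ratio is a long-run average rather than a current rate.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one named counter.
#   $1 = counter name without the trailing colon (e.g. rx_fw_discards)
nic_counter() {
    awk -v name="$1:" '$1 == name { print $2; exit }'
}

# Hypothetical helper: rx_fw_discards per million rx_ucast_packets, rounded.
fw_discard_ppm() {
    awk '
        $1 == "rx_ucast_packets:" { rx = $2 }
        $1 == "rx_fw_discards:"   { drop = $2 }
        END { if (rx > 0) printf "%.0f\n", drop / rx * 1000000 }
    '
}

# Example usage (commented out; the interface name is a placeholder):
# sudo /sbin/ethtool -S eth0 | nic_counter tx_xoff_frames
# sudo /sbin/ethtool -S eth0 | fw_discard_ppm
```

A ratio is more comparable than the raw counter here, because the servers have been up for different lengths of time and have handled very different total packet counts.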

I have eliminated the following as possibilities. Proofs follow after the divider.

  • The ring buffer layout is the same. (first server of group1 and group2 configured the same, second server of group1 and group2 configured the same)
  • The default gateway layout is the same.
  • The network cards are the same. (Broadcom BCM5708)
  • The firmware version reported by ethtool is the same. (bc 4.0.3 ipms 1.6.0)
  • sysctl -a output matches between the first servers of both groups and the second servers of both groups. (excluding kernel and fs sections)
  • The total number of servers in Group 1 and Group 2 is the same. (10)

For confidentiality reasons I cannot show the raw named.conf, or the grep filter I'm using to exclude information. You will have to take my word for it that the following configuration parameters are constant between all four servers:

    notify no;
    allow-transfer { none; };
    allow-recursion { any; };
    allow-query { any; };
    allow-query-cache { any; };
    recursive-clients 100000;
    max-cache-size 2G;
    max-ncache-ttl 900;

Below is a great deal of system information. The "hosthash" is just to demonstrate that each iteration of the loop is in fact hitting a different server without revealing the actual hostname.

Host hashes:

group1-1: dc78abcb154b74c87feecb3f35222263d40c028c
group1-2: 9fe491d58fd1e7d4e21e5bf10c164e4cf66e884b
group2-1: fc76bb3ee1ff580c6aba0d685713bb4145bd5fe3
group2-2: b7550c65d37622a131b1e47f066773defbb4d817

    for server in $group1_1 $group1_2 $group2_1 $group2_2
    do
        echo ____________________
        ssh $server "echo -en hosthash: \$(echo \$HOSTNAME | sha1sum)\\\n\\\n &&
             SARFILE=/var/log/sa/sa\$(date --date=yesterday '+%d') &&
             uname -srvmpio &&
             sudo /usr/sbin/dmidecode -s system-product-name
             dmesg | grep Broadcom &&
             head /proc/cpuinfo &&
             GWIF=\$(awk '{if (\$2 == 00000000) { print \$1 }}' /proc/net/route) &&
             sar -n DEV -f \$SARFILE | egrep '(IFACE|Average)' &&
             sar -n EDEV -f \$SARFILE | egrep '(IFACE|Average)' &&
             sudo /sbin/ethtool \$GWIF &&
             sudo /sbin/ethtool -i \$GWIF &&
             sudo /sbin/ethtool -g \$GWIF &&
             sudo /sbin/ethtool -c \$GWIF &&
             sudo /sbin/ethtool -S \$GWIF &&
             echo sysctl linecount: \$(sudo /sbin/sysctl -a | egrep -v '^(fs|kernel)' | wc -l) &&
             echo sysctl hash: \$(sudo /sbin/sysctl -a | egrep -v '^(fs|kernel)' | sha1sum)"
    done

Output:

____________________
hosthash: dc78abcb154b74c87feecb3f35222263d40c028c -

Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978AC1]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 001a649db00e
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 001a649db010
cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.7 (July 20, 2011)
Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Aug 04, 2011)
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping    : 6
cpu MHz     : 2493.750
cache size  : 6144 KB
physical id : 0
siblings    : 4
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
Average:           lo   1269.15   1269.15 206600.39 206600.39      0.00      0.00      0.00
Average:         eth0   7166.27   7432.57 704051.80 2419779.42      0.00      0.00      0.94
Average:         eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:00:01 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00      7.43      0.00      0.00      0.00      0.00      0.00
Average:         eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:04:00.0
Ring parameters for eth0:
Pre-set maximums:
RX:     2040
RX Mini:    0
RX Jumbo:   8160
TX:     255
Current hardware settings:
RX:     1020
RX Mini:    0
RX Jumbo:   0
TX:     255

Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

NIC statistics:
     rx_bytes: 1505439501410
     rx_error_bytes: 0
     tx_bytes: 4672574845104
     tx_error_bytes: 0
     rx_ucast_packets: 15315548049
     rx_mcast_packets: 2035415
     rx_bcast_packets: 1101989
     tx_ucast_packets: 15505474251
     tx_mcast_packets: 40018
     tx_bcast_packets: 36019
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 92309552
     rx_65_to_127_byte_packets: 1243637891
     rx_128_to_255_byte_packets: 790117566
     rx_256_to_511_byte_packets: 127197337
     rx_512_to_1023_byte_packets: 168929387
     rx_1024_to_1522_byte_packets: 11591832
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 60586118
     tx_65_to_127_byte_packets: 1976738758
     tx_128_to_255_byte_packets: 2830395753
     tx_256_to_511_byte_packets: 157607989
     tx_512_to_1023_byte_packets: 1483716940
     tx_1024_to_1522_byte_packets: 406821340
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 116422
     tx_xoff_frames: 134780
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 0
     rx_ftq_discards: 0
     rx_discards: 0
     rx_fw_discards: 14015105
sysctl linecount: 504
sysctl hash: dd6aab90d0fd9ae90742c5f812a78734e2f2ff1c -
____________________
hosthash: 9fe491d58fd1e7d4e21e5bf10c164e4cf66e884b -

Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978EHU]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 001a6479655c
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 001a6479655e
cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.7 (July 20, 2011)
Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Aug 04, 2011)
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping    : 6
cpu MHz     : 2493.746
cache size  : 6144 KB
physical id : 0
siblings    : 4
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
Average:           lo   1261.04   1261.04 205548.08 205548.08      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth1   7137.37   7398.50 702340.35 2409580.71      0.00      0.00      0.97
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:00:01 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth1      0.00      0.00      0.00      9.94      0.00      0.00      0.00      0.00      0.00
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:06:00.0
Ring parameters for eth1:
Pre-set maximums:
RX:     2040
RX Mini:    0
RX Jumbo:   8160
TX:     255
Current hardware settings:
RX:     255
RX Mini:    0
RX Jumbo:   0
TX:     255

Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

NIC statistics:
     rx_bytes: 1501719289640
     rx_error_bytes: 0
     tx_bytes: 4654179094291
     tx_error_bytes: 0
     rx_ucast_packets: 15253610508
     rx_mcast_packets: 2108112
     rx_bcast_packets: 1136240
     tx_ucast_packets: 15438361249
     tx_mcast_packets: 40135
     tx_bcast_packets: 1721
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 92376678
     rx_65_to_127_byte_packets: 1183040190
     rx_128_to_255_byte_packets: 788176623
     rx_256_to_511_byte_packets: 126838328
     rx_512_to_1023_byte_packets: 168170816
     rx_1024_to_1522_byte_packets: 13350337
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 60806588
     tx_65_to_127_byte_packets: 1955234150
     tx_128_to_255_byte_packets: 2806601346
     tx_256_to_511_byte_packets: 154015585
     tx_512_to_1023_byte_packets: 1466206531
     tx_1024_to_1522_byte_packets: 405928513
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 150648
     tx_xoff_frames: 173552
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 1
     rx_ftq_discards: 0
     rx_discards: 0
     rx_fw_discards: 19605427
sysctl linecount: 504
sysctl hash: 4626e3788c72e091487afe1e3a7cfd32278ab07d -
____________________
hosthash: fc76bb3ee1ff580c6aba0d685713bb4145bd5fe3 -

Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978AC1]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 001a649dc68a
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 001a649dc68c
cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.7 (July 20, 2011)
Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Aug 04, 2011)
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping    : 6
cpu MHz     : 2493.750
cache size  : 6144 KB
physical id : 0
siblings    : 4
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
Average:           lo   1891.67   1891.67 266593.77 266593.77      0.00      0.00      0.00
Average:         eth0   7066.44   7345.95 730519.41 2215508.99      0.00      0.00      4.37
Average:         eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:00:01 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:04:00.0
Ring parameters for eth0:
Pre-set maximums:
RX:     2040
RX Mini:    0
RX Jumbo:   8160
TX:     255
Current hardware settings:
RX:     1020
RX Mini:    0
RX Jumbo:   0
TX:     255

Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

NIC statistics:
     rx_bytes: 4640887074833
     rx_error_bytes: 0
     tx_bytes: 12640942400790
     tx_error_bytes: 0
     rx_ucast_packets: 46405845860
     rx_mcast_packets: 14487857
     rx_bcast_packets: 3476467
     tx_ucast_packets: 47159091638
     tx_mcast_packets: 118147
     tx_bcast_packets: 5504
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 136463411
     rx_65_to_127_byte_packets: 4245502343
     rx_128_to_255_byte_packets: 2357984838
     rx_256_to_511_byte_packets: 355610202
     rx_512_to_1023_byte_packets: 608223572
     rx_1024_to_1522_byte_packets: 65320154
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 112166114
     tx_65_to_127_byte_packets: 3010346100
     tx_128_to_255_byte_packets: 4087240164
     tx_256_to_511_byte_packets: 1625596725
     tx_512_to_1023_byte_packets: 3037109096
     tx_1024_to_1522_byte_packets: 927187571
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 79164
     tx_xoff_frames: 89685
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 1
     rx_ftq_discards: 0
     rx_discards: 0
     rx_fw_discards: 6857729
sysctl linecount: 504
sysctl hash: dd6aab90d0fd9ae90742c5f812a78734e2f2ff1c -
____________________
hosthash: b7550c65d37622a131b1e47f066773defbb4d817 -

Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978EHU]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 00215e3f1ec4
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 00215e3f1ec6
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping    : 6
cpu MHz     : 2493.753
cache size  : 6144 KB
physical id : 1
siblings    : 4
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
Average:           lo   1883.04   1883.04 263726.79 263726.79      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth1   7019.18   7312.47 720911.92 2214861.10      0.00      0.00      1.02
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:00:01 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:06:00.0
Ring parameters for eth1:
Pre-set maximums:
RX:     2040
RX Mini:    0
RX Jumbo:   8160
TX:     255
Current hardware settings:
RX:     255
RX Mini:    0
RX Jumbo:   0
TX:     255

Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

NIC statistics:
     rx_bytes: 4621548539323
     rx_error_bytes: 0
     tx_bytes: 12598031299743
     tx_error_bytes: 0
     rx_ucast_packets: 46260356368
     rx_mcast_packets: 5352446
     rx_bcast_packets: 3474589
     tx_ucast_packets: 47008853953
     tx_mcast_packets: 118164
     tx_bcast_packets: 5471
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 126851062
     rx_65_to_127_byte_packets: 4117708205
     rx_128_to_255_byte_packets: 2346047550
     rx_256_to_511_byte_packets: 356266112
     rx_512_to_1023_byte_packets: 604666332
     rx_1024_to_1522_byte_packets: 62938478
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 111216848
     tx_65_to_127_byte_packets: 2984505931
     tx_128_to_255_byte_packets: 4027485330
     tx_256_to_511_byte_packets: 1577669672
     tx_512_to_1023_byte_packets: 3015060448
     tx_1024_to_1522_byte_packets: 933575954
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 129873
     tx_xoff_frames: 145090
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 1
     rx_ftq_discards: 0
     rx_discards: 0
     rx_fw_discards: 6752713
sysctl linecount: 504
sysctl hash: 4626e3788c72e091487afe1e3a7cfd32278ab07d -
Andrew B
  • There is a long-standing, nasty relationship between the Broadcom NIC and RHEL5. Most of the bugs were fixed by RHEL 5.5. I will see whether any bugs remain in the kernel-2.6.18-308 series. btw, I think this is a NIC issue, as the packets are lost at the firmware level. See the number of firmware discards. – Soham Chakraborty Jan 08 '13 at 03:37
  • I just want to say that this is the most well-written question I've seen on SF in months. Excellent job. – Jan 08 '13 at 07:16

2 Answers

Wondering if the box is a Dell? There's a well-known issue with the bnx2 driver and chipsets shipped by Dell. The result is randomly dropped packets under heavy network load. It would seem logical that the tuned-up ring buffers could trigger it, if this is the case.

I believe Dell offers their own version of the driver as a fix. The other fix is to do something like this in modprobe.conf:

options bnx2 disable_msi=1

Can't hurt to try, anyhow. And x2 what kce said. One of the best written questions I've ever seen here.

J Adams
  • Whoops! You have a good eye, I didn't mention the hardware chassis or the `modprobe.conf` details. These are all IBM System x3550 servers and I've updated the question to reflect that. I was hoping I'd find a difference in modprobe `options` based on your suggestion, but no luck there. None at all, just aliases for eth0 and eth1 to bnx2. I checked `/etc/modules.d/` as well. – Andrew B Mar 01 '13 at 22:48
  • I am more familiar with the bnx2/Dell issue (from the most painful experience) but as someone suggested above, the issue might not be specific to Dell. It could still be worth your time to try turning off msi for the driver. The fact that the dropped packets are on the interface with larger buffers sounds SO much like the bug I described, it's compelling. – J Adams Mar 01 '13 at 22:55
  • A consult with a coworker has this sounding promising, I'll let you know. – Andrew B Mar 01 '13 at 22:59
  • Disable MSI, unload and reload the module, and retest. `modinfo bnx2` should show: `parm: disable_msi:Disable Message Signaled Interrupt (MSI) (int)` – dmourati Mar 01 '13 at 23:34
  • A quick google suggests that the bnx2 problem was not limited to Dell, and I see that IBM also offers the manufacturer's driver for download. Following dmourati's instructions would be a quick way to eliminate the driver as the source of the problem. – J Adams Mar 05 '13 at 16:38

Even if you're sure that you have a full list of load balancer VIPs for your servers, run a packet capture anyway. Just because your machine won't respond to ARP for an IP address doesn't mean that bogus packets can't be sent to it. Make sure the traffic being sent to your MAC addresses matches the configured IP addresses.

I appreciate the time that people put into this question, but my own due diligence was lacking here. In hindsight, I needed to build a PCAP filter like this:

tcpdump -i eth0 -n 'ether dst aa:bb:cc:dd:ee:ff and not (dst host 1.2.3.4 or dst host 5.6.7.8 or...)'

Where:

aa:bb:cc:dd:ee:ff = HW addr of eth0
1.2.3.4, 5.6.7.8  = list of destination addresses that traffic is expected on
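
With more than a handful of expected addresses, typing that filter by hand gets error-prone. As a rough sketch (the function name `build_unexpected_dst_filter` is hypothetical), the expression can be assembled from the MAC address and a list of expected destination IPs:

```shell
#!/bin/bash
# Hypothetical helper: build a pcap filter that matches traffic sent to our MAC
# address but NOT addressed to any of the expected destination IPs.
#   $1 = hardware address of the interface
#   remaining args = destination addresses that traffic is expected on
build_unexpected_dst_filter() {
    local mac=$1 expr="" ip
    shift
    # With no expected IPs there is nothing to exclude.
    if [ $# -eq 0 ]; then
        echo "ether dst $mac"
        return
    fi
    for ip in "$@"; do
        # Join the terms with " or ", without a leading separator.
        expr="${expr:+$expr or }dst host $ip"
    done
    echo "ether dst $mac and not ($expr)"
}

# Example usage (commented out; addresses are the placeholders from above):
# sudo tcpdump -i eth0 -n "$(build_unexpected_dst_filter aa:bb:cc:dd:ee:ff 1.2.3.4 5.6.7.8)"
```

Anything this capture shows is, by construction, traffic arriving for an address you did not know about.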

There were a number of load balancer VIPs that were not given to me (I don't control the LB), and they were passing traffic on TCP port 53 in ways that would result in RX discards. The volume of traffic on these legacy IPs was so low that it was not likely to be noticed by an admin eyeballing traffic on the wire.

Andrew B