5

My goal is to configure our CentOS ("free" RHEL) 5.x servers for custom low-latency network programs. I would like to experiment with binding ethernet NIC interrupt handling to the same CPU on which the program runs (to hopefully improve cache utilization). The first step in this process is to determine the NIC's IRQ.

Here are the contents of /proc/interrupts on one server (note that I removed the columns for CPUs 2 through 14 for brevity):

           CPU0       CPU1       CPU15
  0:  600299726          0          0    IO-APIC-edge  timer
  1:          3          0          0    IO-APIC-edge  i8042
  8:          1          0          0    IO-APIC-edge  rtc
  9:          0          0          0   IO-APIC-level  acpi
 12:          4          0          0    IO-APIC-edge  i8042
 50:          0          0          0   IO-APIC-level  uhci_hcd:usb6, uhci_hcd:usb8
 58:       6644      25103          0   IO-APIC-level  ioc0
 66:          0          0          0   IO-APIC-level  ata_piix
 74:        221     533830          0   IO-APIC-level  ata_piix
 98:         35          0    2902361       PCI-MSI-X  eth1-0
106:         61         11       3841       PCI-MSI-X  eth1-1
114:         28          0      61452       PCI-MSI-X  eth1-2
122:         24       1586         22       PCI-MSI-X  eth1-3
130:       2912          0        337       PCI-MSI-X  eth1-4
138:         21          0         28       PCI-MSI-X  eth1-5
146:         21          0         56       PCI-MSI-X  eth1-6
154:         34          1          1       PCI-MSI-X  eth1-7
209:         23          0          0   IO-APIC-level  ehci_hcd:usb1
217:          0          0          0   IO-APIC-level  ehci_hcd:usb2, uhci_hcd:usb5, uhci_hcd:usb7
225:          0          0          0   IO-APIC-level  uhci_hcd:usb3
233:          0          0          0   IO-APIC-level  uhci_hcd:usb4
NMI:       7615       2989       2931
LOC:  600328144  600328099  600327086
ERR:          0
MIS:          0

Why are there multiple entries for "eth1" in the form of "eth1-X"?

Furthermore, "/sys/class/net/eth1/device/irq" contains "90", but there is no IRQ 90 in the interrupt list above.

So let's say I look at just "eth1-0", which is IRQ 98. The contents of /proc/irq/98/smp_affinity is:

00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000

That's a list of numbers, rather than just one number.

So how do I set eth1's smp_affinity?

None of the online examples and documentation I could find mentioned any cases like this; they always have exactly one "ethX" entry in /proc/interrupts; the indicated interrupt matches /sys/class/net/ethX/device/irq; and there is only one number in /proc/irq/N/smp_affinity.

FWIW, I'll add that this application is extremely latency sensitive, to the point where we disable C-states and processor frequency scaling (because those features induce too much latency). Microseconds make a difference here.

Edit: I stumbled across the following web page: http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html. Although it is about cpuset, it has a section titled "Mask Format", which I assume describes the same format I am seeing in the /proc/irq/N/smp_affinity file. Quoting:

This format displays each 32-bit word in hexadecimal (using ASCII characters "0" - "9" and "a" - "f"); words are filled with leading zeros, if required. For masks longer than one word, a comma separator is used between words. Words are displayed in big-endian order, which has the most significant bit first. The hex digits within a word are also in big-endian order.

The number of 32-bit words displayed is the minimum number needed to display all bits of the bitmask, based on the size of the bitmask.

Examples of the Mask Format:

   00000001                        # just bit 0 set
   40000000,00000000,00000000      # just bit 94 set
   00000001,00000000,00000000      # just bit 64 set
   000000ff,00000000               # bits 32-39 set
   00000000,000E3862               # 1,5,6,11-13,17-19 set

A mask with bits 0, 1, 2, 4, 8, 16, 32, and 64 set displays as:

   00000001,00000001,00010117

The first "1" is for bit 64, the second for bit 32, the third for bit 16, the fourth for bit 8, the fifth for bit 4, and the "7" is for bits 2, 1, and 0.
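
To check my reading of the Mask Format text, here is a minimal Python sketch that converts between the comma-separated hex string and a list of CPU numbers. The function names `parse_cpu_mask` and `format_cpu_mask` are my own, not part of any kernel tool, and the sketch assumes smp_affinity uses exactly the format quoted above.

```python
def parse_cpu_mask(mask):
    """Return the sorted bit (CPU) numbers set in a comma-separated hex mask."""
    # Words are 32 bits each with the most significant word first, so
    # removing the commas leaves one large hexadecimal number.
    value = int(mask.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def format_cpu_mask(cpus, min_words=1):
    """Build a mask string from CPU numbers, using at least min_words words."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    words = []
    # Peel off 32 bits at a time, least significant word first.
    while value or len(words) < min_words:
        words.append(f"{value & 0xFFFFFFFF:08x}")
        value >>= 32
    # The format prints the most significant word first.
    return ",".join(reversed(words))
```

Applied to the smp_affinity value shown earlier for IRQ 98, `parse_cpu_mask` yields only bit 15, which would match the CPU15 column where most eth1-0 interrupts were counted.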

Matt

3 Answers

3

Why are there multiple entries for "eth1" in the form of "eth1-X"?

Because there are multiple tx/rx queues, and each queue gets its own interrupt. Packets are often assigned to a queue by a hash of (local addr, port, remote addr, port) and some other stuff. Suppressing the multiple queues might make it easier to make your application more deterministic, assuming you have few traffic sources. Or you could look up the hashing algorithm and avoid ephemeral ports, if that's easier.
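
To see which IRQ belongs to which queue, one option is to parse /proc/interrupts. Below is a rough Python sketch; `irqs_for_device` is a hypothetical helper name, and it assumes the queue labels look like "eth1-0" as in the listing from the question (other drivers may label their queues differently).

```python
def irqs_for_device(interrupts_text, device):
    """Map each label such as 'eth1-0' to its IRQ number."""
    result = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Numbered IRQ rows start with "<irq>:"; this skips the CPU header
        # row and the NMI/LOC/ERR/MIS summary rows.
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        irq = int(fields[0].rstrip(":"))
        for field in fields[1:]:
            # Shared lines list several handlers separated by ", ".
            label = field.rstrip(",")
            if label == device or label.startswith(device + "-"):
                result[label] = irq
    return result
```

On a live system it could be called as `irqs_for_device(open("/proc/interrupts").read(), "eth1")`.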

Brian Cain
2

Are you using a realtime kernel? Are you leveraging cgroups or cpusets to isolate your application? If you're on a stock distribution kernel, you're leaving a good amount of latency gains on the table. Also, I see 16 CPU-cores. That would indicate that HyperThreading is enabled. How do you know if you're binding to a real versus logical core?
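
One way to answer the real-versus-logical question is to look at which logical CPUs share a physical core. The sketch below is only an illustration: it assumes the kernel exposes /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, which may be missing on older kernels (there, a `thread_siblings` bitmask file may be the only option). The helper names are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a CPU list such as '0,8' or '0-3,8' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs that share a core with `cpu` (itself included)."""
    # Assumed path; not verified against a CentOS 5 kernel.
    path = f"{sysfs_root}/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())
```

If `thread_siblings(15)` returns more than one CPU, then CPU 15 shares its core with another hardware thread.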

ewwhite
  • A lot of time and experimentation has passed since I originally asked this question. A realtime kernel is on the list of things to experiment with. We've messed with cpusets (and the related isolcpus kernel commandline parameter). That's a good question about discerning between real and logical cores. The long and the short of it is, we still need to do more research and experimentation! – Matt Apr 13 '12 at 14:22
  • Ideally, hyperthreading should be off if you're setting CPU affinity... Is this a financial application? – ewwhite Apr 13 '12 at 16:12
0

Check if you have a directory /sys/class/net/eth1/device/msi_irqs/. If so, ignore the content of /sys/class/net/eth1/device/irq. This network device has multiple rx/tx queues and therefore multiple IRQs. These IRQs correspond to the names of the files in the /sys/class/net/eth1/device/msi_irqs/ directory.
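
As a rough illustration, the Python sketch below lists those IRQs and prepares the smp_affinity writes that would pin all of them to one CPU. The function names are hypothetical, the writes need root, and a running irqbalance daemon may change the affinities again afterwards.

```python
import os


def msi_irqs(device, sysfs_root="/sys/class/net"):
    """Return the IRQ numbers named by the entries in msi_irqs/, sorted."""
    directory = os.path.join(sysfs_root, device, "device", "msi_irqs")
    return sorted(int(name) for name in os.listdir(directory) if name.isdigit())


def cpu_mask(cpu):
    """Return a mask with only `cpu` set, as comma-separated 32-bit hex words."""
    words = [0] * (cpu // 32 + 1)
    words[cpu // 32] = 1 << (cpu % 32)
    # Most significant word first, as in /proc/irq/N/smp_affinity.
    return ",".join(f"{word:08x}" for word in reversed(words))


def affinity_writes(irqs, cpu, proc_root="/proc/irq"):
    """Return (path, mask) pairs that would pin every IRQ to one CPU."""
    mask = cpu_mask(cpu)
    return [(f"{proc_root}/{irq}/smp_affinity", mask) for irq in irqs]


def apply_affinity(writes):
    """Write each mask to its smp_affinity file (requires root)."""
    for path, mask in writes:
        with open(path, "w") as f:
            f.write(mask)
```

For example, `apply_affinity(affinity_writes(msi_irqs("eth1"), 15))` would attempt to steer every eth1 queue interrupt to CPU 15. Keeping the write step separate makes it possible to print and review the planned changes first.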

scai