
The latency between two Linux hosts is about 0.23 ms. They are connected by a single switch. Ping and Wireshark confirm the latency number, but I don't have any visibility into what is causing it. How can I tell whether the latency is due to the NIC on host A or B, the switch, or the cables?

UPDATE: The 0.23 ms latency is bad for my existing application, which sends messages at a very high frequency, and I am trying to see if it can be brought down to 0.1 ms.

Jimm
    Why do you think 0.23 ms is bad latency? That's awesome latency. – SpacemanSpiff Nov 03 '12 at 15:03
    Connect them directly with a crossover cable. If you have the same latency then the cause is one of the hosts. If you don't have the same latency then the cause is the switch or the cabling. – joeqwerty Nov 03 '12 at 15:03
    Agreed, what's the problem? 0.23ms latency is less than I get with two machines sitting next to each other. – Michael Hampton Nov 03 '12 at 15:08
  • @joeqwerty If two systems are connected via crossover cable, how do they locate each other? Does ARP still work? Does TCP still work? – Jimm Nov 03 '12 at 15:28
    They'll work just the same as if they were both connected to the same switch. The cable is merely the physical medium over which they'll communicate. All 7 layers of the OSI model (or the 4 layers of the DARPA model, if you prefer) will work exactly as they do now. – joeqwerty Nov 03 '12 at 15:29
  • So, we need more detail about what you're trying to do. Is this a messaging application? Trading, perhaps? Please reply with your network setup, server type (manufacturer/model), Linux distribution, and kernel version. Also, is this UDP or TCP? As-is, we cannot help. I'll add a generic answer for now, but will fill in more as you provide details. – ewwhite Nov 03 '12 at 15:45
  • @jimm Did any of this help? – ewwhite Nov 04 '12 at 00:35

1 Answer


Generically, you can use some of the advanced options of the iperf utility to get a view of the network performance between the systems, specifically latency and jitter...

Is this a UDP or TCP-based message stream?

I commented above on needing more information about your setup. If this is a low latency messaging application, there's a whole world of tuning and optimization techniques that span hardware, driver and OS tweaking. But really, we need more information.

Edit:

Okay, so this is TCP messaging. Have you modified any /etc/sysctl.conf parameters? What do your send/receive buffers look like? Using a realtime kernel alone won't do much, but if you move to the point where you're binding interrupts to CPUs, changing the realtime priority of the messaging app (chrt), and possibly modifying the tuned-adm profile of the system, that may help...
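A rough sketch of what that looks like in practice (the interface name eth0, IRQ number 30, and application name below are placeholders, not from your setup; check /proc/interrupts for your NIC's actual IRQs):

# Find the IRQ(s) assigned to the NIC (eth0 assumed)
grep eth0 /proc/interrupts

# Pin IRQ 30 (assumed) to CPU 2 by writing a hex CPU bitmask (0x4 = CPU 2)
echo 4 > /proc/irq/30/smp_affinity

# Run the messaging application (name assumed) under SCHED_FIFO priority 80
chrt -f 80 ./messaging_app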

This sounds like a generic EL6 system, so an easy way to establish a performance-tuning baseline is to switch the system's performance profile to another one available within the tuned framework, then build from there.

In your case:

yum install tuned tuned-utils
tuned-adm profile latency-performance

(A quick matrix showing the differences between the tuned profiles appeared here.)
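You can also compare the profiles on your own system, since tuned ships them as plain files. The /etc/tune-profiles path below is an assumption based on a stock EL6 tuned install:

# Show the available profiles and the currently active one
tuned-adm list
tuned-adm active

# On EL6, profile definitions live under /etc/tune-profiles/ (assumed path),
# so two profiles can be diffed directly:
diff -ru /etc/tune-profiles/default /etc/tune-profiles/latency-performance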

Can you tell us about the hardware? Types of CPU, NIC, memory?

So, it may be interesting to test your link... Try this iperf test...

On one system, start an iperf UDP listener. On the other, open a connection to the first... A quick line-quality test.

# Server2
[root@server2 ~]# iperf -su   

# Server1
[root@server1 ~]# iperf -t 60 -u -c server2

In my case, low jitter and low ping time:

------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 192.168.15.3 port 5001 connected with 172.16.2.152 port 36312
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0-20.0 sec  2.50 MBytes  1.05 Mbits/sec   0.012 ms    0/ 1785 (0%)

PING server1 (172.16.2.152) 56(84) bytes of data.
64 bytes from server1 (172.16.2.152): icmp_seq=1 ttl=63 time=0.158 ms
64 bytes from server1 (172.16.2.152): icmp_seq=2 ttl=63 time=0.144 ms

I'd check the hardware and interfaces for errors. If you want, eliminate the switch between systems and see what a direct connection looks like. You don't want high jitter (variance), so check that.
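For the error check, something like the following works on most Linux hosts (eth0 is an assumed interface name; substitute your own):

# Driver/NIC statistics; look for non-zero error, drop or miss counters
ethtool -S eth0 | egrep -i 'err|drop|miss'

# Kernel-level RX/TX error and drop counts for the interface
ip -s link show eth0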

But honestly, even with the ping times you're getting on your current setup, that should not be enough to kill your application. I'd go down the path of tuning your send/receive buffers. See: net.core.rmem_max, net.core.wmem_max and their defaults...

Something like the following in /etc/sysctl.conf (please tune to taste):

net.core.rmem_default = 10000000
net.core.wmem_default = 10000000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
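Once those are in /etc/sysctl.conf, the standard way to apply and verify them:

# Reload /etc/sysctl.conf and confirm the new values
sysctl -p
sysctl net.core.rmem_max net.core.wmem_max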
ewwhite
  • It is a latency-sensitive messaging application. The typical OS would be kernel-2.6.32-279.11.1.el6.x86_64, though I loaded hosts with kernel 3.2.23-rt37.56.el6rt.x86_64 to see if that would make any difference, but it was pretty much the same. Message sizes vary between 1 KB and 3 KB. All communication happens via TCP. – Jimm Nov 03 '12 at 15:57
  • Is the OS Red Hat MRG? – ewwhite Nov 03 '12 at 15:59
  • Right now it's plain Red Hat 6.3, but MRG is also a possibility. As I mentioned above, I tried both, but latency was the same. What kind of tunables should I be concerned with? – Jimm Nov 03 '12 at 16:04
  • I'd want to know the hardware and NIC setup. Switch model helps. For tunables, the obvious area to look at on 6.3 is your `tuned-adm` profile. – ewwhite Nov 03 '12 at 16:05
  • Dual Ethernet controllers: Emulex Corporation OneConnect 10Gb NIC (rev 02), and 16-core AMD Family 10h processors at 2400 MHz. – Jimm Nov 03 '12 at 16:11
  • What's the switch? – ewwhite Nov 03 '12 at 16:25
  • Cisco Catalyst 4900. – Jimm Nov 03 '12 at 16:47
  • Okay, 4900M and 10GbE. Standard stuff... I don't think ping time matters here. Try increasing the buffers and trying your app again. – ewwhite Nov 03 '12 at 16:53