
The latency between two Linux hosts is about 0.23 ms. They are connected by a single switch. Ping and Wireshark confirm the latency number, but I don't have any visibility into what is causing it. How can I tell whether the latency is due to the NIC on host A or B, the switch, or the cables?

UPDATE: The 0.23 ms latency is bad for my existing application, which sends messages at a very high frequency, and I am trying to see if it can be brought down to 0.1 ms.

Jimm
    Why do you think 0.23 ms is bad latency? That's awesome latency. – SpacemanSpiff Nov 03 '12 at 15:03
    Connect them directly with a crossover cable. If you have the same latency then the cause is one of the hosts. If you don't have the same latency then the cause is the switch or the cabling. – joeqwerty Nov 03 '12 at 15:03
    Agreed, what's the problem? 0.23ms latency is less than I get with two machines sitting next to each other. – Michael Hampton Nov 03 '12 at 15:08
  • @joeqwerty If two systems are connected via crossover cable, how do they locate each other? Does ARP still work? Does TCP still work? – Jimm Nov 03 '12 at 15:28
    They'll work just the same as if they were both connected to the same switch. The cable is merely the physical medium over which they'll communicate. All 7 layers of the OSI model (or the 4 layers of the DARPA model, if you prefer) will work exactly as they do now. – joeqwerty Nov 03 '12 at 15:29
  • So, we need more detail about what you're trying to do. Is this a messaging application? Trading, perhaps? Please reply with your network setup, server type (manufacturer/model), Linux distribution, and kernel version. Also, is this UDP or TCP? As-is, we cannot help. I'll add a generic answer for now, but will fill in more as you provide details. – ewwhite Nov 03 '12 at 15:45
  • @jimm Did any of this help? – ewwhite Nov 04 '12 at 00:35

1 Answer


Generically, you can use some of the advanced options of the iperf utility to get a view of the network performance between the systems, specifically latency and jitter...

Is this a UDP or TCP-based message stream?

I commented above on needing more information about your setup. If this is a low latency messaging application, there's a whole world of tuning and optimization techniques that span hardware, driver and OS tweaking. But really, we need more information.

Edit:

Okay, so this is TCP messaging. Have you modified any /etc/sysctl.conf parameters? What do your send/receive buffers look like? Using a realtime kernel alone won't do much, but if you move to the point where you're binding interrupts to CPUs, changing the realtime priority of the messaging app (chrt), and possibly modifying the tuned-adm profile of the system, that may help...
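A rough sketch of what that looks like in practice (the interface name eth0, IRQ number 30, and application name below are placeholders, not from your setup; check /proc/interrupts for your NIC's actual IRQs):

# Find the IRQ(s) assigned to the NIC (eth0 assumed)
grep eth0 /proc/interrupts

# Pin IRQ 30 (assumed) to CPU 2 by writing a hex CPU bitmask (0x4 = CPU 2)
echo 4 > /proc/irq/30/smp_affinity

# Run the messaging application (name assumed) under SCHED_FIFO priority 80
chrt -f 80 ./messaging_app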

This sounds like a generic EL6 system, so an easy way to establish a performance-tuning baseline is to switch the system's performance profile to another one available within the tuned framework, then build from there.

In your case:

yum install tuned tuned-utils
tuned-adm profile latency-performance

(A quick matrix showing the differences between the tuned profiles appeared here.)
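You can also compare the profiles on your own system, since tuned ships them as plain files. The /etc/tune-profiles path below is an assumption based on a stock EL6 tuned install:

# Show the available profiles and the currently active one
tuned-adm list
tuned-adm active

# On EL6, profile definitions live under /etc/tune-profiles/ (assumed path),
# so two profiles can be diffed directly:
diff -ru /etc/tune-profiles/default /etc/tune-profiles/latency-performance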

Can you tell us about the hardware? Types of CPU, NIC, memory?

So, it may be interesting to test your link... Try this iperf test...

On one system, start an iperf UDP listener. On the other, open a connection to the first... A quick line-quality test.

# Server2
[root@server2 ~]# iperf -su   

# Server1
[root@server1 ~]# iperf -t 60 -u -c server2

In my case, low jitter and low ping time:

------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 192.168.15.3 port 5001 connected with 172.16.2.152 port 36312
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0-20.0 sec  2.50 MBytes  1.05 Mbits/sec   0.012 ms    0/ 1785 (0%)

PING server1 (172.16.2.152) 56(84) bytes of data.
64 bytes from server1 (172.16.2.152): icmp_seq=1 ttl=63 time=0.158 ms
64 bytes from server1 (172.16.2.152): icmp_seq=2 ttl=63 time=0.144 ms

I'd check the hardware and interfaces for errors. If you want, eliminate the switch between systems and see what a direct connection looks like. You don't want high jitter (variance), so check that.
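For the error check, something like the following works on most Linux hosts (eth0 is an assumed interface name; substitute your own):

# Driver/NIC statistics; look for non-zero error, drop or miss counters
ethtool -S eth0 | egrep -i 'err|drop|miss'

# Kernel-level RX/TX error and drop counts for the interface
ip -s link show eth0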

But honestly, even with the ping times you're getting on your current setup, that should not be enough to kill your application. I'd go down the path of tuning your send/receive buffers. See: net.core.rmem_max, net.core.wmem_max and their defaults...

Something like the following in /etc/sysctl.conf (please tune to taste):

net.core.rmem_default = 10000000
net.core.wmem_default = 10000000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
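Once those are in /etc/sysctl.conf, the standard way to apply and verify them:

# Reload /etc/sysctl.conf and confirm the new values
sysctl -p
sysctl net.core.rmem_max net.core.wmem_max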
ewwhite
  • It is a latency-sensitive messaging application. The typical OS would be kernel-2.6.32-279.11.1.el6.x86_64, though I loaded hosts with kernel 3.2.23-rt37.56.el6rt.x86_64 to see if that would make any difference, but it was pretty much the same. Message sizes vary between 1 KB and 3 KB. All communication happens via TCP. – Jimm Nov 03 '12 at 15:57
  • Is the OS Red Hat MRG? – ewwhite Nov 03 '12 at 15:59
  • Right now it's plain Red Hat 6.3, but MRG is also a possibility. As I mentioned above, I tried both, but latency was the same. What kind of tunables should I be concerned with? – Jimm Nov 03 '12 at 16:04
  • I'd want to know the hardware and NIC setup. Switch model helps. For tunables, the obvious area to look at on 6.3 is your `tuned-adm` profile. – ewwhite Nov 03 '12 at 16:05
  • Dual Ethernet controllers: Emulex Corporation OneConnect 10Gb NIC (rev 02), and 16-core AMD Family 10h processors at 2400 MHz. – Jimm Nov 03 '12 at 16:11
  • What's the switch? – ewwhite Nov 03 '12 at 16:25
  • Cisco Catalyst 4900. – Jimm Nov 03 '12 at 16:47
  • Okay, 4900M and 10GbE. Standard stuff... I don't think ping time matters here. Try increasing the buffers and trying your app again. – ewwhite Nov 03 '12 at 16:53