72

How can I passively monitor the packet loss on TCP connections to/from my machine?

Basically, I'd like a tool that sits in the background and watches TCP ack/nak/re-transmits to generate a report on which peer IP addresses "seem" to be experiencing heavy loss.

Most questions like this that I find on SF suggest using tools like iperf. But I need to monitor connections to/from a real application on my machine.

Is this data just sitting there in the Linux TCP stack?

user9517
nonot1

8 Answers

63

For a general sense of the scale of your problem, netstat -s tracks your total number of retransmissions (the counters are cumulative since boot).

# netstat -s | grep retransmitted
     368644 segments retransmitted

You can also grep for segments to get a more detailed view:

# netstat -s | grep segments
         149840 segments received
         150373 segments sent out
         161 segments retransmitted
         13 bad segments received

For a deeper dive, you'll probably want to fire up Wireshark.

In Wireshark set your filter to tcp.analysis.retransmission to see retransmissions by flow.
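To get from that filter to a per-peer report, one option is to run the same display filter through tshark (Wireshark's command-line tool) and tally the addresses. This is only a rough sketch: capture.pcap is a placeholder file name, count_by_peer is a helper name made up for this example, and ip.dst is only the peer for segments that your own machine retransmits (use ip.src for the other direction, and ipv6.dst for IPv6 traffic).

```shell
# Tally retransmitted segments per address, highest count first.
# Input (stdin): one IP address per line, for example from
#   tshark -r capture.pcap -Y tcp.analysis.retransmission -T fields -e ip.dst
# (capture.pcap is a placeholder for a capture you have already taken).
count_by_peer() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Usage would look like `tshark -r capture.pcap -Y tcp.analysis.retransmission -T fields -e ip.dst | count_by_peer`, which prints one "count address" pair per line.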

That's the best option I can come up with.

Other dead ends explored:

  • netfilter/conntrack tools don't seem to keep retransmits
  • stracing netstat -s showed that it is just printing /proc/net/netstat
  • column 9 in /proc/net/tcp looked promising, but it unfortunately appears to be unused.
Zarathustra
Joel K
  • 1
    and you can monitor the lossed packets with # watch 'netstat -s | grep retransmited' – none Oct 13 '11 at 11:53
  • This would show only outbound problems. "netstat -s | grep segments" appears more reasonable to me. – akostadinov Apr 17 '12 at 19:39
  • 1
    If you're managing a reasonable sized network, then I'd recommend pastmon over wireshark for continuous monitoring - http://pastmon.sourceforge.net/Wikka-1.1.6.5/wikka.php?wakka=HomePage – symcbean Nov 02 '12 at 11:12
  • 6
    For some reason, it's spelled `retransmited` for me (Ubuntu Server 14). – sudo Apr 13 '17 at 15:50
  • 3
    what's a good rate for retransmissions vs sent or received ? – abourget May 31 '17 at 18:31
  • `netstat` doesn't need root for this, BTW. `/proc/net/netstat` is publicly readable. – Peter Cordes Mar 20 '18 at 05:16
  • what time period is displayed with the -s option?? Like is it showing stats for last 5 minutes, or since last reboot, or something else? – jake Oct 08 '20 at 14:29
15

These stats are in /proc/net/netstat and collectl will monitor them for you either interactively or written to disk for later playback:

[root@poker ~]# collectl -st
waiting for 1 second sample...
#<------------TCP------------->
#PureAcks HPAcks   Loss FTrans
        3      0      0      0
        1      0      0      0

Of course, if you'd like to see them side-by-side with network traffic, just include n with -s:

[root@poker ~]# collectl -stn
waiting for 1 second sample...
#<----------Network----------><------------TCP------------->
#  KBIn  PktIn  KBOut  PktOut PureAcks HPAcks   Loss FTrans
      0      1      0       1        1      0      0      0
      0      1      0       1        1      0      0      0
Mark Seger
8

You can use the ss tool to get detailed TCP statistics:

$ /sbin/ss -ti

Under Debian, use apt-get install iproute (the package is named iproute2 on current releases) to get the binary.
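The per-connection info lines can be reduced to a per-peer summary with a little awk. The sketch below assumes the usual two-line layout of ss -tin (a connection line ending in the peer address:port, followed by an indented info line) and a retrans:<current>/<total> field, which some versions only print once a connection has retransmitted something. The function name retrans_by_peer is made up for this example, and the output format does vary between iproute2 versions, so treat it as a starting point.

```shell
# Sum the cumulative retransmit counter per peer address.
# Input (stdin): output of `ss -tin` (assumed layout: connection line,
# then an indented info line that may contain retrans:<current>/<total>).
retrans_by_peer() {
    awk '
        # Connection line: starts in column 1; the last field is peer addr:port.
        /^[^ \t]/ { peer = $NF; sub(/:[^:]*$/, "", peer); next }
        # Info line: pick out the total from retrans:<current>/<total>.
        match($0, /retrans:[0-9]+\/[0-9]+/) {
            split(substr($0, RSTART, RLENGTH), parts, "/")
            total[peer] += parts[2]
        }
        END { for (p in total) print total[p], p }
    ' | sort -rn
}
```

Run it as `ss -tin | retrans_by_peer`, or wrap that in watch to keep an eye on it. Note that it only covers connections that are currently open.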

Greg
otmar
  • Note that the person asking the question was looking for a tool that they could watch the output of. While some of the commands mentioned so far don't operate this way, all of the upvoted answers included at least one method for doing so. – Andrew B Apr 06 '13 at 20:06
  • 2
    @AndrewB: You can do `watch ss -ti`. – John Zwinck Jan 22 '15 at 05:53
  • This doesn't include packet loss/retransmit counts in my version of `ss`. I had to use `ss --tcp --options`. That gives you a column at the end which looks like this: `timer:(keepalive,17sec,0)` The retransmission count is the last number; "0" in my case. – Aaron Digulla Mar 19 '22 at 11:16
3

It looks like some guys at the University of North Carolina (UNC) built a utility to investigate exactly this:

Methodology

TCP is a classic example of a legacy protocol that gets subject to modifications. Unfortunately, evaluation of something as fundamental as TCP's loss detection/recovery mechanism is not comprehensive. Our aim is to perform a complete realistic evaluation of TCP losses and its impact on TCP performance.

I rely on passive analysis of real-world TCP connections to achieve the required level of detail and realism in my analysis.

http://www.cs.unc.edu/~jasleen/Research-passivetcp.htm#Tool

Tool

The purpose of the tool is to provide more complete and accurate results for identifying and characterizing out-of-sequence segments than those provided by prior tools such as tcpanaly, tcpflows, LEAST, and Mystery. Our methodology classifies each segment that appears out-of-sequence (OOS) in a packet trace into one of the following categories: network reordering or TCP retransmission triggered by one of timeout, duplicate ACKs, partial ACKs, selective ACKs, or implicit recovery. Further, each retransmission is also assessed for whether it was needed or not.

I won't say it is production quality. Previously I've built quick Perl scripts that store ip/port/ack tuples in memory and then report on duplicated data from scanning pcap output; this looks like it provides a more thorough analysis.

polynomial
3

You may want to look at the dropwatch utility.

charleswj81
0

Looks like /proc/net/snmp is where the values for netstat -s are sourced. So here is a quick gawk script to find the % of segments that are retransmitted:

gawk 'BEGIN {OFS=" "} $1 ~ /Tcp:/ && $2 !~ /RtoAlgorithm/ {print "InSegs\t",$11,"\nOutSegs\t",$12,"\nRetransSegs\t",$13,"\nPctReTrans\t",($13/$12*100)}' /proc/net/snmp

InSegs   8567261339 
OutSegs  9545034903 
RetransSegs  2192165 
PctReTrans   0.0229665

An internal AWS instance (no public IP or public traffic) that we suspected of having networking issues with other systems in the VPC showed 0.0229% retransmitted, over 10 times higher than the 0.002% maximum we saw on other nodes. On one really bad instance, as many as 2.32% of all outbound segments were retransmissions.
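If you would rather not depend on fixed column positions, the same percentage can be computed by looking the columns up by name in the header line. This is a minimal sketch, assuming the two-line "Tcp:" header/values layout of /proc/net/snmp. The function name tcp_retrans_pct and its optional file argument are made up for this example (the argument is there mainly so it can be pointed at a saved copy).

```shell
# Print RetransSegs as a percentage of OutSegs.
# $1 (optional): file in /proc/net/snmp format; defaults to /proc/net/snmp.
tcp_retrans_pct() {
    awk '
        # First "Tcp:" line is the header: remember each column by name.
        $1 == "Tcp:" && !have_header {
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        # Second "Tcp:" line holds the values.
        $1 == "Tcp:" {
            out = $(col["OutSegs"]); re = $(col["RetransSegs"])
            if (out > 0) printf "%.4f\n", re / out * 100
        }
    ' "${1:-/proc/net/snmp}"
}
```

Calling tcp_retrans_pct with no argument reads the live counters. Those are cumulative, so the result is a long-run average rather than a current rate.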

You can also see the rate of retransmits during a given time window using:

# Sample the cumulative counter twice, 30 seconds apart, and print the difference.
FIRST=$(netstat -s | grep -oP '\d+(?= segments retransmit+ed)')
sleep 30
LAST=$(netstat -s | grep -oP '\d+(?= segments retransmit+ed)')
echo $((LAST - FIRST))
Greg Bray
  • looks like nstat is another command for monitoring those counters https://loicpefferkorn.net/2016/03/linux-network-metrics-why-you-should-use-nstat-instead-of-netstat/ `nstat --nooutput;sleep 60;nstat -p TcpRetransSegs TcpExtTCPLostRetransmit TcpExtTCPSynRetrans` – Greg Bray Aug 13 '20 at 01:10
0

In recent Linux distributions, netstat has been replaced by ss and ip. Another answer explains how to use ss. With ip, you can get the number of packets dropped on an interface with this command:

ip -s link show eth0
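The same interface counters are also exposed as plain files under /sys/class/net/<interface>/statistics/, which is handy for scripting. Keep in mind these are drops at the interface level, not TCP retransmits, so they complement the other answers rather than replace them. A rough sketch follows. The function name iface_drops and its optional second argument (an alternative sysfs root) are made up for this example.

```shell
# Print the dropped-packet counters for one interface.
# $1: interface name, e.g. eth0
# $2 (optional): sysfs network root; defaults to /sys/class/net
iface_drops() {
    stats="${2:-/sys/class/net}/$1/statistics"
    printf 'rx_dropped=%s tx_dropped=%s\n' \
        "$(cat "$stats/rx_dropped")" "$(cat "$stats/tx_dropped")"
}
```

For example, `iface_drops eth0` prints both counters on one line, which can be sampled periodically and compared.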

Aaron Digulla
0

Good old sar can also gather retransmission counts (and other TCP statistics), along with CPU, memory, disk I/O, and other system statistics that may be useful when you investigate a problem.

You may need to install the sysstat package and enable this kind of statistics with the switch -S SNMP. On RHEL/Oracle Linux this is configured in /etc/cron.d/sysstat, where /usr/lib64/sa/sa1 is invoked every 5 minutes by default; the interval can be tuned as well.

For analysis of this data use:

  • sar (command line, text based)
  • sadf creates SVG according to http://sebastien.godard.pagesperso-orange.fr/matrix.html
  • ksar (that can plot nice graphs and runs on Java - there are several different clones around from which to choose on sf.net and github if I recall correctly)
  • http://www.sargraph.com (based on PHP; I have no experience whatsoever with the application, as opposed to the programming language)
JohannesB