
The datanodes in my HBase cluster are triggering TCP RcvPruned and TCPBacklogDrop events from time to time:

[graph of the RcvPruned and TCPBacklogDrop counters]

It seems there are at least two angles from which to approach this:

  1. Tune HBase/HDFS etc... so that these are not triggered
  2. Tune the Linux network stack to be able to handle these

I'm interested in understanding these two metrics better and in any actionable advice on either of these paths. Can anyone advise on concrete next steps?
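
For context, both counters live in the TcpExt section of /proc/net/netstat (the same data netstat -s reports). A minimal C sketch for pulling just these two fields, assuming they are named RcvPruned and TCPBacklogDrop on my kernels, looks roughly like this:

/* Sketch: print the RcvPruned and TCPBacklogDrop counters from /proc/net/netstat.
 * Assumes the usual layout: a "TcpExt:" header line with field names followed
 * by a "TcpExt:" line with the corresponding values. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/net/netstat", "r");
    if (!f) { perror("fopen"); return 1; }

    char names[4096], values[4096];
    while (fgets(names, sizeof(names), f) && fgets(values, sizeof(values), f)) {
        if (strncmp(names, "TcpExt:", 7) != 0)
            continue;

        /* Walk the header and value lines in lockstep: token i on the first
         * line names token i on the second. */
        char *nsave = NULL, *vsave = NULL;
        char *n = strtok_r(names, " \n", &nsave);
        char *v = strtok_r(values, " \n", &vsave);
        while (n && v) {
            if (strcmp(n, "RcvPruned") == 0 || strcmp(n, "TCPBacklogDrop") == 0)
                printf("%s = %s\n", n, v);
            n = strtok_r(NULL, " \n", &nsave);
            v = strtok_r(NULL, " \n", &vsave);
        }
        break;
    }
    fclose(f);
    return 0;
}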

Kyle Brandt

1 Answer


tcp_v4_rcv [0] calls sk_add_backlog and, if that fails, increments TCPBacklogDrop:

2014         } else if (unlikely(sk_add_backlog(sk, skb,
2015                                            sk->sk_rcvbuf + sk->sk_sndbuf))) {
2016                 bh_unlock_sock(sk);
2017                 NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
2018                 goto discard_and_relse;
2019         }

sk_add_backlog fails only if sk_rcvqueues_full [1]:

801 /* The per-socket spinlock must be held here. */
802 static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb,
803                                               unsigned int limit)
804 {
805         if (sk_rcvqueues_full(sk, skb, limit))
806                 return -ENOBUFS;
807 
808         __sk_add_backlog(sk, skb);
809         sk->sk_backlog.len += skb->truesize;
810         return 0;
811 }
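
Note that the limit tcp_v4_rcv passes in is sk->sk_rcvbuf + sk->sk_sndbuf, so as far as I can tell sk_rcvqueues_full is essentially asking whether the bytes already sitting in the backlog plus the receive queue exceed the two buffer sizes combined. A standalone sketch of that arithmetic (the struct and field names here are made up for illustration, not the kernel's actual layout):

/* Illustration only: the drop decision restated in userspace terms. */
#include <stdbool.h>
#include <stdio.h>

struct fake_sock {
    unsigned int backlog_len;   /* bytes queued in the socket backlog */
    unsigned int rmem_alloc;    /* bytes already in the receive queue */
    unsigned int rcvbuf;        /* SO_RCVBUF */
    unsigned int sndbuf;        /* SO_SNDBUF */
};

/* Roughly the check behind sk_rcvqueues_full() when tcp_v4_rcv() passes
 * rcvbuf + sndbuf as the limit (paraphrased, exact kernel form may differ). */
static bool would_drop(const struct fake_sock *sk)
{
    unsigned int limit = sk->rcvbuf + sk->sndbuf;
    return sk->backlog_len + sk->rmem_alloc > limit;
}

int main(void)
{
    struct fake_sock sk = { .backlog_len = 3 << 20, .rmem_alloc = 2 << 20,
                            .rcvbuf = 2 << 20, .sndbuf = 1 << 20 };
    printf("drop? %s\n", would_drop(&sk) ? "yes (TCPBacklogDrop++)" : "no");
    return 0;
}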

The underlying function __sk_add_backlog was recently changed [2] to allow at least one packet through:

+ * Do not take into account this skb truesize,
+ * to allow even a single big packet to come.

I suppose applying that patch to your kernel should fix the problem. You may also try increasing the default receive buffer size in both the OS and the application (setsockopt SO_RCVBUF).
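
On the application side, SO_RCVBUF is an ordinary setsockopt call. A minimal sketch (the 4 MiB value is just an example; the kernel silently caps the request at net.core.rmem_max unless SO_RCVBUFFORCE is used):

/* Sketch: request a larger receive buffer and report what the kernel granted. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int rcvbuf = 4 * 1024 * 1024;   /* requested size, example value only */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        perror("setsockopt(SO_RCVBUF)");

    /* The kernel reports back the (doubled, possibly clamped) size it granted. */
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
        printf("effective SO_RCVBUF: %d bytes\n", actual);

    close(fd);
    return 0;
}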

As for your second question about RcvPruned: Linux increments that stat inside tcp_prune_queue [3]. That function is usually called when a socket goes over its receive limits, so once again you can either increase your rmem/SO_RCVBUF and/or tune your application to make read() calls more frequently (I assume your drops correlate closely with Java's stop-the-world GC pauses, so tune your GC).

[0] http://lxr.free-electrons.com/source/net/ipv4/tcp_ipv4.c?v=3.15#L2014
[1] http://lxr.free-electrons.com/source/include/net/sock.h?v=3.15#L802
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0fd7bac6b6157eed6cf0cb86a1e88ba29e57c033
[3] http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c?v=3.15#L4662

SaveTheRbtz
  • Fantastic answer. I'll have jvm GC monitoring in place for my region servers sometime tomorrow and I'll look for the correlation and go from there. – Kyle Brandt Sep 21 '14 at 17:06