
Environment: Kubernetes (4 nodes, RKE 1.21.5)

We see random, significant latency in socket data transfer between different Kubernetes pods. In some cases the latency is as long as 15 seconds.

By analysing tcpdump output, we found that in some cases the server side took a long time to send an ACK back to the client side.

Here is the server-side tcpdump for one socket: 10.42.40.2:51702 (client side) <-> 10.42.18.64:9099 (server side)

13:24:05.485173 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1353130:1361578, ack 615, win 3068, options [nop,nop,TS val 497835713 ecr 1074832978], length 8448
**13:24:05.489375 IP 10.42.18.64.9099 > 10.42.40.2.51702: Flags [.], ack 1330602, win 1285, options [nop,nop,TS val 1074832982 ecr 497835641], length 0**
13:24:05.489420 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1361578:1362986, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 1408
13:24:05.489424 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1362986:1367210, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 4224
13:24:05.489482 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1367210:1412266, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 45056
13:24:05.489532 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [P.], seq 1412266:1420714, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 8448
13:24:05.489534 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1420714:1457322, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 36608
13:24:05.489573 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1457322:1472810, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 15488
13:24:05.489688 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1472810:1475626, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 2816
13:24:05.489741 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1475626:1485482, ack 615, win 3068, options [nop,nop,TS val 497835717 ecr 1074832982], length 9856
13:24:05.671207 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1485482:1486890, ack 615, win 3068, options [nop,nop,TS val 497835899 ecr 1074832982], length 1408
13:24:06.155201 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497836383 ecr 1074832982], length 1408
13:24:07.115190 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497837343 ecr 1074832982], length 1408
13:24:09.003161 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497839231 ecr 1074832982], length 1408
13:24:12.843114 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497843071 ecr 1074832982], length 1408
**13:24:17.658342 IP 10.42.18.64.9099 > 10.42.40.2.51702: Flags [.], ack 1361578, win 1255, options [nop,nop,TS val 1074845151 ecr 497835713], length 0**
13:24:17.658357 IP 10.42.18.64.9099 > 10.42.40.2.51702: Flags [.], ack 1486890, win 1133, options [nop,nop,TS val 1074845151 ecr 497835717,nop,nop,sack 1 {1330602:1332010}], length 0
13:24:17.658360 IP 10.42.18.64.9099 > 10.42.40.2.51702: Flags [.], ack 1486890, win 1133, options [nop,nop,TS val 1074845151 ecr 497835717,nop,nop,sack 1 {1330602:1332010}], length 0
13:24:17.658365 IP 10.42.18.64.9099 > 10.42.40.2.51702: Flags [.], ack 1486890, win 1133, options [nop,nop,TS val 1074845151 ecr 497835717,nop,nop,sack 1 {1330602:1332010}], length 0

According to the tcpdump, the server (port 9099) sent ACK 1330602 at 13:24:05.489375. For the next 12 seconds it sent no further ACK, until 13:24:17.658342. I think this blocked the socket and prevented the client side from transferring more bytes. After 13:24:17, the socket returned to normal and bytes continued to flow through it.
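To make the timing in the capture easier to verify, here is a minimal Python sketch that parses tcpdump text lines like the ones above and reports the gap between consecutive pure ACKs from the server, plus the spacing between repeated client transmissions of the same segment. The function names (`parse_dump`, `ack_gaps`, `retransmit_intervals`) and the regular expression are my own and only cover the line format shown here; the sample lines are copied from the dump above.

```python
import re
from datetime import datetime

# Matches the tcpdump text format used in this post (timestamp, endpoints,
# flags, optional "seq a:b", optional "ack n"). Not a general tcpdump parser.
LINE_RE = re.compile(
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]*)\]"
    r"(?:, seq (?P<seq>\d+):(?P<end>\d+))?"
    r"(?:, ack (?P<ack>\d+))?"
)

SAMPLE = """\
**13:24:05.489375 IP 10.42.18.64.9099 > 10.42.40.2.51702: Flags [.], ack 1330602, win 1285, options [nop,nop,TS val 1074832982 ecr 497835641], length 0**
13:24:06.155201 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497836383 ecr 1074832982], length 1408
13:24:07.115190 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497837343 ecr 1074832982], length 1408
13:24:09.003161 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497839231 ecr 1074832982], length 1408
13:24:12.843114 IP 10.42.40.2.51702 > 10.42.18.64.9099: Flags [.], seq 1330602:1332010, ack 615, win 3068, options [nop,nop,TS val 497843071 ecr 1074832982], length 1408
**13:24:17.658342 IP 10.42.18.64.9099 > 10.42.40.2.51702: Flags [.], ack 1361578, win 1255, options [nop,nop,TS val 1074845151 ecr 497835713], length 0**
"""


def parse_dump(text):
    """Return one dict per recognised packet line, in capture order."""
    packets = []
    for raw in text.splitlines():
        # Drop the "**" bold markers that the post wraps around some lines.
        match = LINE_RE.search(raw.strip().strip("*"))
        if match is None:
            continue
        seq = match.group("seq")
        ack = match.group("ack")
        packets.append({
            "ts": datetime.strptime(match.group("ts"), "%H:%M:%S.%f"),
            "src": match.group("src"),
            "seq": (int(seq), int(match.group("end"))) if seq else None,
            "ack": int(ack) if ack else None,
        })
    return packets


def ack_gaps(packets, server):
    """Seconds between consecutive data-less ACKs sent by `server`."""
    acks = [p for p in packets if p["src"] == server and p["seq"] is None]
    return [
        ((b["ts"] - a["ts"]).total_seconds(), a["ack"], b["ack"])
        for a, b in zip(acks, acks[1:])
    ]


def retransmit_intervals(packets, client):
    """Seconds between successive sends of an identical seq range by `client`."""
    last_seen = {}
    intervals = []
    for p in packets:
        if p["src"] != client or p["seq"] is None:
            continue
        if p["seq"] in last_seen:
            delta = (p["ts"] - last_seen[p["seq"]]).total_seconds()
            intervals.append((p["seq"], delta))
        last_seen[p["seq"]] = p["ts"]
    return intervals


if __name__ == "__main__":
    pkts = parse_dump(SAMPLE)
    for gap, prev_ack, next_ack in ack_gaps(pkts, "10.42.18.64.9099"):
        print(f"ACK {prev_ack} -> ACK {next_ack}: {gap:.6f} s")
    for seq, delta in retransmit_intervals(pkts, "10.42.40.2.51702"):
        print(f"seq {seq[0]}:{seq[1]} resent after {delta:.6f} s")
```

On the sample lines, the spacing between the repeated sends of segment 1330602:1332010 roughly doubles each time. That pattern looks like the client's retransmission timer backing off while it waits for the server, though this is only my reading of the capture.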

We've tried TCP_NODELAY and TCP_QUICKACK, but neither resolves the issue. (And I don't think this ) Could you suggest any reason why it would take so long to send a TCP ACK here?
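For reference, this is roughly how the two options can be set on a Linux socket from Python. It is only a sketch, not our actual application code: the helper names `apply_low_latency_options` and `recv_with_quickack` are made up for illustration. One detail worth checking in any real implementation is that, per the Linux tcp(7) man page, TCP_QUICKACK is not permanent, so it is usually re-applied after reads.

```python
import socket

# TCP_QUICKACK is Linux-specific; some Python builds may not expose it.
TCP_QUICKACK = getattr(socket, "TCP_QUICKACK", None)


def _set_quickack(sock):
    """Try to enable TCP_QUICKACK; return True if the kernel accepted it."""
    if TCP_QUICKACK is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_QUICKACK, 1)
        return True
    except OSError:
        return False


def apply_low_latency_options(sock):
    """Disable Nagle's algorithm and request immediate ACKs on `sock`."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return _set_quickack(sock)


def recv_with_quickack(sock, nbytes):
    """recv() wrapper that re-arms TCP_QUICKACK, since the flag can be reset."""
    data = sock.recv(nbytes)
    _set_quickack(sock)
    return data


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    quickack_ok = apply_low_latency_options(s)
    print("TCP_NODELAY:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    print("TCP_QUICKACK applied:", quickack_ok)
    s.close()
```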

skyfire