I have very busy web servers and wanted to add some analysis to see what kind of traffic is present: the total number of connections, the number of TIME_WAIT and ESTABLISHED connections, and the number of UDP and TCP connections.
First, I made a simple graph that displays only the total number of connections, read from /proc/sys/net/netfilter/nf_conntrack_count with:
$ cat /proc/sys/net/netfilter/nf_conntrack_count
1994
Everything was nicely presented in the graph, so I added more detail to it. Now I also process /proc/net/nf_conntrack with similar commands and feed the results into the appropriate monitoring items:
$ grep -c tcp /proc/net/nf_conntrack
1273
$ grep -c udp /proc/net/nf_conntrack
49
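A single-pass variant of these counts is sketched below. It reads the table once per run instead of once per grep, which reduces the number of reads but does not by itself explain or prevent the drop. The function name conntrack_summary is hypothetical, and the field positions are an assumption based on the usual /proc/net/nf_conntrack layout (field 3 is the L4 protocol name; for TCP, field 6 is the connection state). Matching on the field is also stricter than grep -c tcp, which counts any line containing "tcp" anywhere.

```shell
#!/bin/sh
# Hypothetical helper: summarize a conntrack table dump in one pass.
# Assumed line layout (whitespace-separated):
#   ipv4 2 tcp 6 117 TIME_WAIT src=... dst=... ...
#   ipv4 2 udp 17 29 src=... dst=... ...
# $3 = L4 protocol name, $6 = TCP state (UDP lines have no state field).
conntrack_summary() {
    awk '
        { total++ }
        $3 == "tcp" { tcp++; state[$6]++ }
        $3 == "udp" { udp++ }
        END {
            # Unset counters print as 0 with %d.
            printf "total=%d tcp=%d udp=%d established=%d time_wait=%d\n",
                total, tcp, udp, state["ESTABLISHED"], state["TIME_WAIT"]
        }' "${1:-/proc/net/nf_conntrack}"
}
```

Usage: conntrack_summary with no argument reads /proc/net/nf_conntrack; passing a path lets it summarize a saved copy instead.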
I set this analysis of nf_conntrack to run every minute. Initially everything was displayed properly, so I left it running for a day.
The next day I noticed huge drops and rebounds in the total connection count (/proc/sys/net/netfilter/nf_conntrack_count), occurring every couple of minutes, which is not normal for a web server. After a lot of testing and troubleshooting I finally pinpointed the reason behind the mystery.
In one terminal I ran watch -n0 "cat /proc/sys/net/netfilter/nf_conntrack_count" (to see the number of connections in near real time), and in a second terminal I only ran cat /proc/net/nf_conntrack. As soon as I pressed Enter, nf_conntrack_count dropped sharply from 1993 to 1411 and then recovered to its "normal" value within 2-3 s. I tried cp, grep, conntrack -L -p tcp, etc., and every time I ran one of these commands the drop appeared.
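The watch-based check can also be scripted so that drops are logged with timestamps instead of spotted by eye. The sketch below is hypothetical (both function names are made up), assumes GNU date (for %N) and a sleep that accepts fractional seconds, and uses an arbitrary 20 % threshold: one function samples the counter, the other reports any sample that falls at least that much below the previous one.

```shell
#!/bin/sh
# Hypothetical sampler: print "time count" once per interval, forever.
# Assumes GNU date (%N) and a sleep that accepts fractional seconds.
sample_count() {
    f=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
    interval=${2:-0.2}
    while :; do
        printf '%s %s\n' "$(date +%T.%N)" "$(cat "$f")"
        sleep "$interval"
    done
}

# Hypothetical filter: read "time count" lines on stdin and report any
# sample that is at least $1 percent (default 20) below the previous one.
report_drops() {
    awk -v pct="${1:-20}" '
        NR > 1 && $2 < prev * (1 - pct / 100) {
            printf "%s drop %d -> %d\n", $1, prev, $2
        }
        { prev = $2 }'
}
```

Usage: run sample_count | report_drops 20 in one terminal and cat /proc/net/nf_conntrack > /dev/null in another; each read that coincides with a drop should then show up as one logged line.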
Basically, every time /proc/net/nf_conntrack is read, a huge, temporary drop in /proc/sys/net/netfilter/nf_conntrack_count happens, and the monitoring sometimes picks up the low value(s) and plots them in the graph.
Further, I noticed a big difference between the output of cat /proc/net/nf_conntrack and conntrack -L. Also, the number of lines in nf_conntrack differs from nf_conntrack_count. The kernel is v4.19.5. It is clearly visible with these two commands, run three seconds apart:
[07:30:14] root@web1(~)$ wc -l /proc/net/nf_conntrack; \
cat /proc/sys/net/netfilter/nf_conntrack_count
1236 /proc/net/nf_conntrack
1575
[07:30:18] root@web1(~)$ cat /proc/sys/net/netfilter/nf_conntrack_count;\
wc -l /proc/net/nf_conntrack
2009
1191 /proc/net/nf_conntrack
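The two sessions above read the counter either before or after the table. A sketch that brackets a single read of the table between two counter reads shows both orderings at once. The function name bracket_read is hypothetical; line counting is done with awk rather than wc -l only to avoid the padded output some wc implementations produce.

```shell
#!/bin/sh
# Hypothetical helper: read the counter, count the table's lines, then
# read the counter again, so the effect of the read itself is bracketed.
bracket_read() {
    count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
    table_file=${2:-/proc/net/nf_conntrack}
    before=$(cat "$count_file")
    lines=$(awk 'END { print NR }' "$table_file")
    after=$(cat "$count_file")
    printf 'count_before=%s table_lines=%s count_after=%s\n' \
        "$before" "$lines" "$after"
}
```

Usage: running bracket_read with no arguments compares the live counter with the live table; if the counter is lower after the read than before it, that matches the drop described above.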
My question is: what exactly is going on here, why does this drop happen, why is there a difference between the listed files, and how can I prevent the drop?