
We got this Netdata alarm:

system.softnet_stat number of times, during the last 10min, ksoftirq ran out of sysctl net.core.netdev_budget or net.core.netdev_budget_usecs, with work remaining (this can be a cause for dropped packets)
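
The raw counter behind this alarm lives in /proc/net/softnet_stat, which has one line of hexadecimal fields per CPU. The minimal sketch below assumes the usual layout, where the first three columns are processed packets, packets dropped because the backlog queue was full, and time_squeeze (the number of times the receive softirq stopped with work remaining). Netdata's "squeezed" dimension appears to be derived from that third column. Later columns vary between kernel versions and are ignored here, and the helper names are hypothetical.

```python
from __future__ import annotations

import os
from typing import NamedTuple

PATH = "/proc/net/softnet_stat"


class SoftnetRow(NamedTuple):
    processed: int     # packets handled by the receive softirq
    dropped: int       # packets dropped because the backlog queue was full
    time_squeeze: int  # times the budget or time limit ran out with work left


def parse_softnet_stat(text: str) -> list[SoftnetRow]:
    """Parse softnet_stat contents: one line per CPU, hexadecimal fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        processed, dropped, squeezed = (int(f, 16) for f in fields[:3])
        rows.append(SoftnetRow(processed, dropped, squeezed))
    return rows


def main() -> None:
    if not os.path.exists(PATH):
        print(f"{PATH} not found (not a Linux host?)")
        return
    with open(PATH) as f:
        rows = parse_softnet_stat(f.read())
    # Row order follows the kernel's output; it may not match CPU numbers
    # if some CPUs are offline.
    for i, row in enumerate(rows):
        print(f"row {i}: processed={row.processed} dropped={row.dropped} "
              f"time_squeeze={row.time_squeeze}")


if __name__ == "__main__":
    main()
```

The counters are cumulative since boot, so a single snapshot only shows which CPUs are affected, not how often squeezes happen now.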

I have been searching for information on how to resolve this issue. Everyone suggests increasing netdev_budget and/or netdev_budget_usecs, but sources contradict each other on what the values should be: some suggest raising netdev_budget to around 30K events, others to 600 events. Our /etc/sysctl.conf has everything commented out, so I assume all the values are at their defaults?
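
Whether the values really are at their defaults is easiest to confirm from the running kernel rather than from /etc/sysctl.conf, because files under /etc/sysctl.d/ (and similar directories on systemd-based systems) can also set them. From a shell, `sysctl net.core.netdev_budget` prints the live value; the minimal Python sketch below reads the same files under /proc/sys. The key list and helper names are illustrative, not taken from the question.

```python
from __future__ import annotations

from pathlib import Path

# Keys from the question plus the related backlog limit (illustrative list).
KEYS = (
    "net.core.netdev_budget",
    "net.core.netdev_budget_usecs",
    "net.core.netdev_max_backlog",
)


def sysctl_path(key: str) -> Path:
    """Map a dotted sysctl key to its file under /proc/sys.

    A plain dot-to-slash swap is fine for these keys; names that contain
    dots themselves (e.g. VLAN interfaces) would need extra handling.
    """
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key: str) -> str | None:
    """Return the live value, or None if the key is missing or unreadable."""
    try:
        return sysctl_path(key).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in KEYS:
        print(f"{key} = {read_sysctl(key)}")
```

A `None` for netdev_budget_usecs would suggest an older kernel that does not expose that setting.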

Our daily average event count is 10K-20K. In the system.softnet_stat chart, we can see squeezed events even when the processed event count is only 2K.
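
Because the chart shows rates, comparing squeezes against processed packets over the same window is more telling than cumulative totals. A minimal sketch, assuming the first three hexadecimal columns of /proc/net/softnet_stat are processed, dropped and time_squeeze: it takes two snapshots, sums them across CPUs and prints per-second rates. It does not handle counter wrap-around, and the one-second default window is an arbitrary choice.

```python
from __future__ import annotations

import os
import time

PATH = "/proc/net/softnet_stat"


def totals(text: str) -> tuple[int, int, int]:
    """Sum processed, dropped and time_squeeze (hex columns 1-3) over all CPUs."""
    processed = dropped = squeezed = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            processed += int(fields[0], 16)
            dropped += int(fields[1], 16)
            squeezed += int(fields[2], 16)
    return processed, dropped, squeezed


def rates(before: tuple[int, ...], after: tuple[int, ...],
          seconds: float) -> tuple[float, ...]:
    """Per-second increase of each counter between two snapshots.

    Counter wrap-around (the per-CPU values are 32-bit) is not handled.
    """
    return tuple((b - a) / seconds for a, b in zip(before, after))


def main(interval: float = 1.0) -> None:
    if not os.path.exists(PATH):
        print(f"{PATH} not found (not a Linux host?)")
        return
    with open(PATH) as f:
        first = totals(f.read())
    time.sleep(interval)  # a longer window smooths out short bursts
    with open(PATH) as f:
        second = totals(f.read())
    processed, dropped, squeezed = rates(first, second, interval)
    print(f"processed/s={processed:.1f} dropped/s={dropped:.1f} "
          f"squeezed/s={squeezed:.2f}")


if __name__ == "__main__":
    main()
```

Running it during both quiet and busy periods shows whether squeezes track packet volume or also appear at low rates, as the chart suggests.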

In short, how do we calculate what values we should assign to netdev_budget and/or netdev_budget_usecs?

Martin

1 Answer


There is no one-size-fits-all answer to this problem. In general, you should raise the values in sysctl.conf until you find something that works; however, it is also possible that the machine is receiving more packets than it can handle, in which case no values will help. Based on https://github.com/netdata/netdata/issues/1076 and https://nateware.com/2013/04/06/linux-network-tuning-for-2013/, here is a sample config that users have reported to work:

```
# /etc/sysctl.d/99-network-tuning.conf

# http://www.nateware.com/linux-network-tuning-for-2013.html
# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32MB (33554432) or 54MB (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960

# Cloudflare uses this for balancing latency and throughput
# https://blog.cloudflare.com/the-story-of-one-latency-spike/
## net.ipv4.tcp_rmem = 4096 1048576 2097152
net.ipv4.tcp_rmem = 4096 5242880 33554432

net.ipv4.tcp_wmem = 4096 65536 16777216

# Also increase the max packet backlog
net.core.netdev_max_backlog = 100000
# Packet budget and time limit (in microseconds) per NAPI polling cycle
## net.core.netdev_budget = 50000
net.core.netdev_budget = 60000
net.core.netdev_budget_usecs = 6000

# Make room for more TIME_WAIT sockets due to more clients,
# and allow them to be reused if we run out of sockets
# (tcp_max_syn_backlog sizes the queue of half-open connections)
net.ipv4.tcp_max_syn_backlog = 30000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10

# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0

# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
```
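
After saving the file, it can be loaded with `sysctl --system` (or `sysctl -p /etc/sysctl.d/99-network-tuning.conf`). Before judging whether squeezes went down, it is worth confirming the kernel actually holds the new values. A minimal sketch, assuming simple `key = value` lines: it parses the file and lists keys whose live /proc/sys value differs or is missing. It does not implement every sysctl.d feature (for example systemd's `-` prefix), and the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator

CONF = Path("/etc/sysctl.d/99-network-tuning.conf")


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored in this sketch
            # Collapse whitespace so "4096 65536 16777216" matches the
            # tab-separated form the kernel reports.
            settings[key.strip()] = " ".join(value.split())
    return settings


def mismatches(settings: dict[str, str],
               root: Path = Path("/proc/sys")) -> Iterator[tuple[str, str, str | None]]:
    """Yield (key, wanted, live) where the live value differs or is missing."""
    for key, wanted in settings.items():
        try:
            live = " ".join((root / key.replace(".", "/")).read_text().split())
        except OSError:
            live = None
        if live != wanted:
            yield key, wanted, live


if __name__ == "__main__":
    if CONF.exists():
        for key, wanted, live in mismatches(parse_sysctl_conf(CONF.read_text())):
            print(f"{key}: file has {wanted!r}, kernel has {live!r}")
    else:
        print(f"{CONF} not found")
```

No output means every setting in the file is live. A reported mismatch may mean the file was not loaded, a later file in /etc/sysctl.d/ overrides it, or the kernel adjusted the value.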