root@openwrt:~# ip -s -s -4 neigh show dev lan
10.64.42.121 lladdr b8:20:00:00:00:00 used 6387/6341/6313 probes 1 STALE
10.64.42.157 lladdr b8:20:00:00:00:00 used 24/813/19 probes 1 STALE
10.64.42.12  used 29066/30229/29063 probes 6 FAILED
10.64.42.1 lladdr e8:00:00:00:00:00 ref 1 used 10/5/5 probes 1 REACHABLE


root@openwrt:~# cat /proc/sys/net/ipv4/neigh/default/gc_interval 
30

root@openwrt:~# cat /proc/sys/net/ipv4/neigh/default/gc_stale_time
60

root@openwrt:~# cat /proc/sys/net/ipv4/neigh/lan/gc_stale_time
60

A host on the LAN (b8:20:00:00:00:00) had the IP address 10.64.42.121. That address is no longer valid: the same host now has 10.64.42.157 (new DHCP lease).

I am trying to figure out when the old ARP cache entry will change state to FAILED (provided nobody attempts to contact the IP).

The entry was last confirmed 6341 s ago (about 1h45), which is far more than 60 s. Why is this entry still in the STALE state, and when will it change to FAILED (or be deleted) if nobody ever tries to use it?

Fox
  • Is `b8:20:00:00:00:00` the real MAC, or did you obfuscate it? Because it's unlikely to be a real MAC. If this is really the MAC from the results, someone is overriding the hardware MAC somewhere, and if they're doing that, is it possible this was a _static_ ARP entry? – Joel Coel Mar 22 '16 at 17:21
  • It's not the real MAC. I obfuscated it. :) – Fox Mar 22 '16 at 21:33

4 Answers


The neighbor cache in the Linux kernel isn't that simple.

There are subtle differences between a neighbor cache entry actually falling out of the cache entirely and merely being marked as stale/invalid. At some point between base_reachable_time/2 and 3*base_reachable_time/2, the entry will still be in the cache, but it will be marked with a state of STALE. You should be able to view the state with "ip -s neighbor show".
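To make that output easier to read, here is a small sketch of a filter that pulls the state and the three ages out of each line. It assumes the numbers after `used` are, in order, seconds since the entry was last used, last confirmed and last updated (which is how the question reads them), and that the state is the last word on the line. The function name `neigh_ages` is made up for this example.

```shell
# neigh_ages: read `ip -s neigh show` output on stdin and print
# "<ip> <state> <used> <confirmed> <updated>" per entry (ages in seconds).
# Assumption: the field after "used" is used/confirmed/updated.
# Entries without a "used" field get "-" placeholders; blank lines are skipped.
neigh_ages() {
    awk 'NF {
        ages = "-/-/-"
        for (i = 1; i < NF; i++)
            if ($i == "used") ages = $(i + 1)
        split(ages, a, "/")
        print $1, $NF, a[1], a[2], a[3]
    }'
}

# Typical use on the router (not run here):
#   ip -s -4 neigh show dev lan | neigh_ages
```

Fed the first line of the question's output, this should report the entry as STALE with a "used" age of 6387 s and a "confirmed" age of 6341 s.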

When the entry is in the STALE state as shown above, if I ping 10.64.42.121, the kernel will send the packet to b8:20:00:00:00:00 right away. A second or so later it will usually send an ARP request asking who has 10.64.42.121 in order to update its cache back to a REACHABLE state. BUT, to make matters more confusing, the kernel will sometimes change timeout values based on positive feedback from higher-level protocols. What this means is that if I ping 10.64.42.121 and it replies, then the kernel might not bother sending an ARP request, because it assumes that the pong meant that its ARP cache entry is valid. If the entry is in the STALE state, it will also be updated by unsolicited ARP replies that it happens to see.

Now, for the majority of cases, the entry being in the STALE state is all you need to worry about. Why do you need the entry to be removed from the cache entirely? The kernel goes to a lot of effort to not thrash memory by just changing the state of cache entries instead of actually removing and adding them to the cache all the time.

If you really really insist that it not only will be marked as STALE, but will actually be removed from the hashmap used by the neighbor cache, you have to beware of a few things. First, if the entry hasn't been used and is stale for gc_stale_time seconds, it should be eligible to be removed. If gc_stale_time passed and marked the entry as okay to be removed, it will be removed when the garbage collector runs (usually after gc_interval seconds).

Now the problem is that the neighbor entry will not be deleted if it's being referenced. The main thing that you're going to have problems with is the reference from the ipv4 routing table. There's a lot of complicated garbage collection stuff, but the important thing to note is that the garbage collector for the route cache only expires entries every 5 minutes (/proc/sys/net/ipv4/route/gc_timeout seconds) on a lot of kernels. This means the neighbor entry will have to be marked as stale (maybe 30 seconds, depending on base_reachable_time), then 5 minutes will have to go by before the route cache stops referencing the entry (if you're lucky), followed by some combination of gc_stale_time and gc_interval passing before it actually gets cleaned up (so, overall, somewhere between 5-10 minutes will pass).
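The two conditions above (stale for longer than gc_stale_time, and not referenced by anything else) can be sketched as a tiny decision function. This is only a simplified model of the per-entry test, not kernel code. It assumes that the `ref N` value printed by ip counts references beyond the cache's own (so a missing `ref` means 0), it compares the first "used" age against gc_stale_time, and it only models the STALE and FAILED states. The name `gc_eligible` is hypothetical.

```shell
# gc_eligible STATE EXTRA_REFS USED_AGE GC_STALE_TIME
# Prints "yes" if this simplified model says the periodic scan would
# drop the entry, "no" otherwise.
#   STATE          state as printed by ip (STALE, FAILED, ...)
#   EXTRA_REFS     the "ref N" value from ip, 0 when ip prints no ref (assumption)
#   USED_AGE       first number of "used a/b/c", in seconds
#   GC_STALE_TIME  value of gc_stale_time, in seconds
gc_eligible() {
    state=$1 refs=$2 used_age=$3 stale=$4
    # An entry that something else still references is never dropped.
    if [ "$refs" -gt 0 ]; then
        echo no
        return
    fi
    case $state in
        FAILED) echo yes ;;
        STALE)
            if [ "$used_age" -gt "$stale" ]; then echo yes; else echo no; fi ;;
        *)      echo no ;;   # other states are not modelled in this sketch
    esac
}

# Example with the numbers from the question (gc_stale_time is 60):
#   gc_eligible STALE 0 6387 60
```

Under these assumptions the old 10.64.42.121 entry already passes the per-entry test, so if it stays in the table, something is probably keeping the scan itself from reaching it.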

Summary: you can try decreasing /proc/sys/net/ipv4/route/gc_timeout to a shorter value, but there are a lot of variables and it's difficult to control them all. A lot of effort goes into making things perform well by not removing entries from the cache too early (but instead just marking them as STALE or even FAILED).

  • Thank you for your thorough answer. However, I'm not talking about minutes. I'm talking about hours, days, weeks. What I've noticed on my router is that entries in the ARP table seem to never be deleted, no matter if FAILED or STALE. I'm aware of the concerns for a kernel to be efficient and minimize useless operations. But the thing is, on my router, the garbage collection does not seem to happen. If an entry is STALE and I unplug the Ethernet cable of that machine in the LAN, one week later, the entry is still in the table in STALE state (provided nobody tries to talk to it). – Fox Apr 21 '16 at 09:09
  • from: https://stackoverflow.com/questions/15372011/configuring-arp-age-timeout/15511117#15511117 – A.B Feb 13 '18 at 10:05

gc_stale_time is the right parameter to tweak to evict STALE entries from the ARP table. But there is more:

ARP garbage collection is run in the periodic neigh_periodic_work function. The interval can be tweaked via /proc/sys variable gc_interval.

It will then check that there are at least gc_thresh1 entries in the ARP table. This avoids consuming extra CPU cycles when the table is too small for garbage collection to bring any real benefit in terms of memory.

In your case, I suspect gc_thresh1 is the variable you'll want to tweak. Lowering it will force the GC to run more frequently. This may have a negative impact on performance, depending on the run interval.

Note: gc_thresh3 is a hard threshold. The table will never keep more entries than this value. Tweak it with care.
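Before tweaking anything, it may help to see all the relevant values side by side. The helper below is just a convenience sketch: the name `show_gc_settings` is invented here, and the default directory is the IPv4 one used in the question.

```shell
# show_gc_settings [DIR]
# Print the neighbor GC tunables found in DIR as name=value lines.
# DIR defaults to the IPv4 "default" neighbor settings.
show_gc_settings() {
    dir=${1:-/proc/sys/net/ipv4/neigh/default}
    for name in gc_interval gc_stale_time gc_thresh1 gc_thresh2 gc_thresh3; do
        # Skip tunables this kernel does not expose.
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}

# Typical use on the router (not run here):
#   show_gc_settings
```

To experiment with a lower threshold, something like `sysctl -w net.ipv4.neigh.default.gc_thresh1=<value>` (or writing the value to the matching file under /proc/sys) should work, but the change does not survive a reboot unless it is also stored in the system's sysctl configuration.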

yadutaf

The kernel.org documentation says:

route/max_size - INTEGER
    Maximum number of routes allowed in the kernel.  Increase
    this when using large numbers of interfaces and/or routes.
    From linux kernel 3.6 onwards, this is deprecated for ipv4
    as route cache is no longer used.

/proc/sys/net/ipv4/route/gc_timeout is quite different from the neigh table settings in its implications, and route caching is no longer used for IPv4. If you run sysctl net.ipv4.route.gc_thresh, you will probably see that it is set to -1.

Thomas
Arnaud

The function neigh_periodic_work contains the following code:

    if (atomic_read(&tbl->entries) < tbl->gc_thresh1)
        goto out;
    /* ... per-entry scan omitted from this excerpt ... */
out:
    /* Cycle through all hash buckets every BASE_REACHABLE_TIME/2 ticks.
     * ARP entry timeouts range from 1/2 BASE_REACHABLE_TIME to 3/2
     * BASE_REACHABLE_TIME.
     */
    schedule_delayed_work(&tbl->gc_work,
                  NEIGH_VAR(&tbl->parms, BASE_REACHABLE_TIME) >> 1);
    write_unlock_bh(&tbl->lock);

If the number of neighbors is less than gc_thresh1, the function jumps straight to out and only reschedules itself, so this run of the GC does not delete any STALE or FAILED neighbor entries. You can change this by lowering the value of /proc/sys/net/ipv4/neigh/default/gc_thresh1 (the default is 128 on kernel 3.10.0-327.36.3).
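The check quoted above can be mimicked from the shell to see whether a given table is small enough to be skipped. This is only an approximation: the kernel compares its own entry counter for the whole IPv4 table, while counting lines of `ip -4 neigh show` may leave out entries that ip does not list by default. The helper names `count_entries` and `gc_scan_runs` are made up for this sketch.

```shell
# count_entries: count non-empty lines on stdin, e.g. the output of
# `ip -4 neigh show` (an approximation of the kernel's entry counter).
count_entries() {
    awk 'NF { n++ } END { print n + 0 }'
}

# gc_scan_runs ENTRIES GC_THRESH1
# Mirrors the "entries < gc_thresh1 -> goto out" test: prints "skipped"
# when the periodic scan would bail out early, "runs" otherwise.
gc_scan_runs() {
    if [ "$1" -lt "$2" ]; then
        echo skipped
    else
        echo runs
    fi
}

# Typical use on the router (not run here):
#   entries=$(ip -4 neigh show | count_entries)
#   thresh1=$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh1)
#   gc_scan_runs "$entries" "$thresh1"
```

With the four entries shown in the question and a threshold of 128, `gc_scan_runs 4 128` reports that the scan is skipped, which would match the observation that entries stay in the table for weeks.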

alexander.polomodov
cing