I'm trying to understand a weird packet drop issue that appears when joining a particular multicast group.
I think this issue is related to a patch introduced in kernel version 2.6.37:
Beginning with kernel 2.6.37, the meaning of the dropped packet count
has changed. Before, dropped packets were most likely due to an error.
Now, the rx_dropped counter shows statistics for frames dropped because
of:
Softnet backlog full -- (Measured from /proc/net/softnet_stat)
Bad / Unintended VLAN tags
Unknown / Unregistered protocols
IPv6 frames when the server is not configured for IPv6
If any frames meet those conditions, they are dropped before the
protocol stack and the rx_dropped counter is incremented.
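To rule out the first cause in the list (softnet backlog full), the per-CPU drop counters can be read from /proc/net/softnet_stat. The following is only a minimal sketch: it assumes the usual layout of that file (one row per CPU, hexadecimal columns, second column is the dropped count), and the function name `softnet_dropped` is made up for illustration.

```shell
# Print the "dropped" column (2nd hex field) of a softnet_stat-style file,
# one output line per input row. Rows correspond to CPUs.
softnet_dropped() {
    # $1: optional path, defaults to /proc/net/softnet_stat
    file=${1:-/proc/net/softnet_stat}
    row=0
    while read -r processed dropped rest; do
        # The counters are hexadecimal without a 0x prefix; convert to decimal.
        echo "row$row dropped=$((0x$dropped))"
        row=$((row + 1))
    done < "$file"
}

# Usage (on the affected host):
#   softnet_dropped
```

If these values stay constant while rx_dropped keeps growing, the backlog is probably not the reason for the drops.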
On a clean SLES11 SP3 I managed to reproduce this by joining the STP multicast group (01:80:c2:00:00:00).
Without any changes, there are no packet drops in /proc/net/dev (RX) or netstat -i, because my system has not joined the STP multicast group (so it ignores those frames).
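For reference, this is roughly how the RX drop counter can be pulled out of /proc/net/dev for a single interface. It is a sketch, not a polished tool: the function name `rx_dropped` is hypothetical, and it assumes the standard column order (bytes, packets, errs, drop, ...) on the receive side.

```shell
# Print the RX "drop" counter of one interface from a /proc/net/dev-style file.
rx_dropped() {
    # $1: interface name, $2: optional path (defaults to /proc/net/dev)
    # The colon after the interface name may be glued to the first counter,
    # so turn it into a space before splitting into fields.
    tr ':' ' ' < "${2:-/proc/net/dev}" |
        awk -v ifc="$1" '$1 == ifc { print $5 }'
}

# Usage: sample the counter twice and compare.
#   before=$(rx_dropped eth1); sleep 10; after=$(rx_dropped eth1)
#   echo "dropped in 10s: $((after - before))"
```

At 1 dropped packet every 2 seconds, the difference over 10 seconds should come out at about 5.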
When I join the STP multicast group, I can see packet drops (1 packet every 2 seconds), which I believe are caused by the patch introduced in kernel 2.6.37 (Unknown/Unregistered Protocols), and this is expected. This is how I join the group:
hostname:~ # ip maddr add 01:80:c2:00:00:00 dev eth1
My understanding is that when I modprobe the llc or stp module, the kernel recognizes the protocol and therefore stops dropping the packets (my tests confirm this).
Loading the llc module, or the stp module (which depends on llc), "fixes" the dropped packet issue.
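When repeating the test, it helps to confirm whether llc or stp is actually loaded before looking at the counters. A small sketch that checks /proc/modules (the helper name `module_loaded` is made up here, and it assumes the module name is the first field of each line):

```shell
# Return success (exit status 0) if the named module appears in a
# /proc/modules-style file, failure otherwise.
module_loaded() {
    # $1: module name, $2: optional path (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Usage:
#   for m in llc stp; do
#       if module_loaded "$m"; then echo "$m loaded"; else echo "$m not loaded"; fi
#   done
```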
Now, the question:
I have an application that joins multiple multicast groups when it starts. For some reason, one particular join triggers the dropped packet issue (1 packet every 2 seconds).
The problem is that it is not the STP multicast address 01:80:c2:00:00:00 but a totally different one (01:00:5e:46:ac:04, aka 239.70.172.4).
Loading the llc/stp module "fixes" the dropped packet counter increment. None of the other multicast groups cause this problem, e.g. 01:00:5e:46:ac:02 and many others.
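For anyone who wants to check the address pairs: the MAC address of an IPv4 multicast group is the fixed prefix 01:00:5e followed by the low 23 bits of the group address. A minimal sketch of that mapping (the function name `mcast_ip_to_mac` is hypothetical, and it does no validation that the input really is a multicast address):

```shell
# Map a dotted-quad IPv4 multicast address to its Ethernet multicast MAC:
# 01:00:5e + low 23 bits of the IP (top bit of the second octet is cleared).
mcast_ip_to_mac() {
    local IFS=.
    # Split the address into its four octets ($1..$4).
    set -- $1
    printf '01:00:5e:%02x:%02x:%02x\n' "$(( $2 & 0x7f ))" "$3" "$4"
}

# Usage:
#   mcast_ip_to_mac 239.70.172.4    # prints 01:00:5e:46:ac:04
```

Because only 23 bits are kept, several group addresses share one MAC, but none of them maps to the 01:80:c2 range used by STP.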
STP frames are the only ones that appear on the interface every 2 seconds, but their destination MAC address is 01:80:c2:00:00:00:
00:21:1b:4f:a3:bf > 01:80:c2:00:00:00, 802.3, length 119: LLC, dsap STP (0x42) Individual, ssap STP (0x42) Command, ctrl 0x03: STP 802.1s, Rapid STP, CIST Flags [Learn, Forward]
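To back up the claim that nothing else arrives at that rate, the capture can be summarised per destination MAC. This is just a sketch that post-processes tcpdump text output captured with link-level headers (the `-e` option); the function name `count_by_dst` is made up, and it assumes each line contains "src > dst," as in the frame above.

```shell
# Read tcpdump -e style lines on stdin and print "<count> <dst-mac>" pairs,
# sorted by destination MAC.
count_by_dst() {
    awk '{
        # The destination MAC is the field right after the ">" separator.
        for (i = 1; i < NF; i++)
            if ($i == ">") {
                dst = $(i + 1)
                sub(/,$/, "", dst)   # strip the trailing comma
                count[dst]++
                break
            }
    }
    END { for (d in count) print count[d], d }' | sort -k2
}

# Usage (capture for a while, then interrupt with Ctrl-C):
#   tcpdump -e -n -i eth1 multicast | count_by_dst
```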
How is this possible? Why would joining the 01:00:5e:46:ac:04 multicast group trigger this behaviour, as if it were somehow related to the STP group, and let the frames/packets pass further up the stack?