We're investigating an issue with our kit, which is an IP camera running Busybox Linux on a Ti-Davinci SoC.

On one particular site there is a lot of network traffic (over which we do not have control) with one system spamming out broadcast packets (to 255.255.255.255) every ~50ms non-stop. It's not sending much data but it's very persistent.

This has a strange effect on our system: if the system starts or restarts, or the networking is restarted (ifdown ... ifup etc.), while this traffic is present, the network interface fails to work properly. It claims to be up but simply sits there logging thousands of Rx overruns / frame errors. We don't receive anything addressed to us (pings etc.) and can't send anything either. Outgoing pings fail as if they had gone missing (they show as transmitted successfully but never actually leave the device) rather than being actively rejected or dropped. The networking driver seems to believe it is up and working OK, but nothing gets in or out.

If we remove the traffic and start / re-start / cycle the networking, the interface comes up and works perfectly. If we then re-introduce the traffic, we do NOT see the overruns / frame errors racking up.

Being a small embedded system running Busybox we have neither the horsepower nor the full range of networking tools at our disposal for some of the heavier-duty suggestions found whilst searching (increasing Rx buffers being the main one I found).

Rather, I'm looking for suggestions on the root cause and/or suggestions on where to start poking about to try and prevent this from happening. It may be as simple as tweaking kernel parameters or rebuilding the networking driver - answers on a postcard!

If more info is needed please ask - this doesn't generate any errors other than the Rx Overflow/Frame stats in ifconfig so nothing really worth posting.
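For anyone wanting to watch the same counters without reading ifconfig output by eye, here is a minimal sketch. It assumes the kernel exposes /proc/net/dev in the usual layout and that the Busybox build includes awk; the function name rx_errors is made up for illustration. The fifo and frame columns are, as far as I know, the ones ifconfig reports as overruns and frame.

```shell
#!/bin/sh
# Sketch only: rx_errors is a hypothetical helper, not part of any tool.
# $1 = interface name, $2 = stats file (defaults to /proc/net/dev).
rx_errors() {
    awk -v ifc="$1" '
        { sub(/:/, " ") }   # tolerate "eth0:123" with no space after the colon
        # After the name: bytes packets errs drop fifo frame ...
        $1 == ifc { print "overruns=" $6 " frame=" $7 }
    ' "${2:-/proc/net/dev}"
}

# Example use on the target (eth0 is an assumption):
#   while true; do rx_errors eth0; sleep 1; done
```

Running it in a loop while cycling the interface should show whether the counters keep climbing after the link comes back up.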

John U
  • I'm voting to move this question to [unix.se] since it's more about how unix works than about systems administration. This does not mean that the question is a bad one, but I believe you will get better responses from that site. – Jenny D Feb 01 '16 at 14:31
  • No problem, I was unsure which of the stacks this would be best suited to. – John U Feb 01 '16 at 15:08
  • Most likely the issue is buggy hardware. Have you checked with a different camera of the same make? I don't think you have any other solution than filtering at the switch. – Shreesh Feb 01 '16 at 14:54

1 Answer

Well, it looks like I found an answer: there was a bug in the ti-davinci EMAC driver.

Davinci-Linux mailing list notes:

Said commit adds a check whether the carrier link is ok. If the link is not ok, the skb is freed and no new dma descriptor added to the rx dma channel. This causes trouble during initialization when the carrier status has not yet been updated. If a lot of packets are received while netif_carrier_ok returns false, all dma descriptors are freed and the rx dma transfer is stopped.

To reproduce the bug, flood ping the davinci board while doing ifconfig eth0 down && ifconfig eth0 up on the board.

After that, the rx path stops working and the overrun value reported by ifconfig is counting up.
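The check described above can be sketched as a small script to run on the board while another host floods it. This is only an illustration under assumptions: eth0, the sleep intervals, and the helper names are mine, and it assumes the ifconfig overruns figure corresponds to the rx_fifo_errors entry in sysfs.

```shell
#!/bin/sh
# Sketch only: helper names, IFACE and the timings are assumptions.
IFACE="${IFACE:-eth0}"
STATS_DIR="${STATS_DIR:-/sys/class/net/$IFACE/statistics}"

# Print the current overrun count for the interface.
read_overruns() {
    cat "$STATS_DIR/rx_fifo_errors"
}

# Succeeds (exit 0) only when the second sample is larger than the first.
counter_grew() {
    [ "$2" -gt "$1" ]
}

# Cycle the interface, then sample the counter twice a few seconds apart.
check_after_cycle() {
    ifconfig "$IFACE" down && ifconfig "$IFACE" up
    sleep 2                      # give the link a moment to settle
    before=$(read_overruns)
    sleep 5
    after=$(read_overruns)
    if counter_grew "$before" "$after"; then
        echo "overruns still climbing ($before -> $after): rx path may be stalled"
    else
        echo "overruns stable at $after"
    fi
}

# On the target, as root, while the flood is running:
#   check_after_cycle
```

A counter that keeps rising after the interface is back up matches the symptom in the mailing list note; a stable counter suggests the rx path survived the restart.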

We've built and tested the patched driver: the device can now happily endure 2500+ packets/sec washing up against it with no adverse effects, so we're pretty certain it's fixed. Previously it only took about 50 packets/sec to upset it.

John U