Linux network device drops and restarts in a loop


I am working with a fairly unusual stack, so any diagnostic or debugging strategies are very welcome.

Setup: Ubuntu 18.04 desktop, PCIe network card with 2 ports. Two Ethernet cameras are plugged directly into the card. This gives them 169.254.x.y link-local addresses, and this worked great for a while.

I tried using dnsmasq at one point as a DHCP server on these ports so I could assign static IPs to the cameras (for Docker macvlan reasons). This also worked, but later proved unnecessary, so I disabled dnsmasq.

Now when the devices are plugged in, the system goes into a loop, enabling and disabling the interface repeatedly. The journalctl output looks something like this:

Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0123] device (ethn0): carrier: link connected
Nov  4 12:12:36 hostname kernel: [ 5705.109602] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, Flow Control: None
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0130] device (ethn0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0160] policy: auto-activating connection 'Wired connection 3'
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0200] device (ethn0): Activation: starting connection 'Wired connection 3' (0ae083fb-03b4-3782-a069-7aa48780f65b)
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0207] device (ethn0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0223] device (ethn0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0306] device (ethn0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0313] dhcp4 (ethn0): activation: beginning transaction (timeout in 45 seconds)
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0347] dhcp4 (ethn0): dhclient started with pid 3304
Nov  4 12:12:36 hostname dhclient[3304]: DHCPDISCOVER on ethn0 to 255.255.255.255 port 67 interval 3 (xid=0x16ff5155)
Nov  4 12:12:37 hostname kernel: [ 5706.148368] ixgbe 0000:65:00.0 ethn0: NIC Link is Down
Nov  4 12:12:37 hostname avahi-daemon[492]: Joining mDNS multicast group on interface ethn0.IPv6 with address fe80::7e02:198c:dd14:f845.
Nov  4 12:12:37 hostname avahi-daemon[492]: New relevant interface ethn0.IPv6 for mDNS.
Nov  4 12:12:37 hostname avahi-daemon[492]: Registering new address record for fe80::7e02:198c:dd14:f845 on ethn0.*.
Nov  4 12:12:39 hostname dhclient[3304]: DHCPDISCOVER on ethn0 to 255.255.255.255 port 67 interval 4 (xid=0x16ff5155)
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.0543] device (ethn0): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.0707] dhcp4 (ethn0): canceled DHCP transaction, DHCP client pid 3304
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.0707] dhcp4 (ethn0): state changed unknown -> done
Nov  4 12:12:43 hostname avahi-daemon[492]: Withdrawing address record for fe80::7e02:198c:dd14:f845 on ethn0.
Nov  4 12:12:43 hostname avahi-daemon[492]: Leaving mDNS multicast group on interface ethn0.IPv6 with address fe80::7e02:198c:dd14:f845.
Nov  4 12:12:43 hostname avahi-daemon[492]: Interface ethn0.IPv6 no longer relevant for mDNS.
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.3914] device (ethn0): carrier: link connected
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.3922] device (ethn0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Nov  4 12:12:43 hostname kernel: [ 5712.488688] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, Flow Control: None
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.3954] policy: auto-activating connection 'Wired connection 3'

It just keeps looping. I have no idea which service is at fault, though it smells like NetworkManager doing something silly. Rebooting, restarting NetworkManager, and re-enabling dnsmasq have all failed to fix it so far, and I'm not sure where to look next.
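
For anyone debugging a similar loop, it can help to measure the flapping rather than eyeball it. Below is a minimal Python sketch (not part of the original setup) that pulls the kernel's link up/down events out of a log in the format shown above and reports how long the link stayed in each state. The regex assumes the ixgbe message wording seen here ("NIC Link is Up" / "NIC Link is Down"); other drivers phrase this differently. The names `LINK_RE` and `link_flaps` are made up for illustration.

```python
import re

# Matches kernel lines such as:
#   kernel: [ 5705.109602] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, ...
# Assumption: driver name and PCI address precede the interface name.
LINK_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+\S+\s+"
    r"(?P<iface>\S+): NIC Link is (?P<state>Up|Down)"
)


def link_flaps(lines, iface):
    """Return [(state, seconds_in_state), ...] for consecutive link events.

    Durations come from the kernel's monotonic timestamp in brackets,
    not from the one-second-resolution syslog timestamp.
    """
    events = []
    for line in lines:
        m = LINK_RE.search(line)
        if m and m.group("iface") == iface:
            events.append((float(m.group("ts")), m.group("state")))
    events.sort()
    # Pair each event with the next one to get the time spent in that state.
    return [
        (state, round(next_ts - ts, 6))
        for (ts, state), (next_ts, _) in zip(events, events[1:])
    ]
```

Run against the three kernel lines in the log above with `iface="ethn0"`, this reports the link up for about 1.04 seconds and then down for about 6.34 seconds. A regular up/down rhythm like that is a hint that something on the wire is cycling, independent of what NetworkManager does in response.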

DeusXMachina

Posted 2019-11-04T17:16:22.597

Reputation: 113

Answers


So this turned out to be a fun one to stick in the Compendium of Weird Bugs. As it turns out, this issue was caused by an insufficient power supply. When the camera is plugged in, the kernel sees the connection and enables the interface. This tells the camera to start talking to the computer, which pulls juuuust enough extra power to dip the voltage and reset the camera. The kernel sees the reset as the interface being unplugged. The camera then tries to reconnect, and the cycle repeats.
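
One way to check for this kind of physical-layer flap without going through NetworkManager's logs is to read the kernel's own carrier counter. This is a minimal Python sketch, assuming a Linux kernel that exposes `/sys/class/net/<iface>/carrier_changes` in sysfs; the function names and the `sysfs_root` parameter are made up for illustration.

```python
import time
from pathlib import Path


def read_carrier_changes(iface, sysfs_root="/sys/class/net"):
    """Read the kernel's count of carrier (link up/down) transitions."""
    path = Path(sysfs_root) / iface / "carrier_changes"
    return int(path.read_text().strip())


def count_flaps(iface, seconds, sysfs_root="/sys/class/net"):
    """Return how many carrier transitions happened during the window."""
    before = read_carrier_changes(iface, sysfs_root)
    time.sleep(seconds)
    return read_carrier_changes(iface, sysfs_root) - before
```

For example, `count_flaps("ethn0", 30)` returns the number of link transitions seen in 30 seconds. A count that keeps climbing while nothing is reconfiguring the interface suggests looking at the cable, the NIC, or the device on the other end (including its power) rather than at a service on the host.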

Moral of the story: abstractions leak. Rule in all options, however silly, then rule them out one by one.

DeusXMachina

Posted 2019-11-04T17:16:22.597

Reputation: 113