2

can anyone please shed some light on the differences I list below. And maybe explain why NetworkManager does it differently. And please advise if we can change NetworkManager to behave more like the non-NetworkManager scenario.

Both CentOS 7.8 servers use dhclient but the one is being controlled by NetworkManager. Both have the same switch/NIC down/up event every few days (uncontrollable at this time - for many reasons and also we are remote)

The server#0 that is using NetworkManager immediately attempts to request DHCP after the down/up outages. It fails to get any response from DHCP (another switch issue) and then cancels DHCP transaction and changes the state to timeout. Then it does nothing, unless NetworkManager is restarted (obviously this can only be done at the console). Please see the whole sequence below.

The server#1 that is not using NetworkManager recovers fine through these down/up outages, it seems it just keeps its lease throughout the NIC down and doesn't even renew on NIC up, and just keeps using its IP! Later it is able to renew DHCP at the regular lease timeout interval. Please see the whole sequence below.

Please let me know if I can change NetworkManager to behave more like plain dhclient. Maybe it can be configured to just keep the current lease after a down/up and just renew at the regular lease timeout interval? thanks!!

Server #0:

-- Last regular DHCP renew:
Feb 26 09:31:21 server0 dhclient[4766]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x58eefe09)
Feb 26 09:31:21 server0 dhclient[4766]: DHCPACK from 10.20.20.131 (xid=0x58eefe09)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5084] dhcp4 (enp96s0f0):   address 10.20.20.223
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   plen 22 (255.255.252.0)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   gateway 10.20.20.1
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   lease time 18000
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   nameserver '10.20.20.49'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0):   nameserver '10.20.20.48'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0):   domain name 'dom.com'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0): state changed bound -> bound
Feb 26 09:31:21 server0 dhclient[4766]: bound to 10.20.20.223 -- renewal in 8129 seconds.
Feb 26 09:31:21 server0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 09:31:21 server0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: new request (4 scripts)
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: start running ordered scripts...
-- Random switch outage:
Feb 26 10:49:10 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8263] device (enp96s0f0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8467] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 4766
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8468] dhcp4 (enp96s0f0): state changed bound -> done
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8679] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:49:16 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:49:16 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: start running ordered scripts...
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 10:58:46 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is not accessible
-- NetworkManager tries to recover and request DHCP:
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6768] device (enp96s0f0): carrier: link connected
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6783] device (enp96s0f0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6823] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6835] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6837] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6844] manager: NetworkManager state is now CONNECTING
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6848] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7360] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7369] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7435] dhcp4 (enp96s0f0): dhclient started with pid 44653
Feb 26 10:58:46 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:58:54 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:59:13 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 3 (xid=0x2f70b1a3)
Feb 26 10:59:16 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 6 (xid=0x2f70b1a3)
Feb 26 10:59:22 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 9 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn>  [1614355171.8451] dhcp4 (enp96s0f0): request timed out
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8451] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8540] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44653
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8541] dhcp4 (enp96s0f0): state changed timeout -> done
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8545] device (enp96s0f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8553] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn>  [1614355171.8559] device (enp96s0f0): Activation: failed for connection 'enp96s0f0'
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8563] device (enp96s0f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8606] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8615] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8617] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
-- NetworkManager tries to recover and request DHCP again following a different process:
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8624] manager: NetworkManager state is now CONNECTING
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8628] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9420] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9429] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9489] dhcp4 (enp96s0f0): dhclient started with pid 44712
Feb 26 10:59:32 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:36 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:44 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:49 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:54 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 7 (xid=0x3ffbeab4)
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5823] device (enp96s0f0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5846] device (enp96s0f0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5850] device (enp96s0f0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5869] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5982] manager: NetworkManager state is now CONNECTED_SITE
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5988] policy: set 'enp96s0f0' (enp96s0f0) as default for IPv6 routing and DNS
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5992] device (enp96s0f0): Activation: successful, device activated.
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.6003] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 26 10:59:59 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:59:59 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: start running ordered scripts...
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 11:00:01 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x3ffbeab4)
Feb 26 11:00:15 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 21 (xid=0x3ffbeab4)
-- NetworkManager cancels and times out and does nothing anymore
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <warn>  [1614355216.8456] dhcp4 (enp96s0f0): request timed out
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8463] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8649] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44712
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8650] dhcp4 (enp96s0f0): state changed timeout -> done

Server #1:

-- Last regular DHCP renew:
Feb 26 10:34:00 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 10:34:00 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 10:34:02 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8195 seconds.
-- Random switch outage:
Feb 26 10:49:10 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:58:46 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is accessible during this time!
-- Next regular DHCP renew:
Feb 26 12:50:37 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 12:50:37 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 12:50:39 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8611 seconds.
jodlgc
  • 25
  • 4

1 Answers1

1

In NetworkManager, a device has an overall logical state. That is what you see in nmcli device.

If the device is connected (activated), then it may fail to get an address from DHCP (or, a DHCP timeout may happen later). Depending on ipv4.dhcp-timeout (which you can set to infinity), after a while DHCP will be considered as failed. When that happens, the device may go down altogether. That depends on the setting ipv4.may-fail. If ipv4.may-fail=no, then a DHCP failure is fatal to the activation, and the device goes down. If not, as long as you have an IPv6 address, the overall state is still considered good. In that case, DHCP should be retried indefinitely, while the device stays activaed/up.

On the other hand, if the device goes down due to a failure, it becomes eligible to autoconnect again (at least, if you set it connection.autoconect=yes). This autoconnect cycle repeats for up to connection.autoconnect-retries times, and then autoconnect is blocked for 5 minutes before it starts again.

This is how it is supposed to be. But with CentOS7.8, I am not sure that all of this works as I said. You say, "Then it does nothing, unless NetworkManager is restarted". Are you sure about that? Did you wait long enough? After DHCP fails, it might back off for a bit. The log you pasted finishes after that.

When debugging NetworkManager the debug logs are more useful. Configure level=TRACE logging in NetworkManager.conf.

Maybe ipv4.may-fail=no would help? Then at least the device would go down, and an autoconnect cycle would start again.


Btw, if you want that NetworkManager leaves the device up when unplugging the cable (as you seem to like about dhclient), then configure "ignore-carrier" in man NetworkManager.conf.

thaller
  • 159
  • 1
  • thanks much! that gives me so much more knowledge about NetworkManager and a few things to try. yes I am pretty sure that it does nothing as it has been left this way for days. Nothing new for NetworkManager is shown again in /var/log/messages. I'll check some of those nmcli settings you mentioned and double back here with any new info I find. thank you again! – jodlgc Mar 04 '21 at 21:02
  • so our settings show
    `ipv4.may-fail=yes` `connection.autoconnect=yes` `connection.autoconnect-retries=-1` i also suspect as you mention the device is getting a IPv6 address based on the line in the log I posted
    – jodlgc Mar 05 '21 at 15:17
  • please see our settings in my last comment. based on your instruction I think I will try `ipv4.may-fail=no`. As you suggest this way the `connection.autoconnect` would try again. We have previously tried `ipv4.dhcp-timeout=infinity` but it just repeatedly pounds DHCPDISCOVER every several seconds (different intervals) without reseting anything in between. ie: `dhclient[18372]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 7` `dhclient[18372]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14` `etc` `etc` – jodlgc Mar 05 '21 at 15:33
  • 1
    I am having the same problem: ip6 renewed, ip4 not renewed and PC disconnected. However, where should I modify the parameters? I mean, what file contains "ipv4.may-fail", "connection.autoconnect-retries", etc? – Antonio Sesto Mar 15 '22 at 10:22