Prevent a program (vmnet-natd) from receiving network state change events in Linux


Our Wi-Fi access point is configured with a very aggressive DHCP lease time of 10 minutes. That is not a problem in itself, as the IP address stays the same when the lease is renewed. But I run VMware Workstation, and such a short interval leads to frequent network dropouts inside the virtual machines. The root of the problem is the vmnet-natd daemon: it detects that there was SOME network event and assumes it was a reconnect. As a result, the virtual network adapter in the VM sees a "physical" network disconnect followed by an immediate reconnect, and all of my TCP sessions in the VM are dropped.

Currently I am running VMware Workstation 15.1.0 on a Xubuntu 18.04 host.

These are the events from syslog when this occurs.

Jun 25 15:23:18 laptop wpa_supplicant[1039]: wlp2s0: WPA: Group rekeying completed with 6c:3b:6b:XX:XX:XX [GTK=CCMP]
Jun 25 15:26:06 laptop dhclient[6554]: DHCPREQUEST of 192.168.XXX.XXX on wlp2s0 to 192.168.XXX.XXX port 67 (xid=0x6f72XXXX)
Jun 25 15:26:06 laptop dhclient[6554]: DHCPACK of 192.168.XX.XX from 192.168.XX.XX
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1687] dhcp4 (wlp2s0):   address 192.168.XX.XX
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1687] dhcp4 (wlp2s0):   plen 24 (255.255.255.0)
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1687] dhcp4 (wlp2s0):   gateway 192.168.XX.XX
Jun 25 15:26:06 laptop vmnet-natd: RTM_NEWADDR: index:4, addr:192.168.XXX.XXX
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1688] dhcp4 (wlp2s0):   lease time 600
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1688] dhcp4 (wlp2s0):   nameserver '192.168.XXX.XXX'
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1688] dhcp4 (wlp2s0):   nameserver 'XXX.XXX.XXX.XXX'
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1688] dhcp4 (wlp2s0):   nameserver 'XXX.XXX.XXX.XXX'
Jun 25 15:26:06 laptop NetworkManager[1038]: <info>  [1561465566.1688] dhcp4 (wlp2s0): state changed bound -> bound
Jun 25 15:26:06 laptop dbus-daemon[1020]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.11' (uid=0 pid=1038 comm="/usr/sbin/NetworkManager --no-daemon " label="unconfined")
Jun 25 15:26:06 laptop systemd[1]: Starting Network Manager Script Dispatcher Service...
Jun 25 15:26:06 laptop dhclient[6554]: bound to 192.168.XXX.XXX -- renewal in 267 seconds.
Jun 25 15:26:06 laptop dbus-daemon[1020]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jun 25 15:26:06 laptop systemd[1]: Started Network Manager Script Dispatcher Service.
Jun 25 15:26:06 laptop nm-dispatcher: req:1 'dhcp4-change' [wlp2s0]: new request (1 scripts)
Jun 25 15:26:06 laptop nm-dispatcher: req:1 'dhcp4-change' [wlp2s0]: start running ordered scripts...
Jun 25 15:26:06 laptop kernel: [10747.491441] userif-2: sent link down event.
Jun 25 15:26:06 laptop kernel: [10747.491445] userif-2: sent link up event.

There is a thread on the VMware forums about this, with no solution.

How do I prevent this? My google-fu was not good enough to find the solution.

There could be several ways to fix this.

  1. Fix vmnet-natd to do special handling for such events. (VMware support is not helpful).
  2. Configure vmnet-natd to completely ignore network events, but there seems to be no such option.
  3. Patch or configure the Linux kernel/userspace network stack so that it does not generate network change events when nothing actually changed, as with Wi-Fi group rekeying or a DHCP lease extension.
  4. Patch or configure the Linux kernel/userspace network stack so that it does not deliver network events (or some subset of them) to vmnet-natd.

Can someone point me towards the path of least resistance to solve this annoyance?

Update 1: The network adapter in the VM is configured in NAT mode. I cannot run it in any other mode, as I cannot expose any of my VMs directly to the office network. The DHCP server for the host is the access point itself, and it stays the same all the time. The network is managed with NetworkManager.

Denis Nikolaenko

Posted 2019-06-25T13:00:27.303

Reputation: 429

You have indicated you are obliged to run the VMs in NAT mode. Can you assign them static IP addresses? Which OS are the VMs? – harrymc – 2019-06-27T19:43:22.213

Can you swap your VM's to fixed IP's? Maybe outside the DHCP pool range? – Brian – 2019-06-27T19:44:50.810

I think you will find the answer and a patch in the article Fixing VMWare Player on Linux when using DHCP addresses. Unfortunately, this requires a rebuild of VMWare's kernel modules from source. Setting static IP might be a simpler solution.

– harrymc – 2019-06-27T19:54:11.403

Answers

The article Fixing VMWare Player on Linux when using DHCP addresses describes the problem and offers a solution.

This problem, introduced with VMware Player v8+, is described as:

Every time the DHCP address of any of the network adapters of the host machine is renewed, all virtual machines receive a network disconnect-and-connect, rendering the network unusable for roughly 20 seconds with each renewal.

This is particularly disruptive on networks with a short DHCP lease time such as yours: with a 600-second lease the client renews roughly every 5 minutes (267 seconds in your log), and on each renewal all VMs lose their network connectivity for a short while.

This behavior can clearly be seen in your syslog excerpt:

Jun 25 15:26:06 laptop kernel: [10747.491441] userif-2: sent link down event.
Jun 25 15:26:06 laptop kernel: [10747.491445] userif-2: sent link up event.
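
To see how often this happens, the link-down timestamps can be pulled out of the log and compared with the DHCPACK lines. This is only a minimal sketch, assuming the log format shown above; the function name link_down_times is made up for illustration:

```shell
# Print the syslog timestamp of every "sent link down" event from the
# vmnet module. Reads log text on stdin; assumes the classic
# "Mon DD HH:MM:SS host ..." syslog line layout.
link_down_times() {
    awk '/userif-[0-9]+: sent link down event/ { print $1, $2, $3 }'
}

# Possible usage (log path is an assumption, it differs per distribution):
#   link_down_times < /var/log/syslog
```

If each timestamp lines up with a DHCPACK in the same second, the renewals are the trigger.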

The author of the article traced the string userif-3 to the file userif.c, which is part of the source archive /usr/lib/vmware/modules/source/vmnet.tar (it unpacks to the directory vmnet-only) included with every VMware Player installation.

The code he found looked like this:

int
VNetUserIfSetUplinkState(VNetPort *port, uint8 linkUp)
{
...
   LOG(0, (KERN_NOTICE "userif-%d: sent link %s event.\n",
        userIf->port.id, linkUp ? "up" : "down"));

   return retval;
}

He then created a patch file (saved here as /tmp/vmware-vmnet-only.patch) and applied it as root as follows:

cd /tmp
tar xf /usr/lib/vmware/modules/source/vmnet.tar
patch -p0 < vmware-vmnet-only.patch
tar cf vmnet.tar vmnet-only
cp /tmp/vmnet.tar /usr/lib/vmware/modules/source/vmnet.tar
/usr/bin/vmware-modconfig --console --install-all
systemctl restart vmware    ## or 'service vmware restart'
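
These steps overwrite the packaged vmnet.tar in place, so it may be worth keeping a copy of the original first. A small sketch of that idea; the helper name backup_once and the .orig suffix are my own choices, not something from the article:

```shell
# Copy a file to "<file>.orig" unless that backup already exists,
# so that re-running the procedure never overwrites the pristine copy.
backup_once() {
    src=$1
    if [ ! -e "$src.orig" ]; then
        cp -p -- "$src" "$src.orig"
    fi
}

# Possible usage before the 'cp /tmp/vmnet.tar ...' step:
#   backup_once /usr/lib/vmware/modules/source/vmnet.tar
# GNU patch can also test the patch without changing any files:
#   patch --dry-run -p0 < vmware-vmnet-only.patch
```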

I list his patch below:

--- vmnet-only/userif.c  2017-12-21 17:02:28.555820933 +0100
+++ vmnet-only.jjk/userif.c 2017-12-15 13:22:13.257724953 +0100
@@ -973,6 +973,9 @@
    userIf = (VNetUserIF *)port->jack.private;
    hubJack = port->jack.peer;

+   /* never send link down events */
+   if (!linkUp) return 0;
+
    if (port->jack.state == FALSE || hubJack == NULL) {
       return -EINVAL;
    }
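
The fix only holds while the patched vmnet.tar is in place. I assume a VMware upgrade may reinstall the stock sources, so it can help to check whether the guard line is still there. A rough sketch; has_linkdown_guard is a hypothetical helper name:

```shell
# Succeed (exit status 0) if the given userif.c contains the
# "never send link down events" guard added by the patch.
has_linkdown_guard() {
    grep -qF -- 'if (!linkUp) return 0;' "$1"
}

# Possible usage:
#   cd /tmp && tar xf /usr/lib/vmware/modules/source/vmnet.tar
#   has_linkdown_guard vmnet-only/userif.c && echo patched || echo "not patched"
```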

harrymc

Posted 2019-06-25T13:00:27.303

Reputation: 306 093

Seems to be working. Thanks! – Denis Nikolaenko – 2019-06-28T14:12:20.603

Did you use this patch? – harrymc – 2019-06-28T17:13:06.783

Yes. The patch applies cleanly. – Denis Nikolaenko – 2019-06-28T17:35:42.167