
I have two dedicated servers hosted by OVH, an SP-128 and an MG-512. The hardware is similar (they are both Supermicro-based builds) but the SP-128 has a Xeon E5 Ivy Bridge and 1 Gbps Ethernet; the MG-512 has dual Xeon E5 Haswell and 10 Gbps Ethernet.

I'm in the process of migrating everything from the old server (the SP-128) to the new server (the MG-512). Here's what I've done so far:

  • Set up the partitions on the new box
  • Copied the data and OS (the entire ZFS storage pool, all-inclusive) from the old box to the new one using zfs send ... | ssh ... "zfs recv ..." (sketched below)
  • Reinstalled GRUB
  • The new server has a new public IP, so I changed that in /etc/network/interfaces, and it has a new "name" for its primary Ethernet adapter, so I updated the config (old: eno1; new: enp3s0f0)
  • I also updated the libvirt network definition behind macvtap0 (named macvtap-net in libvirt), replacing eno1 with enp3s0f0 in its configuration, and ran virsh net-destroy macvtap-net; virsh net-start macvtap-net to make sure the change took effect.
  • The new server boots and is accessible over the network!
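
For reference, the transfer was along these lines; the pool, snapshot, and host names here are placeholders rather than my real ones:

zfs snapshot -r tank@migrate
zfs send -R tank@migrate | ssh root@new-server "zfs recv -F tank"

(-R sends the whole pool recursively with its properties; -F lets the receiving side roll back to accept the stream.)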

Now, I have a bunch of LXD and libvirt/KVM guests. Each of them has one or more static IPs that they claim from one of two public /27 subnets I have. OVH has migrated both /27s over to my new server (at my request). Here are some more details of the networking setup:

  • The LXD containers are connected using macvlan (attached roughly as in the sketch after this list). The networking works: all of my LXD containers can reach, and be reached by, the public Internet.
  • The KVM guests are connected using macvtap. The networking doesn't work.
  • The default gateway and the subnet used by the LXD containers and the KVM guests are exactly the same; the only things that differ are the MAC address used by each guest/container and the specific IP within the /27. I haven't changed the guests' IP configuration, because it shouldn't need to change.
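
For context, the LXD side is attached roughly like this (the profile and device names are placeholders; a per-container lxc config device add works the same way):

lxc profile device add public-net eth0 nic nictype=macvlan parent=enp3s0f0

On the old server the parent was eno1. Each container then configures its own static IP from the /27 internally.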

My problem is the above: The networking for the KVM guests doesn't work. By "doesn't work", I mean the guests are able to set the static IP as before, but they are completely unreachable. The guests can't ping their default gateway or any other IP, public or private.

I have the old and new servers running simultaneously, so I've fairly thoroughly checked that the configuration is the same between them. For instance, the old server does not have promiscuous mode enabled on the primary Ethernet adapter or on macvtap0; neither does the new server.
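
Here's roughly how I compared that (run on each box; on the old server substitute eno1 for enp3s0f0):

ip -d link show enp3s0f0 | grep -o 'promiscuity [0-9]*'
ip -d link show macvtap0 | grep -o 'promiscuity [0-9]*'

Both report promiscuity 0 on both servers.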

Most of the other configuration couldn't have changed, because the files that make up the OS configuration were copied bit for bit from the old server to the new one. The libvirt/KVM configuration, for example, was not modified or recreated between the servers; it is literally the same configuration, carried over as part of the filesystem-level data transfer with zfs send / zfs recv.

So, the way I see it, the variables at play are:

  • The "main" public IPv4 address of the physical box itself changed from the old to the new server.
  • The model of the Ethernet adapter changed, from an Intel Gigabit NIC to an Intel 10 Gigabit (ixgbe) NIC.
  • The name of the Ethernet adapter changed, from eno1 to enp3s0f0.
  • The default gateway for the physical box changed, but the default gateway for the /27s is still routable and usable (it works in LXD with macvlan).

I've triple-checked that the MAC addresses of the KVM virtio-net adapters are correct, because OVH performs MAC address filtering (mainly to prevent accidental misconfiguration, not as a security measure) and has assigned a specific MAC address to each IP I'm using.
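
One quick way to check this is with virsh domiflist (the domain names here are placeholders for my actual guests):

virsh domiflist windows2016
virsh domiflist ubuntu1804

The MAC column of each matches the MAC that OVH assigned to the corresponding IP.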

So I'm at a loss. Why can't my KVM guests (I have one Linux and one Windows) access the network on the new box, when basically everything is the same, and the only things that legitimately needed to change have already been changed?

Oh, and I also changed the network adapter in each libvirt guest's config (and rebooted the guests). Here's what that looks like now:

<interface type='direct'>
  <mac address='re:da:ct:ed'/>
  <source dev='enp3s0f0' mode='bridge'/>
  <target dev='macvtap0'/>
  <model type='virtio'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

Notice that enp3s0f0 is in there too (it used to be eno1).
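
A quick way to confirm that the running definition picked up the change (domain name is a placeholder):

virsh dumpxml windows2016 | grep -A2 "interface type='direct'"

This shows the <mac> and <source dev='enp3s0f0' .../> lines from the snippet above.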


Here is some additional troubleshooting stuff:

ip -d link show macvtap0 on the new box:

18: macvtap0@enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
    link/ether re:da:ct:ed brd ff:ff:ff:ff:ff:ff promiscuity 0
    macvtap mode bridge addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

On the old box, the output is 100% identical except that enp3s0f0 is replaced by eno1.

allquixotic
  • To avoid this sort of problem I have all my IPv4 addresses routed to a vRack. Now if only OVH would do proper IPv6 to a vRack... – Michael Hampton Dec 18 '18 at 00:40
  • I don't need IPv6, but I don't think my server can get a vRack, or at least I was unable to find a way to request one. It's an MG-512. – allquixotic Dec 18 '18 at 06:11

1 Answer


I solved this issue by following this troubleshooting process:

  • I tried the MAC address and the macvtap adapter from the Windows VM that wasn't working on one of my newer Ubuntu 18.04 virtual machines. It worked. What the heck? So now some KVM guests work with macvtap, but others don't?
  • I did a line-by-line comparison between the (working) Ubuntu VM's XML and the Windows Server 2016 VM's XML in libvirt (virsh edit xxx) to figure out what could possibly be different (a sketch of how to do this follows the snippets below).
  • The main difference I spotted was this:

Ubuntu 18.04 (works with macvtap):

<type arch='x86_64' machine='pc-i440fx-bionic'>hvm</type>

Windows Server 2016 (doesn't work with macvtap):

<type arch='x86_64' machine='pc-i440fx-yakkety'>hvm</type>
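
For anyone repeating this, the comparison can be done with something like the following (domain names are placeholders):

diff <(virsh dumpxml ubuntu1804) <(virsh dumpxml windows2016)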

From my knowledge of Ubuntu release names, I recalled that Yakkety (16.10) is older than Bionic (18.04). So, out of ideas, I arbitrarily decided to "upgrade" the machine type of the Windows VM to bionic, and switched the Ubuntu VM back to NAT (it doesn't need a public IP).

It worked!

So, the lesson learned here is awfully clear. Upgrade your KVM machine types frequently, especially with a new release of the OS or even a hardware upgrade. In this case, the yakkety machine type worked with Ivy Bridge hardware and a 1 Gbps Ethernet adapter, but the bionic machine type was required to work with a Haswell physical CPU and an ixgbe (10 Gbps) physical NIC.
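
If you want to see what's available before changing anything, the installed QEMU will list the machine types it supports, and the type line can then be changed with virsh edit (the domain name below is a placeholder):

qemu-system-x86_64 -machine help | grep -i i440fx
virsh edit windows2016    # change machine='pc-i440fx-yakkety' to machine='pc-i440fx-bionic'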

BTW, I had previously tried emulating an e1000 adapter in the guest, and that didn't fix the problem, so it wasn't virtio per se. It was somewhere deep in the bowels of the translation layer between the physical hardware and the guest.

allquixotic