I've been through the fail-over setup of a small Proxmox cluster (i.e. Debian with add-on packages) several times now. As there was no good documentation for it, I'm posting this question together with my own answer :-)
The idea: a separate storage and service network should be established, with the ability to fail over if one of the switches dies or is down for maintenance. Within the service network we want to segregate traffic further with VLANs.
The solution to the problem is:
- use bonding in active-backup mode for each network (bond0, bond1)
- each bond has a primary network interface over which the traffic flows during normal operation (ifaceA, ifaceB)
- in the failover case, traffic moves to the other network; since the storage and service switches are interconnected, the ARP packets will still find the desired endpoint
|---------------[ storage switch ]
| x x x x
| | | | |
failover | | | |
link x x x x
| iface A iface A iface A iface A
|
| [ Node 1 ] [ Node 2 ] [ Node 3 ] [ Node X ]
|
| iface B iface B iface B iface B
| x x x x
| | | | |
| | | | |
| x x x x
|
|---------------[ services switch ]
- the fun part is: how do you run two bonds in parallel over the same pair of interfaces? Solutions:
- go with VLANs on top of ifaceA and ifaceB, and bond the VLANs together
- use traffic shaping (tc)
I tried to get both solutions running - I only succeeded with the first:
Create VLANs on both interfaces:
- ifaceA.100
- ifaceA.101
- ifaceB.100
- ifaceB.101
Create bonds on top of the VLANs:
- bond0
  - slave ifaceA.100
  - slave ifaceB.100
- bond1
  - slave ifaceA.101
  - slave ifaceB.101
Create VLANs on top of the bonds - you now have Q-in-Q:
- bond1.5000
- bond1.XXX
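To test this layering by hand before making it permanent, the whole stack can be built with plain iproute2 commands (a minimal sketch using the interface names and VLAN IDs from above; nothing here survives a reboot):
# VLANs on the physical interfaces
ip link add link ifaceA name ifaceA.100 type vlan id 100
ip link add link ifaceB name ifaceB.100 type vlan id 100
# active-backup bond on top of the two VLAN interfaces
ip link add bond0 type bond mode active-backup miimon 100 updelay 200 downdelay 200
ip link set ifaceA.100 down; ip link set ifaceA.100 master bond0
ip link set ifaceB.100 down; ip link set ifaceB.100 master bond0
echo ifaceA.100 > /sys/class/net/bond0/bonding/primary
ip link set bond0 up; ip link set ifaceA.100 up; ip link set ifaceB.100 up
The same pattern repeats for bond1 on the .101 VLANs; the Q-in-Q VLANs then go on top of the bond (ip link add link bond1 name bond1.5000 type vlan id 5000).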
My challenge was to understand where to put the bond-* arguments: they have to go on the first interface that is part of the bond (in my case: ifaceA.100), and that is where miimon, updelay and downdelay are declared. Now check with cat /proc/net/bonding/bond0:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: ifaceA.100 (primary_reselect always)
Currently Active Slave: ifaceA.100
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: ifaceA.100
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: XX:XX:XX:XX:XX:XX
Slave queue ID: 0

Slave Interface: ifaceB.100
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: YY:YY:YY:YY:YY:YY
Slave queue ID: 0
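A quick way to double-check on every node that the options really landed on the right bond is the bonding driver's sysfs interface (standard paths on any Linux with the bonding module loaded):
grep "Currently Active Slave" /proc/net/bonding/bond*
cat /sys/class/net/bond0/bonding/primary   # should print ifaceA.100
cat /sys/class/net/bond1/bonding/primary   # should print ifaceB.101
cat /sys/class/net/bond0/bonding/miimon    # 100
cat /sys/class/net/bond0/bonding/updelay   # 200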
Here is my /etc/network/interfaces file:
auto lo
iface lo inet loopback
auto vmbr0
iface vmbr0 inet static
# your usual proxmox mgmt interface
address A.B.C.D
netmask 255.255.255.0
gateway A.B.C.1
bridge_ports eth0
bridge_stp off
bridge_fd 0
# Proxmox Mgmt bridge
auto ifaceA
iface ifaceA inet manual
mtu 9100
#Storage net
auto ifaceB
iface ifaceB inet manual
mtu 9100
#Service net
auto ifaceA.100
iface ifaceA.100 inet manual
bond-master bond0
bond-primary ifaceA.100
bond-miimon 100
bond-updelay 200
bond-downdelay 200
bond-mode active-backup
mtu 9048
#Primary leg of storage bond0
auto ifaceA.101
iface ifaceA.101 inet manual
bond-master bond1
bond-miimon 100
bond-updelay 200
bond-downdelay 200
bond-mode active-backup
mtu 9048
#Secondary leg of services
auto ifaceB.100
iface ifaceB.100 inet manual
bond-miimon 100
bond-updelay 200
bond-downdelay 200
bond-master bond0
bond-mode active-backup
mtu 9048
#Secondary leg of storage bond0
auto ifaceB.101
iface ifaceB.101 inet manual
bond-master bond1
bond-primary ifaceB.101
bond-miimon 100
bond-updelay 200
bond-downdelay 200
bond-mode active-backup
mtu 9048
#Primary leg of services
auto bond0
iface bond0 inet static
address W.X.Y.Z
netmask 255.255.255.0
bond-mode active-backup
bond-primary ifaceA.100
mtu 9048
#Storage for Ceph (pveceph init --network W.X.Y.0/24)
auto bond1
iface bond1 inet static
address Q.P.O.R
netmask 255.255.255.0
bond-mode active-backup
bond-primary ifaceB.101
mtu 9048
#Services/Corosync bond (pvecm create MYCLUSTER --bindnet0_addr Q.P.O.R --ring0_addr static-hostname-for-this-node)
auto bond1.5000
iface bond1.5000 inet manual
mtu 9000
# bond1 services on VLAN 5000, has no IP bound to it
auto vmbr5000
iface vmbr5000 inet manual
bridge-ports bond1.5000
bridge-stp off
bridge-fd 0
mtu 9000
# bond1.5000 services, which can be consumed within a VM
# AND ... more of the same
auto bond1.XXX
iface bond1.XXX inet manual
mtu 9000
auto vmbrXXX
iface vmbrXXX inet manual
bridge-ports bond1.XXX
bridge-stp off
bridge-fd 0
mtu 9000
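To consume one of these service VLANs inside a guest, the bridge is simply attached as that VM's NIC (hypothetical VM ID 101):
qm set 101 --net1 virtio,bridge=vmbr5000
And a simple failover test, run on one node (a sketch; replace the placeholder with another node's bond0 address):
ping -I bond0 <bond0 address of another node> &
ip link set ifaceA down
cat /proc/net/bonding/bond0    # Currently Active Slave is now ifaceB.100, the ping keeps running
ip link set ifaceA up
cat /proc/net/bonding/bond0    # after the 200 ms updelay, primary_reselect=always moves traffic back to ifaceA.100
kill %1                        # stop the background ping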