
We have redundant 10G fiber links to our upstream internet provider in an active/failover configuration. Between our router and the upstream router, we have a pair of firewalls running VyOS in transparent mode. We use BGP for route advertisement and we can't change most of the parameters.

Drawing: drawing of network layout

If the link to one of the routers goes down (for example, the one marked with the x in the drawing, which happens to be our most common type of failure), the entire network is inaccessible until the BGP session times out (up to 150 seconds). I already know that if we force the link on the other side of the bridge down, our routers immediately start forwarding traffic through the other link.

Is there some way of automatically bringing down one side of the bridge on the firewall if the other side goes down?

Are there any hidden pitfalls to that solution?

yakatz

2 Answers


I would run a script on the VyOS machines that checks the state of the upstream connection and then does ifdown/ifup on the other side as needed.

The simplest way to do this is to put scripts in /etc/network/if-down.d (check whether the interface that went down is the upstream side, and bring down the other side if it is) and /etc/network/if-up.d (check whether the interface that came up is the upstream side, and bring up the other side if it is).
Alternatively, you could run a monitoring script once a minute with cron (or systemd timers, or any similar scheduler), or you could write it as an infinite loop that sleeps for a few seconds between checks to get sub-minute checking.
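
The looping variant could look roughly like the sketch below. It is only an illustration: the function names, the `link-mirror` log tag, the 2-second default interval, and the `SYSFS_NET` override (there so the state check can be exercised without real interfaces) are my own assumptions, and it reads link state from `/sys/class/net/<iface>/carrier` rather than using anything VyOS-specific.

```shell
#!/bin/bash
# Sketch: mirror the link state of one bridge member onto the other.
# Interface names are passed as arguments; nothing here is VyOS-specific.

# Assumption: overridable sysfs root, so link_state can be tested offline.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print "up" if the interface reports carrier, "down" otherwise.
link_state() {
    local iface=$1 carrier
    # Reading carrier fails while the interface is administratively down
    # (or does not exist), so any read error is treated as "down".
    if carrier=$(cat "${SYSFS_NET}/${iface}/carrier" 2>/dev/null) && [ "$carrier" = "1" ]; then
        echo up
    else
        echo down
    fi
}

# Poll the watched interface and apply each state change to the peer.
mirror_loop() {
    local watched=$1 peer=$2 interval=${3:-2} last="" now
    while true; do
        now=$(link_state "$watched")
        if [ "$now" != "$last" ]; then
            logger -t link-mirror "${watched} is ${now}; setting ${peer} ${now}"
            ip link set dev "$peer" "$now"
            last=$now
        fi
        sleep "$interval"
    done
}

# Start the loop only when both interface names are given.
if [ "$#" -ge 2 ]; then
    mirror_loop "$@"
fi
```

Run it with the upstream-facing interface first and the other bridge member second (for example `eth0 eth1`; substitute your own names). It deliberately watches only one side: once the peer has been set administratively down it has no carrier of its own, so mirroring in both directions would leave both sides down.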

Moshe Katz
  • VyOS uses `netplugd` to monitor the interfaces, but it is similar to `if-down.d`. For some reason my script causes `netplugd` to crash though - I will post my answer and a separate followup question. – yakatz Mar 18 '18 at 19:16

I wrote a script that checks interfaces against /sys to determine whether they are bridge members and, if so, bounces every member of that bridge. VyOS uses netplugd to monitor interfaces and for some reason my script confuses it (I will probably write a separate question for that), but I think it is a good general solution.

#!/bin/bash

## This script will bounce a br interface if a member interface goes down.
## This will cause router BGP timers to reset, making outages last only seconds instead of minutes.
##
## This script is called by netplug on VyOS:
## /etc/netplug/linkdown.d/my-brdown
##
## Version History
## 1.0 - Initial version
##

LOCKDIR=/var/run/my-bridge-ctl

# Since we only have one br, not going to implement this right now.
#IGNORE_BRIDGES=()

IFACE=$1

#Remove the lock directory
function cleanup {
    if rmdir "$LOCKDIR"; then
        logger -is -t "my-bridge-ctl" -p "kern.info" "Finished"
    else
        logger -is -t "my-bridge-ctl" -p "kern.err" "Failed to remove lock directory '$LOCKDIR'"
        exit 1
    fi
}

if mkdir "$LOCKDIR"; then
    #Ensure that if we "grabbed a lock", we release it
    #Works for SIGTERM and SIGINT(Ctrl-C)
    trap "cleanup" EXIT

    logger -is -t "my-bridge-ctl" -p "kern.info" "Acquired lock, running"

    # Processing starts here

    IFACE_DESC=$(<"/sys/class/net/${IFACE}/ifalias")
    IFACE_BR_DIR="/sys/class/net/${IFACE}/brport"

    if [ ! -d "$IFACE_BR_DIR" ]; then
        logger -is -t "my-bridge-ctl" -p "kern.warning" "Interface ${IFACE} (${IFACE_DESC:-no desc}) went down. Not a member of a bridge. Skipping."
    else
        IFACE_BR_LINK=$(realpath "/sys/class/net/${IFACE}/master")
        IFACE_BR_NAME=$(basename "$IFACE_BR_LINK")
        IFACE_BR_DESC=$(<"${IFACE_BR_LINK}/ifalias")
        logger -is -t "my-bridge-ctl" -p "kern.warning" "Interface ${IFACE} (${IFACE_DESC:-no desc}) went down. Member of bridge ${IFACE_BR_NAME} (${IFACE_BR_DESC:-no desc})."

        # TODO: Insert IGNORE_BRIDGE check here

        find "${IFACE_BR_LINK}/brif" -type l -print0 | while IFS= read -r -d $'\0' IFACE_BR_MEMBER_LINK; do
            IFACE_BR_MEMBER_NAME=$(basename "$IFACE_BR_MEMBER_LINK")
            logger -is -t "my-bridge-ctl" -p "kern.info" "Handling ${IFACE_BR_NAME} member interface ${IFACE_BR_MEMBER_NAME} (${IFACE_BR_MEMBER_LINK})."

            # Actually do the bounce: take the member down, wait, bring it back up
            ip link set dev "${IFACE_BR_MEMBER_NAME}" down && sleep 2 && ip link set dev "${IFACE_BR_MEMBER_NAME}" up

            logger -is -t "my-bridge-ctl" -p "kern.info" "Interface ${IFACE_BR_MEMBER_NAME} bounced."
        done
    fi

    sleep 5
else
    logger -is -t "my-bridge-ctl" -p "kern.info" "Could not create lock directory '$LOCKDIR'"
    exit 1
fi
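
The IGNORE_BRIDGES check left as a TODO in the script could be filled in along these lines. This is an untested sketch rather than part of the script I run: the helper name `is_ignored_bridge` and the example bridge name `br9` are placeholders.

```shell
# Hypothetical list of bridges that should never be bounced.
IGNORE_BRIDGES=("br9")

# Return 0 if the named bridge is listed in IGNORE_BRIDGES, 1 otherwise.
is_ignored_bridge() {
    local name=$1 ignored
    for ignored in "${IGNORE_BRIDGES[@]}"; do
        [ "$ignored" = "$name" ] && return 0
    done
    return 1
}
```

At the TODO marker, the script would then call `is_ignored_bridge "$IFACE_BR_NAME"` and skip the member loop when it returns success.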
yakatz