
We have set up three servers running keepalived. We started noticing some random re-elections occurring which we can't explain, so I came here looking for advice.

Here is our configuration:

MASTER:

global_defs {
  notification_email {
    webops@example.com
  }
  notification_email_from keepalived@hostname
  smtp_server example.com:587
  smtp_connect_timeout 30
  router_id some_rate
}


vrrp_script chk_nginx {
  script "killall -0 nginx"
  interval 2
  weight 2
}

vrrp_instance VIP_61 {
  interface bond0
  virtual_router_id 61
  state MASTER
  priority 100
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass PASSWORD
  }
  virtual_ipaddress {
    X.X.X.X
    X.X.X.X
    X.X.X.X
  }
  track_script {
    chk_nginx
  }
}

BACKUP1:

global_defs {
  notification_email {
    webops@example.com
  }
  notification_email_from keepalived@hostname
  smtp_server example.com:587
  smtp_connect_timeout 30
  router_id some_rate
}


vrrp_script chk_nginx {
  script "killall -0 nginx"
  interval 2
  weight 2
}

vrrp_instance VIP_61 {
  interface bond0
  virtual_router_id 61
  state MASTER
  priority 99
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass PASSWORD
  }
  virtual_ipaddress {
    X.X.X.X
    X.X.X.X
    X.X.X.X
  }
  track_script {
    chk_nginx
  }
}

BACKUP2:

global_defs {
  notification_email {
    webops@example.com
  }
  notification_email_from keepalived@hostname
  smtp_server example.com:587
  smtp_connect_timeout 30
  router_id some_rate
}


vrrp_script chk_nginx {
  script "killall -0 nginx"
  interval 2
  weight 2
}

vrrp_instance VIP_61 {
  interface bond0
  virtual_router_id 61
  state MASTER
  priority 98
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass PASSWORD
  }
  virtual_ipaddress {
    X.X.X.X
    X.X.X.X
    X.X.X.X
  }
  track_script {
    chk_nginx
  }
}

Every now and then I can see this happening (grepped in logs):

MASTER:

Jan  6 18:30:15 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
Jan  6 18:30:16 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
Jan  6 18:32:37 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election

BACKUP1:

Jan  6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan  6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Received higher prio advert
Jan  6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Entering BACKUP STATE
Jan  6 18:32:37 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) forcing a new MASTER election
Jan  6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan  6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Received higher prio advert
Jan  6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Entering BACKUP STATE

BACKUP2:

Jan  6 18:32:36 lb-public03 Keepalived_vrrp[14255]: VRRP_Script(chk_nginx) succeeded
Jan  6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan  6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Received higher prio advert
Jan  6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Entering BACKUP STATE

So the MASTER receives a LOWER priority advert and a NEW election is started. Why? It looks like a BACKUP transitions into the MASTER state for a short period of time (based on the logs) and then falls back to the BACKUP state. I'm quite clueless as to why this is actually happening, so any hints would be more than welcome.

Also, I found out there is a unicast patch for keepalived; however, it's not clear to me whether it supports more than one unicast peer. In our case we have a cluster of 3 machines, so we need more than one unicast peer.
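For what it's worth, the unicast patch (later merged into mainline keepalived) does take multiple addresses: `unicast_peer` is a block, so each node can list both of the other two. A sketch of how it would look on the master, with placeholder addresses:

```
vrrp_instance VIP_61 {
  ...
  unicast_src_ip 10.0.0.1    # this node's own address (placeholder)
  unicast_peer {
    10.0.0.2                 # BACKUP1 (placeholder)
    10.0.0.3                 # BACKUP2 (placeholder)
  }
}
```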

Any hints on these issues would be superamazingly appreciated!

milosgajdos

1 Answer


The problem is that you use state MASTER on the backup nodes as well. They should be configured with state BACKUP.

  vrrp_instance VIP_61 {
      interface bond0
      virtual_router_id 61
      state BACKUP
      priority 98
      ...

Hope this solves your mystery.

pj3s