I am having issues load balancing UDP syslog traffic to my Graylog cluster nodes. At first everything seemed to work normally, but it turns out that about 99% of the traffic flows to one of the two nodes.

I have two Ubuntu 18.04 servers running Keepalived 1.3.9. They share a virtual IP via VRRP and use NAT to forward traffic to the real servers, scheduled round robin.

global_defs {
  notification_email {
    redacted@mail
  }
  notification_email_from severname-redacted
  smtp_server mailsever.redacted
  smtp_connect_timeout 30
  router_id servername
}

vrrp_instance VI_1 {
  state MASTER
  interface ens160
  virtual_router_id 216
  priority 200
  advert_int 1
  preempt_delay 30
  virtual_ipaddress {
    10.18.242.216
  }
  notify /usr/local/bin/vrrp_state.sh
}

virtual_server 10.18.242.216 10514 {
  delay_loop 2
  protocol UDP
  lb_algo rr   # round robin
  lb_kind NAT   # NAT

  real_server 10.18.242.214 10514 {
    weight 1
    HTTP_GET {
      url {
        path "/api/system/lbstatus"
        status_code 200
      }
      connect_timeout 3
      connect_port 9000
    }
  }

  real_server 10.18.242.213 10514 {
    weight 1
    HTTP_GET {
      url {
        path "/api/system/lbstatus"
        status_code 200
      }
      connect_timeout 3
      connect_port 9000
    }
  }
}

The secondary load balancer uses the same configuration, except for the priority, which is 100.

Failover between the load balancers is working as expected, but they both seem to forward the traffic only to the first Graylog node:

root@redacted-lb1:~# ipvsadm -L -n --rate
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port                 CPS    InPPS   OutPPS    InBPS   OutBPS
  -> RemoteAddress:Port
UDP  10.18.242.216:10514                 0       57        0    16581        0
  -> 10.18.242.213:10514                 0       67        0    19666        0
  -> 10.18.242.214:10514                 0        0        0        0        0

As you can see, no traffic reaches the second Graylog node (10.18.242.214), even though the weights are equal and the scheduler is round robin. Some troubleshooting that did not help:

  • Removing the first node from the load balancers: traffic still arrives on the LB, but it is not forwarded to the remaining Graylog node
  • Changing the weights, which doesn't seem to have any effect
  • Rebooting all servers
  • Repeating all of the same tests on the secondary LB after shutting down LB1

The Graylog nodes are both working fine and are almost identical in configuration. Syslog sent to either of them directly is received, so they do not seem to be the problem.
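
One possible explanation worth checking (an assumption, not something the output above confirms): IPVS applies the scheduler when a connection entry is created, not per packet. For UDP, an entry is keyed on source address and port and kept until it times out, so a sender that always uses the same source port keeps matching the same entry and the same real server. Listing the connection table with `ipvsadm -L -n -c` should show whether that is happening. If it is, one-packet scheduling may help. A minimal sketch of the virtual server with the `ops` keyword added, assuming the installed Keepalived build and kernel support it:

```
virtual_server 10.18.242.216 10514 {
  delay_loop 2
  protocol UDP
  ops             # assumption: one-packet scheduling, valid for UDP only
  lb_algo rr      # round robin
  lb_kind NAT     # NAT

  # real_server blocks unchanged from the configuration above
}
```

With one-packet scheduling each datagram is scheduled on its own, so no long-lived entry pins a sender to one node. Syslog over UDP is one-way, so the missing entry for return traffic should not matter here, though this is untested in this setup.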

Robert
  • Do you spot any differences if you change the load balancing method from round-robin to something else, e.g. source hash? – Tommiie Oct 05 '18 at 12:58
  • I've tried two other methods, WRR and WLC, which did not make a difference – Robert Oct 05 '18 at 13:43
