
I have heartbeat set up on two servers like so:

Master: 10.15.1.50

Backup: 10.15.1.51

(Virtual IP: 10.15.1.52)

So the master normally holds 10.15.1.52 as well, but if it goes down, the backup takes over 10.15.1.52. This works perfectly and fails over in under 10 seconds. We have a domain name linked to 10.15.1.52 so that failover is transparent. However, we have noticed that although the IP switches over in under 10 seconds, it can take 10-20 minutes before the server is actually accessible through the domain name.

We do have a router forwarding port 80 to 10.15.1.52, since that is a private IP. The delay doesn't make sense to me, because we're not changing anything in the domain name's DNS records. The backup server should be accessible through the domain name as soon as the IP fails over.

Could the problem be NAT on the router? It almost seems like some sort of host verification issue.

Edit: Now that I think about it, this could be a problem with the ARP table on the router.

Ethan Hayon

1 Answer


I'm almost certain that ARP isn't your problem.

In my opinion, the problem is with the DNAT connection tracking.

Take a look at /proc/net/ip_conntrack or /proc/net/nf_conntrack on your router after a takeover from one system to the other. You should see that the DNAT conntrack entry is still pointing to the failed system.

If that really is the case, you should look for a way to clear that specific conntrack table entry on your router.
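To make the check concrete, here is a minimal Python sketch that scans conntrack lines for DNAT-ed entries whose reply side still points at a given backend address. It assumes the usual text layout of /proc/net/nf_conntrack (an original tuple followed by a reply tuple, each with src=, dst=, sport=, dport=). The function names and the sample line, including the public addresses in it, are made up for illustration.

```python
import re

# Matches the address/port fields of a conntrack line, in order of appearance.
FIELD = re.compile(r"\b(src|dst|sport|dport)=(\S+)")


def parse_entry(line):
    """Split one conntrack line into (original, reply) tuples as dicts.

    Returns None for lines without two full tuples (e.g. ICMP entries,
    which carry type/code/id instead of ports).
    """
    fields = FIELD.findall(line)
    if len(fields) < 8:
        return None
    return dict(fields[:4]), dict(fields[4:8])


def find_dnat_entries(lines, backend_ip):
    """Return entries that were DNAT-ed to backend_ip.

    An entry counts as DNAT-ed when the reply source differs from the
    original destination, i.e. the router rewrote the destination.
    """
    hits = []
    for line in lines:
        parsed = parse_entry(line)
        if parsed is None:
            continue
        orig, reply = parsed
        if reply["src"] == backend_ip and orig["dst"] != reply["src"]:
            hits.append((orig, reply))
    return hits


# Hypothetical sample line; on the router you would instead pass
# open("/proc/net/nf_conntrack") as the `lines` argument.
SAMPLE = [
    "ipv4     2 tcp      6 431999 ESTABLISHED src=203.0.113.7 "
    "dst=198.51.100.1 sport=51234 dport=80 src=10.15.1.52 "
    "dst=203.0.113.7 sport=80 dport=51234 [ASSURED] mark=0 use=2",
]

for orig, reply in find_dnat_entries(SAMPLE, "10.15.1.52"):
    print(orig["src"], "->", orig["dst"], "forwarded to", reply["src"])
```

If stale entries do show up, the conntrack command-line tool from conntrack-tools (assuming it is installed on the router) can delete entries selectively rather than flushing the whole table.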

teissler