Why is conntrackd not replicating state?

Question

I have a problem with an active/active firewall cluster where the connection tracking state in the firewall does not seem to be replicated.

It's active/active because I have two routers connected via different ISPs and a network range that is provided through BGP. How the return traffic is routed is determined by BGP, so the routing is asymmetric. The two firewalls are networked together on the inside network, and I have a virtual IP acting as the default route for Windows servers.

When both firewalls are running and an inside server tries to connect, the reply comes back via the secondary firewall (the one which has no record of the connection state). The reply is therefore dropped and not routed to the server that initiated the request.

I thought conntrackd would fix this but I can't seem to get it to work. Perhaps I misunderstand how it works. Can I get conntrackd to replicate iptables state at all? Does it actually work in active/active mode? Is state replicated in real time?

Here is what my conntrackd.conf file contains:

Sync {
  Mode ALARM {
    RefreshTime 15
    CacheTimeout 180
  }

  Multicast {
    IPv4_Address 225.0.0.50
    Group 3780
    IPv4_Interface 10.0.0.100
    Interface eth2
    SndSocketBuffer 1249280
    RcvSocketBuffer 1249280
    Checksum on
  }
}

General {
  Nice -20
  HashSize 32768
  HashLimit 131072
  LogFile on
  Syslog on
  LockFile /var/lock/conntrack.lock
  UNIX {
    Path /var/run/conntrackd.ctl
    Backlog 20
  }
  NetlinkBufferSize 2097152
  NetlinkBufferSizeMaxGrowth 8388608
  Filter From Userspace {
    Protocol Accept {
      TCP
    }

    Address Ignore {
      IPv4_address 127.0.0.1 # loopback
      IPv4_address 10.0.0.100 # dedicated link0
      IPv4_address 10.0.0.101 # dedicated link1
      IPv4_address x.x.x.130 # Internal ip
    }
  }
}

The conntrackd.conf on the other router is the same apart from IPv4_Interface in the Multicast section, which is 10.0.0.101, and the internal IP in the Filter section, which ends in 131.

I have set firewall rules to accept input to 225.0.0.50/32 & output to 225.0.0.50/32.
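For reference, rules of that kind could be narrowed to the dedicated sync link, roughly as sketched below. This is only an illustration: the interface (eth2), the multicast address and UDP port 3780 are taken from the config above, while the function name `allow_sync_traffic` and the overridable `IPTABLES`, `SYNC_IF`, `SYNC_GROUP` and `SYNC_PORT` variables are made up for the example.

```shell
# Hypothetical helper; all variable names here are illustrative.
IPTABLES="${IPTABLES:-iptables}"
SYNC_IF="${SYNC_IF:-eth2}"               # dedicated sync interface
SYNC_GROUP="${SYNC_GROUP:-225.0.0.50}"   # multicast address from conntrackd.conf
SYNC_PORT="${SYNC_PORT:-3780}"           # "Group" value from conntrackd.conf

# Accept conntrackd sync datagrams in both directions on the sync link only.
allow_sync_traffic() {
    "$IPTABLES" -I INPUT -i "$SYNC_IF" -p udp -d "$SYNC_GROUP" --dport "$SYNC_PORT" -j ACCEPT &&
    "$IPTABLES" -I OUTPUT -o "$SYNC_IF" -p udp -d "$SYNC_GROUP" --dport "$SYNC_PORT" -j ACCEPT
}
```

Calling `allow_sync_traffic` as root on each router would insert the two rules at the top of the INPUT and OUTPUT chains.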

I've set mode to ALARM here but first tried FTFW. Neither seems to work.

My kernel version is: 3.11.0.

Sorry, my cut and paste isn't working from the VirtualBox window. However, when I run sudo conntrackd -i, it lists an ESTABLISHED TCP connection, which is one that I created with ssh going in.

However, on the other router the same command produces no output, which I think means that the state didn't get transferred across to the other router.

Any ideas?

Update: I ran tcpdump -i eth2 on each machine and I can see UDP packets arriving locally from the other router that were destined for the multicast address 225.0.0.50 port 3780 with a length of 68 bytes.

If I initiate an ssh connection I see immediate activity on tcpdump, and disconnecting does the same. Otherwise regular heartbeats of that message come through. So it's clear that the routers are sending the packets, but is conntrackd ignoring them? Is there some hidden debug I can turn on?

Update2: OK, after days of googling and looking at source code I have discovered that conntrackd is replicating the state, but it ends up in an external cache. To commit the cached states to the kernel table you need to run conntrackd -c. Clearly conntrackd is designed to be used in an active/backup mode.
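The internal/external cache distinction can be checked from the shell. The following is a minimal sketch, not an official tool: it relies on conntrackd's `-i` (dump internal cache) and `-e` (dump external cache) options, and the helper names `cache_count` and `show_sync_state` as well as the `CONNTRACKD_BIN` variable are made up for illustration.

```shell
# Hypothetical helpers; CONNTRACKD_BIN is an illustrative override hook.
CONNTRACKD_BIN="${CONNTRACKD_BIN:-conntrackd}"

# Count entries in one cache. "$1" is -i (internal cache: flows seen by
# this router) or -e (external cache: flows learned from the peer).
# Assumes the dump prints one entry per non-empty line.
cache_count() {
    "$CONNTRACKD_BIN" "$1" 2>/dev/null | grep -c .
}

# Print both counts side by side.
show_sync_state() {
    echo "internal=$(cache_count -i) external=$(cache_count -e)"
}
```

On the router that did not see the ssh connection, a non-zero external count from `show_sync_state` would suggest that replication is working and that the entries are simply waiting for `conntrackd -c` to inject them into the kernel table.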

It seems a new option called CacheWriteThrough was introduced at some point but was then removed. Can conntrackd do active/active or not? I can't seem to find an answer to that.

OpenWRT has published [documentation](https://wiki.openwrt.org/doc/recipes/high-availability) claiming that conntrackd can be used in an active/active setup. My question, though, is about the active/standby firewall scenario. If failover happens and the master comes back up, does keepalived take over first, or is the conntrackd cache synchronised first and then keepalived takes over? I have searched everywhere for that answer. – Jimmy_A, Jun 12 '17 at 14:40
I want to have one of the two firewalls on standby, and I am worried that if the master comes alive again, keepalived will take over before the session tables are synchronised and all connections will be dropped by the master. Is this a valid scenario, or am I worried for nothing? – Jimmy_A, Jun 12 '17 at 14:43
@D.A - Broken state is a valid scenario. However, in the link given you'll notice that the initial state is `backup` and that the `nopreempt` and `priority 101` options are set. Once a node is the master, the other node coming alive will not force it into backup, since `nopreempt` is set and their priorities are identical. – hookenz, Jun 12 '17 at 20:28
Yeah, I know that. I only linked the page that mentions active/active because of the Update2 part of your question. The same applies to firewalls, as [netfilter mentions](http://conntrack-tools.netfilter.org/manual.html#sync). In my configuration the priorities are different and `nopreempt` is set for a master/backup environment. My main concern is whether the session tables will be synchronised before the master takes over; otherwise every session will drop. Well, maybe testing it will be the best answer. – Jimmy_A, Jun 13 '17 at 08:51
@D.A - definitely test it in an isolated setup if you can. It was a mission to get working well. In the end I went to an active/backup configuration and did some BGP hacks to force traffic to always return over the primary unless it is down. That worked much more reliably. – hookenz, Jun 13 '17 at 20:27

hookenz · Accepted Answer · 2014-09-12T01:17:45.170

5

OK, after days of frustration, little documentation, and even reading the source code, I've figured it out.

Mode FTFW {
     [...]
     DisableExternalCache On
}

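Disabling the external cache is what you need for an asymmetric routing scenario: states received from the peer are then injected directly into the kernel conntrack table instead of being held in the external cache. Otherwise, for active/backup, you want to leave it at the default (Off) and set the notify_master, notify_backup and notify_fault settings in keepalived.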
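Combined with the Multicast section from the question, the Sync section for the asymmetric case might look like the sketch below. Only DisableExternalCache comes from this answer; the other FTFW options elided above are simply left out here (so they fall back to their defaults), which is an assumption rather than a recommendation.

```
Sync {
  Mode FTFW {
    # Inject states received from the peer straight into the kernel
    # table instead of holding them in the external cache.
    DisableExternalCache On
  }

  Multicast {
    IPv4_Address 225.0.0.50
    Group 3780
    # Use 10.0.0.101 on the second router, as in the question.
    IPv4_Interface 10.0.0.100
    Interface eth2
    SndSocketBuffer 1249280
    RcvSocketBuffer 1249280
    Checksum on
  }
}
```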

The CacheWriteThrough setting was removed and replaced with DisableExternalCache.

The scripts called by those keepalived notify settings commit the external connection state cache to the kernel on the router that holds the IP. With DisableExternalCache On they shouldn't be needed, because the state is already committed.
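For the active/backup case, the keepalived side might look roughly like the sketch below. The instance name, interface, virtual_router_id, virtual IP and script path are placeholders, not values from this setup. The script is assumed to be the primary-backup.sh example that ships with conntrack-tools, called with primary, backup or fault.

```
# Placeholder instance name.
vrrp_instance VI_1 {
    # Both nodes start as backup; with nopreempt a returning node
    # does not take the virtual IP back from the current master.
    state BACKUP
    nopreempt
    # Placeholder: the inside interface carrying the virtual IP.
    interface eth1
    # Placeholder router ID.
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        # Placeholder for the inside virtual IP.
        192.0.2.1
    }
    # Placeholder path to the script.
    notify_master "/etc/conntrackd/primary-backup.sh primary"
    notify_backup "/etc/conntrackd/primary-backup.sh backup"
    notify_fault  "/etc/conntrackd/primary-backup.sh fault"
}
```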

edited Sep 12 '14 at 01:17

answered Sep 12 '14 at 00:12

hookenz


I have exactly the same scenario and I also want to do asymmetric multipath routing. I disabled the external cache, but now nothing syncs on the second router. I see the UDP packets arriving with the new connection information, but they are not committed to the kernel at all. Can you provide some information in this regard? Thanks. – Soumen Feb 08 '16 at 11:10
To be honest, I can't recall exactly. I've handed this thing over to another colleague as I work in another country. I did still have issues with routing in general and in the end switched to active/backup, using BGP path prepending and a fail-over default route with keepalived to ensure that data flowed in one direction most of the time, so that when fail-over occurred the state was already replicated. It didn't always work perfectly, but I found it acceptable because most protocols recover or reconnect. – hookenz Feb 21 '17 at 19:18

score 0 · Answer 2 · answered Apr 30 '18 at 09:04

I found an active/backup configuration (without nopreempt) failed on a firewall/router pair if the active server was rebooted. As the master went down, the backup took over and the primary-backup.sh script committed the external cache to the kernel table, as expected. All connections stayed active. However, as the (original) master restarted and took over again, since its external cache was empty, the primary-backup.sh script committed an empty external cache to the kernel table and all connections were dropped by iptables. I fixed this by adding a few lines near the beginning of the script:

case "$1" in
  primary)
    #
    # request resynchronization with master firewall replica
    #
    # Note: this is an attempt to fix problem after reboot of original master,
    # which had no entries in external cache and so resulted in empty
    # conntrack table
    #
    $CONNTRACKD_BIN -C $CONNTRACKD_CONFIG -n
    if [[ $? -eq 1 ]]
    then
        logger "ERROR: failed to invoke conntrackd -n"
    fi

    #
    # commit the external cache into the kernel table
    #
    # etc
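The fix above requests a resync and then commits straight away. One possible refinement, sketched here as an assumption rather than something taken from the shipped script, is to wait briefly for the external cache to fill before committing. The function name `resync_then_commit`, the retry count and the default paths are illustrative; `-n`, `-e` and `-c` are the conntrackd options for requesting a resync, dumping the external cache and committing it.

```shell
# Illustrative defaults; adjust to the local installation.
CONNTRACKD_BIN="${CONNTRACKD_BIN:-/usr/sbin/conntrackd}"
CONNTRACKD_CONFIG="${CONNTRACKD_CONFIG:-/etc/conntrackd/conntrackd.conf}"

# Request a resync, wait up to "$1" seconds (default 5) for the external
# cache to contain at least one entry, then commit it to the kernel table.
resync_then_commit() {
    tries="${1:-5}"
    if ! "$CONNTRACKD_BIN" -C "$CONNTRACKD_CONFIG" -n; then
        logger "ERROR: failed to invoke conntrackd -n"
        return 1
    fi
    while [ "$tries" -gt 0 ]; do
        # Stop waiting as soon as the external cache dump is non-empty.
        if "$CONNTRACKD_BIN" -C "$CONNTRACKD_CONFIG" -e 2>/dev/null | grep -q .; then
            break
        fi
        tries=$((tries - 1))
        sleep 1
    done
    # Commit whatever is in the external cache, even if the wait timed out.
    "$CONNTRACKD_BIN" -C "$CONNTRACKD_CONFIG" -c
}
```

If the peer genuinely has no tracked connections, the loop simply times out and the commit proceeds as before, so the worst case is a delay of a few seconds on takeover.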
