I'm having issues with racoon (IPsec VPN) on pfSense 2.0.2 (and 2.0.1). According to racoon, all of my tunnels are up (I have about 130 of them), but over time more and more of them stop passing traffic. If I restart racoon, the tunnels start working again for a while.
There is almost no CPU utilization, and only about 20% of RAM is in use (before or after a racoon restart).
I'm using DPD on all locations, and according to pfSense the tunnels are up.
Just now Nagios was showing 54 locations down; I restarted racoon and everything came back up.
-- Edit -- I should also note that we currently have pfSense 1.2.3 running these tunnels with absolutely no issues, but I do see the same issue between the two pfSense boxes (1.2.3 <-> 2.0.2). I'll likely move to OpenVPN for this.
-- Edit --
I also noticed today that it only ever drops 50-60 of the tunnels over a few hours, and no more.
-- Edit -- From the logs I did find this when pinging a dead location: ERROR: can't start the quick mode, there is no ISAKMP-SA
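To see how often this error shows up, something like the following could filter it out of the racoon log. This is only a minimal sketch: the function name is made up for illustration, and it assumes nothing about the log layout beyond the error text quoted above appearing somewhere on the line.

```python
# Error text exactly as quoted from the racoon log above.
NO_ISAKMP_SA = "can't start the quick mode, there is no ISAKMP-SA"


def quick_mode_failures(lines):
    """Return the log lines that contain the 'no ISAKMP-SA' error.

    `lines` is any iterable of strings (an open log file, sys.stdin, a list).
    Hypothetical helper; it does not parse timestamps or peer addresses,
    because the exact line layout is not known here.
    """
    return [line.rstrip("\n") for line in lines if NO_ISAKMP_SA in line]


if __name__ == "__main__":
    import sys

    # Read log lines from standard input and print a simple count.
    matches = quick_mode_failures(sys.stdin)
    print(len(matches), "quick mode failures found")
```

Comparing the count before and after a racoon restart would show whether the error tracks the number of dead tunnels.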
-- Edit -- What I'm finding is that if I log in to a device on the remote network and ping the pfSense network, a new Phase 2 is created and the tunnel works again. Pinging in the other direction should open the tunnel as well, but it simply doesn't.
-- Edit --
In my case, the modems we connect to have a "keep tunnel alive" setting (not DPD), which seems to work around this issue. It seems that pfSense will not negotiate a Phase 2 when one is requested, which is extremely curious. I have Nagios checks running every couple of minutes that try to go across the tunnel, which should cause pfSense to negotiate a new P2 (or P1+P2 if required) once the lifetime has expired, but it just doesn't. According to pfSense's IPsec status page the tunnel is still alive (probably because the P1 is still valid) when it quite obviously is not.
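The kind of check described above (a tunnel reported as up while traffic across it fails) can be sketched roughly as follows. This is an illustration and not the actual Nagios check: the tunnel names, the mapping of tunnels to probe hosts, and both function names are hypothetical, and it assumes a `ping` binary that accepts `-c`.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo to `host` gets a reply.

    Assumes a `ping` binary that accepts -c (count). Timeout flags differ
    between platforms, so the timeout is enforced by subprocess instead.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_stale_tunnels(reported_up, probe=ping_once):
    """Return the tunnels that are reported up but do not pass traffic.

    `reported_up` maps a tunnel name to a host behind that tunnel
    (hypothetical data that would come from your own inventory).
    `probe` is any callable taking a host and returning True or False.
    """
    return sorted(name for name, host in reported_up.items() if not probe(host))
```

Calling `find_stale_tunnels({"site-a": "10.0.1.1"})` with placeholder names and addresses returns the names whose probe host did not answer, which is the set that Nagios reports as down here while the status page still shows them as alive.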