4

I'm having issues with racoon (IPsec VPN) on pfSense 2.0.2 (and 2.0.1). According to racoon, all my tunnels are up (I have about 130 of them), but over time more and more of them stop passing traffic. If I restart racoon, the tunnels start working again for a period of time.

There is almost no CPU utilization, and only about 20% of RAM is in use (before or after a racoon restart).

I'm doing DPD on all locations, and according to pfSense the tunnels are up.

Just now Nagios was showing 54 locations down; I restarted racoon, and everything came back up.

-- Edit -- I should also note that we currently have pfSense 1.2.3 running these tunnels with absolutely no issues, but I do see the same issue between the two pfSense boxes (1.2.3 <-> 2.0.2). I'm likely moving to OpenVPN for this.

-- Edit --

Also noticed today that it only ever drops up to 50-60 of the tunnels over a few hours, and no more.

-- Edit -- From the logs I did find this when pinging a dead location: ERROR: can't start the quick mode, there is no ISAKMP-SA

-- Edit -- What I'm finding is that if I log in to a device on the remote network and ping the pfSense network, a new Phase 2 is created and the tunnel works again. pfSense should be opening the tunnel when I ping in the other direction, but it simply isn't.

-- Edit --

In my case, the modems we are connecting to have a setting for "keep tunnel alive" (not DPD), which seems to work around the issue pfSense is having. It seems that pfSense will not negotiate a Phase 2 when one is requested, which is extremely curious. I've got Nagios checks running every couple of minutes that try to go across the tunnel, which should cause pfSense to negotiate a new Phase 2 (or Phase 1 + Phase 2 if required) once the lifetime has expired, but it just doesn't. According to pfSense's IPsec status page the tunnel is still alive (probably because the Phase 1 is still valid) when it quite obviously is not.
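The mismatch described above (status page says "up", traffic does not pass) can be expressed as a small check. This is only a sketch: `find_stale_tunnels`, the tunnel names, and the `probe` callable are hypothetical, and how the list of "reported up" tunnels is obtained from pfSense is left open.

```python
def find_stale_tunnels(reported_up, probe):
    """Return names of tunnels that are reported up but fail the probe.

    reported_up: dict mapping a tunnel name to a host behind that tunnel
                 (hypothetical data; fill it from your own inventory).
    probe:       callable taking a host and returning True if traffic
                 actually gets through (for example, a single ping).
    """
    return sorted(name for name, host in reported_up.items() if not probe(host))
```

Passing the probe in as a parameter keeps the comparison separate from the way reachability is tested, so the same function works with a ping, a TCP connect, or an existing Nagios result.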

cpuguy83

4 Answers

3

For anyone looking for a resolution, try this reference:

"To resolve this issue disable NAT-T (when pfsense holds the public IP). If that still does not help disable DPD and set 'Negotiation Mode' in Phase 1 to main"

slm
Matt
1

It is probably the security associations going out of sync due to little hiccups in the internet connection. Next time it happens, have a look at "status > ipsec > sad". Start pinging the other host; if the pings time out and there are more than 2 entries (one for each side), try deleting the dead "data" SAD entries and see if your pings start working again. This was very common for me with IPsec.
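To spot peers with more than one SA per direction without scrolling through the SAD page, something like the following could count the entries. It is a rough sketch that assumes the text layout of `setkey -D` output, where each SA starts with an unindented "source destination" line followed by indented detail lines; the function names are made up for illustration. Keep in mind that an old and a new SA can legitimately coexist for a short time around a rekey.

```python
from collections import Counter

def count_sas(setkey_dump):
    """Count SAD entries per (source, destination) pair.

    Assumes each SA begins with an unindented line holding exactly two
    fields (source and destination), optionally with a [port] suffix.
    """
    counts = Counter()
    for line in setkey_dump.splitlines():
        if not line or line[0].isspace():
            continue  # blank line or indented detail line
        fields = line.split()
        if len(fields) != 2:
            continue  # not a "src dst" header, e.g. "No SAD entries."
        src, dst = (field.split("[", 1)[0] for field in fields)
        counts[(src, dst)] += 1
    return counts

def duplicated_directions(counts):
    """Return the (source, destination) pairs that have more than one SA."""
    return sorted(pair for pair, n in counts.items() if n > 1)
```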

Also have a look at the pfSense IPsec logs. IPsec logs everything, and if you wish you can post the logs here and we can try to help. You can also turn on racoon debugging.

As a long-term solution, I use OpenVPN nowadays, which is built into pfSense. I would recommend setting it up alongside your IPsec tunnels and moving over to it.

Sc0rian
  • Unfortunately, we are connecting to edge devices (RavenXE cell modems) that only support IPsec or plain GRE tunnels (which I haven't gotten to work). The logs don't show any issues with these locations. Normally, if a peer is dead, the next time we try to connect it shows up in the log, creating new SAs and all. In this situation nothing is happening. – cpuguy83 Jan 14 '13 at 16:51
0

Maybe it is preferring old SAs where it shouldn't be? Under System > Advanced > Misc, uncheck "Prefer older IPsec SAs" if it's checked.

Chris Buechler
0

The same problem in our network...

I have pfSense V2.0.1 connecting the head office (A) through IPsec VPN with 7 branches (B1, B2, B3, B4, B5, B6, B7). Every branch has a Cisco WRVS4400N router with the latest firmware (V2.0.2.1).

I have static WAN IPs everywhere and all Cisco routers have identical configuration... the only differences are the WAN/LAN/WiFi IPs and the WiFi password.

I am using two ISPs:

  • BELL --> B1, B2, B3, B4, B5 and B6

  • VIDEOTRON --> A and B7

All Internet connections are through a modem configured in bridge mode and the model for BELL is the same in all six branches.

Here is how the IPsec works:

The VPN between A and B1-B5 is stable from both ends. No problems at all.

The VPN between A and B6 is stable only from the pfSense side. The tunnel shows as up all the time from both sides, and if I ping from the A network to a PC on the B6 network (LAN IP), I have access. Unfortunately, the connection from B6 to A stops working after less than a minute when there is no activity on either side (both sides still show the tunnel as up), and it stays down until I ping from A to B6. At that point I have access from both sides again. We swapped two of the Cisco routers (B5 with B6) and found that the problem stays with the branch. We asked BELL to investigate the problem, but we were told that everything is OK with their equipment. BELL agreed to replace the modem, but unfortunately that did not change anything. The only solution for us at the moment is a constant ping from the A network to the B6 network.

The VPN between A and B7 (the same ISP, VIDEOTRON) is stable only from the branch (B7) side. The tunnel shows as up all the time on the B7 router, and B7 has no problems connecting to the A network. On the pfSense side I see the tunnel going down every hour (because of the Phase 2 lifetime), and then it cannot be restored from that side. At that point the tunnel can be restored only from the B7 side (by pinging a PC on the A network). For now we have decided to run a constant ping from the B7 network to the A network.
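The "constant ping" workaround can be run as a small script instead of an open terminal. The sketch below is one possible way to do it, not what we actually run: the function names, the 30-second interval, and the 5-second timeout are illustrative assumptions. It calls the system `ping` with only `-c 1` and leaves the timeout to Python, because the other `ping` flags differ between operating systems.

```python
import subprocess
import time

def ping_once(host, timeout_s=5):
    """Send one ICMP echo request; True if the ping command exits with 0."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # no reply in time, or ping could not be started
    return result.returncode == 0

def keepalive_round(hosts, probe=ping_once):
    """Probe every host once and return a host -> reachable mapping."""
    return {host: probe(host) for host in hosts}

def run_keepalive(hosts, interval_s=30, rounds=None,
                  probe=ping_once, sleep=time.sleep):
    """Repeat keepalive rounds; rounds=None means run until interrupted."""
    done = 0
    last = {}
    while rounds is None or done < rounds:
        last = keepalive_round(hosts, probe)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)  # wait before the next round
    return last
```

For example, `run_keepalive(["192.168.10.5"])` (a made-up LAN address on the far side of the tunnel) would send one ping every 30 seconds until interrupted. The returned mapping from the last round can also be logged to see when a tunnel stopped answering.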

NOTE: A few days ago I upgraded pfSense to V2.0.2, but that did not change anything.

I believe that the problem is with equipment at the Internet providers, but based on my experience with their first- and second-level support, it is not possible to prove it.

Regards and good luck.