3

I am trying to set up Openswan (2.6.32) on CentOS 6.5 (Final) to connect to a remote VPC gateway on the Amazon cloud. I got the tunnel up. However, only the traffic to and from the last IP range defined in leftsubnets is routed. The first one works for a brief second (maybe before the second tunnel comes up), then stops routing. Below is my configuration.

conn aws-vpc
    leftsubnets={10.43.4.0/24 10.43.6.0/24}
    rightsubnet=10.43.7.0/24
    auto=start
    left=206.191.2.xxx
    right=72.21.209.xxx
    rightid=72.21.209.xxx
    leftid=206.191.2.xxx
    leftsourceip=10.43.6.128
    authby=secret
    ike=aes128-sha1;modp1024
    phase2=esp
    phase2alg=aes128-sha1;modp1024
    aggrmode=no
    ikelifetime=8h
    salifetime=1h
    dpddelay=10
    dpdtimeout=40
    dpdaction=restart
    type=tunnel
    forceencaps=yes

After starting the IPsec service:

# service ipsec status
IPsec running  - pluto pid: 8601
pluto pid 8601
2 tunnels up
some eroutes exist

# ip xfrm policy
src 10.43.6.0/24 dst 10.43.7.0/24 
dir out priority 2344 ptype main 
tmpl src 206.191.2.xxx dst 72.21.209.xxx
    proto esp reqid 16389 mode tunnel
src 10.43.7.0/24 dst 10.43.6.0/24 
dir fwd priority 2344 ptype main 
tmpl src 72.21.209.xxx dst 206.191.2.xxx
    proto esp reqid 16389 mode tunnel
src 10.43.7.0/24 dst 10.43.6.0/24 
dir in priority 2344 ptype main 
tmpl src 72.21.209.xxx dst 206.191.2.xxx
    proto esp reqid 16389 mode tunnel
src 10.43.4.0/24 dst 10.43.7.0/24 
dir out priority 2344 ptype main 
tmpl src 206.191.2.xxx dst 72.21.209.xxx
    proto esp reqid 16385 mode tunnel
src 10.43.7.0/24 dst 10.43.4.0/24 
dir fwd priority 2344 ptype main 
tmpl src 72.21.209.xxx dst 206.191.2.xxx
    proto esp reqid 16385 mode tunnel
src 10.43.7.0/24 dst 10.43.4.0/24 
dir in priority 2344 ptype main 
tmpl src 72.21.209.xxx dst 206.191.2.xxx
    proto esp reqid 16385 mode tunnel

I don't think the firewall plays any role here, as I turned it off entirely just to test the connections. Routes are working as expected too. If I define a single network on the left side, each in its own separate test connection, I can reach either subnet. Only when I define leftsubnets does the problem appear: whichever range comes last gets routed in the end, and whichever comes first works for a brief second before it stops routing.

I could not find anyone on the internet with a similar problem... can someone please enlighten me?

cheers,

bo

user2413287

3 Answers

5

When you use leftsubnets, you have to use rightsubnets, not rightsubnet. As stated on http://linux.die.net/man/5/ipsec.conf:

If both a leftsubnets= and rightsubnets= is defined, all combinations of subnet tunnels will be instantiated.
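As a rough sketch of what that change would look like for the connection in the question: only the two subnet lines differ, and the brace syntax is copied from the question as-is. Whether this alone is enough against the AWS endpoint is an open question, not something the man page promises.

```
conn aws-vpc
    # both sides use the plural form, each with a brace-enclosed list
    leftsubnets={10.43.4.0/24 10.43.6.0/24}
    rightsubnets={10.43.7.0/24}
    # all other options stay as they are in the question
```

With one range on the right and two on the left, "all combinations" still comes to two subnet tunnels.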

chicks
andyb
4

This is due to a fault in the way AWS's implementation of IPsec handles SPIs (Security Parameters Indices). You can read about it in detail on libreswan's web site, but the upshot is that libreswan deals with the two ranges by establishing two tunnels (in your case, likely aws-vpc/1x1 and aws-vpc/1x2). Openswan and strongSwan do likewise.

Each of these tunnels has its own SA (security association), each identified by a pair of SPIs, one for traffic you send (your SPI), and one for traffic Amazon sends (their SPI). Amazon, despite having established their SPI #1 for whichever tunnel comes up first, replaces it with SPI #2 when the second tunnel comes up (instead of keeping SPI #1 for tunnel one, and using SPI #2 just for tunnel two, as it should). Traffic is sent to AWS down tunnel one using your SPI #1, but Amazon encrypts the replies with their SPI #2, which naturally causes the traffic to fail to decrypt at your end.

That is why the first tunnel works only for a very brief period, until tunnel two comes up. If at some later time you force at your end the regeneration of SPIs for tunnel one, it will start working, but Amazon's new SPI #1 will replace their old SPI for tunnel two, and tunnel two will stop working just as tunnel one resumes service.

I've run into this on two separate occasions some years apart, most recently yesterday, so I don't think AWS are likely to fix it. It doesn't seem to affect commercial IPsec implementations (or AWS would have fixed it by now), I'm guessing because they don't really have the concept of tunnels between subnets but just aggregate a bunch of host-host tunnels all sharing the same SPIs. That is, however, only a guess.

Edit: weirdly, thanks to spending the intervening week working on this for a client who had a good AWS support contract, I have now confirmed what libreswan had to say about the latest SPI incorrectly replacing any earlier-established ones. Amazon also confirmed that they're doing this, and that one vpn- entity can only, to their mind, support one pair of SPIs. Their advice is to configure S/WAN so that only one tunnel is created, then to route traffic to particular destinations over it.

Fortunately, libreswan now supports this, in version 3.18 or later, provided you have a reasonably recent Linux kernel. I can confirm that CentOS 7 qualifies on both counts.

Their detailed writeup is on their wiki, but the upshot is that you establish a tunnel with very wide source and destination ranges (0.0.0.0/0) using the Linux Virtual Tunnel Interface (vti) device, then tell libreswan not to set up routing across it (vti-routing=no). You can then choose which destinations to reach over this tunnel with simple route statements (ip route add 10.0.0.0/8 dev vti01).
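A minimal sketch of such a connection, assuming libreswan 3.18 or later. The conn name, the mark value, and the reuse of the addresses from the question are illustrative assumptions, not taken from the wiki writeup; mark= is the libreswan option that ties the tunnel's traffic to the VTI device.

```
conn aws-vpc-vti
    left=206.191.2.xxx
    right=72.21.209.xxx
    authby=secret
    auto=start
    # one very wide tunnel instead of one tunnel per subnet pair
    leftsubnet=0.0.0.0/0
    rightsubnet=0.0.0.0/0
    # mark value is an arbitrary illustrative choice
    mark=5/0xffffffff
    vti-interface=vti01
    # leave routing to manually added routes
    vti-routing=no
```

Once this is up, a route along the lines of ip route add 10.43.7.0/24 dev vti01 would send the VPC range from the question over the tunnel.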

I have this working in production. It even supports multiple simultaneous tunnels, later ones using different mark= and vti-interface= configuration options. Amazon also now supports associating a VPN with a transit gateway (TGW), to which many VPNs in the same AWS region can in turn be associated, so you really only need one VPN per AWS region, which is scalable.
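As an illustration of the multiple-tunnel case, the fragment below shows only the keys that would differ between two such connections. The conn names, the mark values, and the second endpoint address (72.21.209.yyy) are hypothetical placeholders; everything else would be repeated from a single-tunnel VTI setup.

```
conn aws-tunnel-1
    # first tunnel: its own mark and its own VTI device
    right=72.21.209.xxx
    mark=5/0xffffffff
    vti-interface=vti01
conn aws-tunnel-2
    # second tunnel: a different mark and a different VTI device
    right=72.21.209.yyy
    mark=6/0xffffffff
    vti-interface=vti02
```

Keeping the mark and the device distinct per tunnel is what lets each route statement point at exactly one tunnel.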

MadHatter
1

Try using:

leftsubnets={10.43.4.0/24,10.43.6.0/24,}

instead of:

leftsubnets={10.43.4.0/24 10.43.6.0/24}

Note: add two commas, one after the first subnet and one after the last.

techraf