
We have a number of backend servers, EC2 instances in a private subnet of an AWS VPC, that need to talk to a third-party API. The API rate-limits requests by originating IP address, and while scaling our setup we have started hitting the limit on the IP of the NAT gateway that is used for all outbound traffic.

Thus I want to set up a proxy for outbound traffic with several EIPs attached. For testing I am currently using an Amazon Linux 2 instance with 2 ENIs, each with 2 EIPs attached. The backend servers open an SSH tunnel to the outbound proxy and map the third-party API to a local port; an entry in each server's hosts file redirects all traffic for that hostname to localhost. This setup works fine in general, but outbound traffic from the proxy always uses only the first associated EIP.
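As a rough illustration of the tunnel setup described above, the following dry-run sketch prints the SSH command and hosts entry a backend server might use. The names `proxy.internal` and `api.example.com`, the login `user3`, and port 443 are hypothetical placeholders, not values from my actual setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# All host names, the user name and the port are hypothetical placeholders.

# Forward a local port through the proxy to the third-party API.
# (Binding a local port below 1024 normally requires root.)
tunnel_cmd() {
    proxy_user="$1"; proxy_host="$2"; port="$3"; api_host="$4"
    echo "ssh -N -L ${port}:${api_host}:${port} ${proxy_user}@${proxy_host}"
}

# hosts entry that points the API hostname at the local end of the tunnel.
hosts_entry() {
    echo "127.0.0.1 $1"
}

tunnel_cmd user3 proxy.internal 443 api.example.com
hosts_entry api.example.com
```

The idea is that each backend server logs in as a different proxy user, so that the proxy can tell the traffic apart by owner, assuming the forwarded connections are owned by that user's sshd process.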

So my setup looks like this:

ENI1: eth0
private IP1: 10.0.11.81
private IP2: 10.0.11.82

ENI2: eth1
private IP3: 10.0.11.52
private IP4: 10.0.11.53

original route table:
default via 10.0.11.1 dev eth0
default via 10.0.11.1 dev eth1 metric 10001
10.0.11.0/24 dev eth0 proto kernel scope link src 10.0.11.81
10.0.11.0/24 dev eth1 proto kernel scope link src 10.0.11.52
169.254.169.254 dev eth0

I now want to be able to specify which backend server uses which EIP when calling the API via the outbound proxy. My first try was the following:

  • set up 4 different users on the proxy host
  • add an iptables rule for each user, like so: iptables -t nat -m owner --uid-owner user1 -A POSTROUTING -j SNAT --to IP1 etc.
  • this works for the 2 IPs attached to the primary ENI (i.e. eth0 on the machine) but not for the 2 IPs associated with the second ENI (eth1)
  • adding -o eth1 to the rule did not help either
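For reference, the per-user rules from this first attempt can be sketched as a small dry-run script that prints one SNAT rule per user. The user names `user1` to `user4` and the user-to-IP mapping are assumptions based on the addresses listed above, and the options are written in the conventional order with the long form `--to-source`.

```shell
#!/bin/sh
# Dry-run sketch: prints one SNAT rule per user instead of applying it.
# User names and the user-to-IP mapping are assumptions for illustration.

snat_rule() {
    user="$1"; ip="$2"
    echo "iptables -t nat -A POSTROUTING -m owner --uid-owner $user -j SNAT --to-source $ip"
}

snat_rules() {
    snat_rule user1 10.0.11.81   # eth0, private IP1
    snat_rule user2 10.0.11.82   # eth0, private IP2
    snat_rule user3 10.0.11.52   # eth1, private IP3
    snat_rule user4 10.0.11.53   # eth1, private IP4
}

snat_rules
```

This only reproduces the configuration as described; as noted, the rules for the two eth1 addresses did not have the desired effect.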

My next try was to create a custom routing table for each IP address and match them with policy rules:

  • create a custom route table, e.g. for IP3:
default via 10.0.11.1 dev eth1
10.0.11.0/24 dev eth1 proto static scope link src 10.0.11.52
169.254.169.254 dev eth1 scope link
  • create an iptables rule to mark traffic originating from user3: -A OUTPUT -m owner --uid-owner 1003 -j MARK --set-xmark 0x3/0xffffffff
  • create a policy rule to use the custom route table for all packets marked 3: 32763: from all fwmark 0x3 lookup ip3
  • this again does not work. Packets do get treated differently, though: all users can communicate with the outside world except for user3 in the above example.
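Put together, the second attempt corresponds roughly to the commands printed by the dry-run sketch below. It assumes that the MARK rule lives in the mangle table (the rule above does not name a table) and that the table name `ip3` is already registered in /etc/iproute2/rt_tables; the function name is made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of applying them.
# Assumptions: MARK rule in the mangle table; table name already
# registered in /etc/iproute2/rt_tables; rule priority fixed at 32763.

policy_routing_cmds() {
    uid="$1"; mark="$2"; table="$3"; dev="$4"; src="$5"
    gw="10.0.11.1"; subnet="10.0.11.0/24"
    # Routes for the custom table
    echo "ip route add default via $gw dev $dev table $table"
    echo "ip route add $subnet dev $dev proto static scope link src $src table $table"
    echo "ip route add 169.254.169.254 dev $dev scope link table $table"
    # Mark locally generated packets owned by the given uid
    echo "iptables -t mangle -A OUTPUT -m owner --uid-owner $uid -j MARK --set-xmark $mark/0xffffffff"
    # Send marked packets to the custom table
    echo "ip rule add fwmark $mark lookup $table priority 32763"
}

policy_routing_cmds 1003 0x3 ip3 eth1 10.0.11.52
```

This mirrors the configuration as described and is not a confirmed fix; with it in place, user3 is the one user that loses connectivity.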

What am I doing wrong? Is there something trivial I am missing or is my entire approach doomed to fail? I'm very open to suggestions, both on getting this setup working as well as alternative approaches...

Thanks a lot in advance!

MoWo
  • A simpler stopgap would be one NAT gateway per subnet / AZ, with routing set up appropriately. NAT Instance instead of NAT gateways would be cheaper but require more setup / maintenance. John's answer is probably best though, have the limit increased. – Tim Jan 06 '22 at 00:46
  • The outbound proxy is my stopgap. Reorganizing subnets, moving servers around etc. is going to be a lot more effort than simply redirecting part of the outbound traffic through an SSH tunnel. That can be done to existing machines with minimal impact on the architecture and no downtime. – MoWo Jan 06 '22 at 16:19

2 Answers


Contact the organization running the API, and explain the situation. Creating a business relationship is a good start to solving the problem.

Implement IPv6 to reduce technical complexity. AWS will give you a /64 per subnet of public space, allowing direct communication between your instances and the API. Unique address per instance makes it apparent you really are scaling out. Asking that your nets be allowed a higher quota becomes easier, as all are in your VPC's /56.

John Mahowald
  • Thanks! Having the limit increased would indeed be an option that I haven't considered yet. IPv6 is unfortunately not supported by them, else that would have been my first choice. I did consider segmenting the subnet further to be able to add more NAT gateways and spread the load, or even moving all the backend servers into a public subnet to be able to assign individual EIPs to them directly. But I'd like to avoid moving backend servers with database access into a public subnet, and resegmenting the current setup is also a considerable amount of effort. An outbound proxy seemed like a good stopgap. – MoWo Jan 06 '22 at 13:08

I did find a solution myself after all and documented it here: Best way to route traffic based on logged in user via specific redundant route?

Just in case someone stumbles over this in the future.

MoWo