I have a VPC that I created a long time ago, before NAT gateways were a thing. Like many setups, it uses a NAT instance to route outbound traffic. Yesterday my NAT instance crashed. I was able to reboot it, but it caused enough of a headache that I decided to try migrating to a NAT gateway.
I don't care whether the outbound IP stays the same. As a test, I created a new VPC with an instance to make sure I had the settings right. I then created the NAT gateway in the existing VPC and changed my main route table to use it as the target instead of the NAT instance.
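For reference, the console change described above (repointing the default route from the NAT instance to the NAT gateway) can also be scripted. This is a minimal sketch, assuming `ec2` is a boto3 EC2 client and that the route table and NAT gateway IDs are placeholders you fill in; the helper name is hypothetical. It refuses to switch until the gateway reports `available`, since a route to a gateway that is still pending would drop outbound traffic.

```python
def point_default_route_at_nat_gateway(ec2, route_table_id, nat_gateway_id):
    """Replace the 0.0.0.0/0 route in route_table_id so it targets nat_gateway_id.

    Hypothetical helper; `ec2` is assumed to be a boto3 EC2 client.
    """
    # Only switch once the NAT gateway is usable; a gateway that is still
    # "pending" (or has "failed") would black-hole outbound traffic.
    gateways = ec2.describe_nat_gateways(NatGatewayIds=[nat_gateway_id])["NatGateways"]
    state = gateways[0]["State"]
    if state != "available":
        raise RuntimeError(f"NAT gateway {nat_gateway_id} is {state!r}, not 'available'")

    # replace_route swaps the target of the existing default route (for
    # example, the NAT instance) in place instead of deleting and re-adding it.
    ec2.replace_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_gateway_id,
    )
```

Usage would look like `point_default_route_at_nat_gateway(boto3.client("ec2"), "rtb-EXAMPLE", "nat-EXAMPLE")`, with the IDs being placeholders. Passing the client in, rather than creating it inside, keeps the helper easy to exercise against a stub.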
My setup has two subnets: a public one that routes to an internet gateway and a private one that routes to the NAT gateway. I use OpenVPN on an instance to reach my private instances.
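VPC route tables send traffic using the most specific matching route (longest prefix match). A small model of the two-subnet layout, with a hypothetical VPC CIDR (10.0.0.0/16) and placeholder target IDs, shows what each subnet's traffic should resolve to once the private subnet points at the NAT gateway:

```python
import ipaddress

# Hypothetical route tables for the two-subnet layout. The VPC CIDR and the
# target IDs are placeholders, not values from the real setup. Keys must be
# written in canonical CIDR form so the lookup below finds them.
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}


def resolve_target(routes, destination):
    """Return the target of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    # Collect every route whose CIDR contains the destination address.
    candidates = [
        ipaddress.ip_network(cidr)
        for cidr in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        return None  # no matching route, so the traffic goes nowhere
    # Longest prefix match: the largest prefix length wins.
    best = max(candidates, key=lambda net: net.prefixlen)
    return routes[best.with_prefixlen]
```

With these tables, `resolve_target(PRIVATE_ROUTES, "8.8.8.8")` gives `"nat-example"`, while `resolve_target(PRIVATE_ROUTES, "10.0.1.25")` stays `"local"`. The NAT gateway itself has to sit in the public subnet, so its own outbound traffic resolves to the internet gateway.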
However, when I switch to the NAT gateway, my existing instances can no longer reach the outside world, but a newly created instance works fine. The problem also happens in reverse: if I point the subnet of the new instance (the one that works with the NAT gateway) at the NAT instance instead, it can no longer see the outside world.
I'm not changing the routing tables on the instances themselves (the OS-level routes), only the VPC routes in the web console.
With my test instance, I also tried restarting networking and then rebooting, but neither helped.
I've read a couple of migration guides that seem to indicate this should just work. The routes obviously change, but for me traffic stops working right after that. Is there a magic trick to this, or am I going to have to recreate my instances to work with the NAT gateway?
EDIT: Another wrinkle. On a whim, I changed the security group of my test instance (I had used a default one when I created it). Switching to an existing security group with the same outbound rules let the instance connect, but when I swap it back to the original group, it can't connect.
EDIT 2: Changing the security group on one of my existing instances doesn't seem to help. It only seems to work on the new instance that I set up as a test.
As suggested, here are some screenshots of my setup:
Here are my route tables. The named one is the one I added as a test, and it routes to the NAT gateway. It has just one subnet associated with it, which contains my test instance. The second in the list is the main (default) route table, which all other subnets use; it routes through the NAT instance.
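In a VPC, a subnet with an explicit route table association uses that table, and every other subnet implicitly uses the main route table. A toy model of that rule shows which subnets follow the main table when its default route is switched. Only `private-subnet-1e` comes from this post; the table IDs and the production subnet names are hypothetical.

```python
# Hypothetical IDs; only the association rule is meant to mirror VPC behavior.
MAIN_ROUTE_TABLE = "rtb-main"  # main table: 0.0.0.0/0 -> NAT instance
EXPLICIT_ASSOCIATIONS = {"private-subnet-1e": "rtb-natgw-test"}  # named test table
ALL_SUBNETS = ["private-subnet-1e", "prod-subnet-a", "prod-subnet-b"]


def effective_route_table(subnet, explicit, main):
    """Return the route table a subnet actually uses."""
    # An explicit association wins; any other subnet falls back to the main table.
    return explicit.get(subnet, main)


def subnets_following_main(subnets, explicit):
    """Return the subnets whose routing changes when the main route table changes."""
    return [s for s in subnets if s not in explicit]
```

Here `subnets_following_main(ALL_SUBNETS, EXPLICIT_ASSOCIATIONS)` returns `["prod-subnet-a", "prod-subnet-b"]`, so editing the main table's default route affects every production subnet at once, while the test subnet keeps using its own table.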
Here's verification that my private-subnet-1e routes to the NAT gateway:
And verification that one of my production server subnets uses the main route table, which routes through the NAT instance:
Security group for the test instance, outbound rules (everything allowed):
Security group for a prod instance that can't route out if I change the route table (same, everything is allowed):
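Since the first edit hinges on two groups having "the same outbound rules", one way to double-check that beyond the console view is to compare the egress rules as data. This is a sketch, assuming each group is a dict shaped like the `SecurityGroups` entries returned by boto3's `describe_security_groups(GroupIds=[...])`; the helper names and the sample groups in any usage are made up.

```python
def egress_rules(group):
    """Flatten a describe_security_groups-style group into comparable tuples.

    Each tuple is (protocol, from_port, to_port, destination). Ports are None
    for "-1" (all traffic) rules, which carry no FromPort/ToPort keys.
    """
    rules = set()
    for perm in group.get("IpPermissionsEgress", []):
        proto = perm.get("IpProtocol")
        ports = (perm.get("FromPort"), perm.get("ToPort"))
        # A single permission can list several destinations of each kind.
        for r in perm.get("IpRanges", []):
            rules.add((proto, *ports, r["CidrIp"]))
        for r in perm.get("Ipv6Ranges", []):
            rules.add((proto, *ports, r["CidrIpv6"]))
        for r in perm.get("UserIdGroupPairs", []):
            rules.add((proto, *ports, r["GroupId"]))
        for r in perm.get("PrefixListIds", []):
            rules.add((proto, *ports, r["PrefixListId"]))
    return rules


def diff_egress(a, b):
    """Report egress rules present in one group but not the other."""
    ra, rb = egress_rules(a), egress_rules(b)
    return {"only_in_a": ra - rb, "only_in_b": rb - ra}
```

If both sides of the diff come back empty, the outbound rules really are equivalent. Security groups are stateful, so return traffic for an allowed outbound connection is permitted regardless of the inbound rules.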