High availability Bastion host - Best practices, ELB, EIP?

Question

I am currently trying to figure out a good configuration to make a bastion host highly available. I want to meet the following targets:

The bastion host(s) need to be able to withstand an Availability Zone failure and an EC2 instance failure. A short downtime (a few minutes) may be acceptable.
The bastion host(s) need to be reachable via a permanent DNS entry.
No manual intervention should be needed.

My current setup is as follows: bastion hosts in an Auto Scaling Group spanning two Availability Zones, with an ELB in front of the Auto Scaling Group.
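For reference, here is a minimal sketch of this ELB-based setup, written as a CloudFormation template built in Python. The resource names (`BastionELB`, `BastionLaunchConfig`, `BastionASG`), the subnet IDs, AMI ID, instance type and health-check values are all hypothetical placeholders. The launch configuration reflects the approach common at the time; launch templates are the current equivalent.

```python
import json


def bastion_elb_template(subnet_ids, image_id, instance_type="t2.micro"):
    """Build a CloudFormation template (as a dict) for bastion hosts in an
    Auto Scaling Group spread over the given subnets (one per AZ), behind a
    classic ELB that passes raw TCP port 22 through to the instances."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Classic ELB forwarding SSH (TCP/22) without terminating it.
            "BastionELB": {
                "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
                "Properties": {
                    "Subnets": subnet_ids,
                    "Listeners": [
                        {"LoadBalancerPort": "22", "InstancePort": "22", "Protocol": "TCP"}
                    ],
                    # Timeout must be shorter than Interval.
                    "HealthCheck": {
                        "Target": "TCP:22",
                        "HealthyThreshold": "2",
                        "UnhealthyThreshold": "2",
                        "Interval": "10",
                        "Timeout": "5",
                    },
                },
            },
            "BastionLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {"ImageId": image_id, "InstanceType": instance_type},
            },
            # The ELB health check decides when an instance gets replaced.
            "BastionASG": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "BastionLaunchConfig"},
                    "VPCZoneIdentifier": subnet_ids,
                    "MinSize": "2",
                    "MaxSize": "2",
                    "DesiredCapacity": "2",
                    "LoadBalancerNames": [{"Ref": "BastionELB"}],
                    "HealthCheckType": "ELB",
                    "HealthCheckGracePeriod": 300,
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder IDs: substitute real subnets (one per AZ) and a real AMI.
    tpl = bastion_elb_template(["subnet-aaaa1111", "subnet-bbbb2222"], "ami-12345678")
    print(json.dumps(tpl, indent=2))
```

The printed JSON can be passed to CloudFormation as a stack template.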

This setup has a few advantages:

Easy to set up using CloudFormation
Auto Scaling Groups over two AZs can be used to guarantee availability
The ELB does not count towards the account's EIP limit

It also has some disadvantages:

With two or more bastion hosts behind the ELB, SSH host key warnings are common, and I do not want our users to get accustomed to ignoring SSH warnings.
The ELB costs money, unlike an EIP; in fact it costs about as much as the bastion host itself. This is not really a concern; I added this point only for the sake of completeness.

The other obvious solution is to use an Elastic IP, which, as I see it, has a few drawbacks:

I (obviously) cannot attach an EIP to an Auto Scaling Group directly.
When not using Auto Scaling Groups, I have to put something in place that starts new EC2 bastion hosts if the old ones fail, e.g. using AWS Lambda. This adds complexity.
When the EIP is manually attached to an instance in an Auto Scaling Group, an Availability Zone failure will leave the EIP unattached, and it will not be reattached to the replacement instance. Again, this can be solved by running a program (on the instance or in AWS Lambda) that reattaches the EIP to an instance, but again this adds complexity.
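To give a sense of how much extra complexity the reattach program involves, here is a minimal sketch of an AWS Lambda handler that moves the EIP to a healthy instance of the group. It assumes a single-instance Auto Scaling Group, that the function is triggered by something like the group's launch notifications (e.g. via SNS), and two hypothetical environment variables, `BASTION_ASG_NAME` and `BASTION_EIP_ALLOCATION_ID`. Only standard boto3 calls (`describe_auto_scaling_groups`, `associate_address`) are used.

```python
import os


def pick_healthy_instance(asg_response):
    """Return the ID of the first InService, Healthy instance in a
    describe_auto_scaling_groups response, or None if there is none."""
    for group in asg_response.get("AutoScalingGroups", []):
        for inst in group.get("Instances", []):
            if inst.get("LifecycleState") == "InService" and inst.get("HealthStatus") == "Healthy":
                return inst["InstanceId"]
    return None


def reassociate_eip(autoscaling, ec2, group_name, allocation_id):
    """Point the EIP at a healthy instance of the bastion Auto Scaling Group."""
    resp = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    instance_id = pick_healthy_instance(resp)
    if instance_id is None:
        return None  # nothing healthy yet; a later event will retry
    # AllowReassociation moves the EIP even if it is still associated with
    # another (e.g. terminating) instance.
    ec2.associate_address(
        AllocationId=allocation_id, InstanceId=instance_id, AllowReassociation=True
    )
    return instance_id


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; it is imported here so the
    # helpers above can be used and tested without it.
    import boto3

    return reassociate_eip(
        boto3.client("autoscaling"),
        boto3.client("ec2"),
        os.environ["BASTION_ASG_NAME"],           # hypothetical env var
        os.environ["BASTION_EIP_ALLOCATION_ID"],  # hypothetical env var
    )
```

Passing the clients in as arguments keeps the selection and association logic testable with stand-in objects.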

What are the best practices for highly available SSH instances, i.e. bastion hosts?

I wonder if you can use a managed NAT gateway in reverse, as a bastion host... probably not, but it might be worth considering. Alternatively, could you just run one small instance in each AZ and publish their EIPs to your users / servers? — Tim, Dec 23 '16 at 19:36
@Tim NAT gateways are for egress traffic only. No port forwarding is possible. — EEAA, Dec 23 '16 at 23:44
@EEAA that's what I thought, but I wondered if there was any way to set it up in reverse. — Tim, Dec 24 '16 at 01:40

Tim · Answer 1 · 2017-05-11T19:38:49.753

It looks like the requirement is to provide bastion functionality at the lowest reasonable cost with an RTO of, say, 5 minutes. No RPO applies, as it's effectively a stateless proxy that can be rebuilt easily.

I'd have a bastion host, defined either as an AMI or a CloudFormation script (an AMI is faster), inside an Auto Scaling Group with min/max/desired capacity set to 1. I wouldn't have a load balancer, as there's no need for one as far as I can see. This instance would be registered in Route53 under a public domain name, so even if the instance changes you will still be able to reach it, and that should eliminate the SSH warnings. I might start with one instance in each subnet, but I'd probably turn one off if they're reliable enough (they should be).
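The Route53 registration step could look roughly like this: a minimal sketch, assuming it runs on the bastion at boot (for example from user data) with permission to call Route53's `change_resource_record_sets`. The hosted zone ID, record name and the 60-second TTL are hypothetical; a short TTL helps clients follow a replacement instance within the RTO. The metadata lookup uses the IMDSv1 path, so an instance that enforces IMDSv2 would need a session token first.

```python
import urllib.request


def build_upsert_batch(record_name, ip_address, ttl=60):
    """Route53 ChangeBatch that creates or overwrites the bastion's A record."""
    return {
        "Comment": "Point bastion DNS name at the current instance",
        "Changes": [
            {
                "Action": "UPSERT",  # create if missing, overwrite otherwise
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ],
    }


def current_public_ip(timeout=2):
    """Read this instance's public IPv4 from the instance metadata service."""
    url = "http://169.254.169.254/latest/meta-data/public-ipv4"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def register_bastion(route53, hosted_zone_id, record_name, ip_address=None):
    """UPSERT the bastion's DNS record; looks up the public IP if not given."""
    ip = ip_address or current_public_ip()
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_upsert_batch(record_name, ip),
    )
    return ip
```

On the instance it might be invoked as `register_bastion(boto3.client("route53"), "Z123EXAMPLE", "bastion.example.com.")`, with both the zone ID and the name being placeholders.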

A CloudFormation deployment of bastion hosts is described by Amazon here, and Amazon have a best practice guide here. You shouldn't address internal resources by their Elastic IPs: those are public IPs and traffic to them is charged, whereas traffic to private IPs isn't. Using domain names that resolve to private IPs is cheaper. This might involve some tweaking of the CloudFormation script.

You are charged no more for any traffic over an Elastic IP than you are for traffic over an ELB or over normal public IPs. — Ash Berlin-Taylor, May 11 '17 at 15:04
Updated. I was talking about internal traffic: traffic over private IPs is free, whereas traffic to Elastic or public IPs is charged. — Tim, May 11 '17 at 19:39
