
I'd like to know whether it is possible to load balance SFTP servers in AWS. I have 2 servers, and each of them uses s3fs-fuse to mount the same S3 bucket onto a mount point. Both of my EC2 instances are able to read/write to their mount points, and from S3, I can see the files from both servers.

What I am looking for is using SFTP to transfer files and Amazon S3 to store them. Files will be uploaded and downloaded daily.

https://github.com/s3fs-fuse/s3fs-fuse

As my next step, I'd like to know how I can load balance my SFTP servers, so that when a user connects to a specific IP address, they are redirected to one of my SFTP servers. I took a look at Elastic Load Balancers, but they seem to permit only specific ports. I have also investigated HAProxy, but I am unsure how secure that solution would be. I have to take HIPAA compliance into consideration. The load balancer must have a static IP address, as our vendors do not support DNS hostnames.

Andrew Gaul
popopanda
  • I have never wanted to set myself on fire so much as I do now. – Wesley Dec 23 '15 at 00:32
  • TBH, using s3fs-fuse for PHI seems quite foolish. – EEAA Dec 23 '15 at 01:51
  • For the record, ELBs do nowadays support all ports (1-65535): https://aws.amazon.com/blogs/aws/elastic-load-balancing-update-more-ports-additional-fields-in-access-logs/. But, ELBs also require clients to use the AWS-generated DNS name (which also points to two public IP addresses, which can change). – Jukka Dec 23 '15 at 07:13
  • The biggest question, though, is why do you want to load balance? If it's for HA, you will still have a SPoF in your HAProxy. If it's for the CPU requirements of SSH, that would be hard to believe, but a valid reason. – w00t Dec 26 '15 at 09:48
  • Also, did you consider asking your vendor to support encrypted S3 uploads? Not hard at all… – w00t Dec 26 '15 at 09:50
  • I'm not sure what folks like @w00t are so surprised about -- there are plenty of reasons for wanting to load balance an SFTP server, mainly for HA and SPoF reasons. For some businesses (Health care!) SFTP is the de facto way of moving info, and if your SFTP goes down you're screwed. If you've gone to immutable infrastructure (and you should), then you'll need a way to keep the service running while you swap in new machines. – rusty Apr 24 '17 at 20:23
  • @rusty in this design there is still a SPoF in HAProxy. You need to fail over an IP to fix that, and then you might as well run a hot standby. – w00t Apr 24 '17 at 20:30
  • @w00t Not commenting on a design, just surprised at the reactions I see to the concept of the question. Re: HAProxy, the OP says they were investigating it, not settled on it. – rusty Apr 26 '17 at 13:11
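
w00t's suggestion of failing over an IP to a hot standby can be sketched as a small watchdog: probe the primary SFTP server by reading its SSH identification line (SFTP runs over SSH, and the server sends that line first), and if the probe fails, move an Elastic IP to the standby. This is a minimal sketch, assuming boto3's EC2 client and its `associate_address` call; the helper names `ssh_banner_ok` and `failover_once`, and every ID and address passed to them, are hypothetical. The client is passed in as a parameter so the logic can be exercised without touching AWS.

```python
import socket
from typing import Callable, Tuple


def ssh_banner_ok(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if host:port answers with an SSH-2.0 identification line.

    Simplification: RFC 4253 also allows a server to send other lines before
    its version string; this probe only checks the first line it receives.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            # Read until the end of the first line (or 255 bytes).
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:
                    break
                data += chunk
    except OSError:  # refused, unreachable, or timed out
        return False
    return data.startswith(b"SSH-2.0-")


def failover_once(ec2, allocation_id: str, primary: Tuple[str, int],
                  standby_instance_id: str,
                  probe: Callable[[str, int], bool] = ssh_banner_ok) -> bool:
    """Check the primary once; if it looks dead, move the Elastic IP to the standby.

    `ec2` is assumed to be a boto3 EC2 client (or any object with the same
    associate_address method). Returns True if a failover was triggered.
    """
    host, port = primary
    if probe(host, port):
        return False
    # AllowReassociation lets AWS move an EIP that is currently attached elsewhere.
    ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        AllowReassociation=True,
    )
    return True
```

It would be run on a schedule, e.g. `failover_once(boto3.client("ec2"), allocation_id, ("10.0.1.10", 22), standby_id)` with placeholder values. Because both servers mount the same S3 bucket, the standby already sees the same files when the IP moves over.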

2 Answers


My comment could probably use some clarification. I spouted off with the eloquence of an inebriated yak:

I have never wanted to set myself on fire so much as I do now.

Why? Why would I say such a thing? Mostly because I'm an awful person. However, aside from that, I can explain my outburst by going over the original post piecemeal:

I'd like to know whether it is possible to load balance SFTP servers in AWS.

Yes. Impossible is nothing. But know that unless you get a special SFTP package, the load balancing will be entirely up to you to build. The service being SFTP and being hosted in AWS is inconsequential.

I have 2 servers, and each of them uses s3fs-fuse to mount the same S3 bucket onto a mount point. Both of my EC2 instances are able to read/write to their mount points, and from S3, I can see the files from both servers.

You're off to a good start with a shared file system, the performance and reliability of the setup notwithstanding.

As my next step, I'd like to know how I can load balance my SFTP servers, so that when a user connects to a specific IP address, they are redirected to one of my SFTP servers.

The question now is: why do you want to load balance? There is a fantastic amount of throughput and processing power available in the Amazon instance catalog, and needing to load balance SFTP would mean you're approaching porn levels of network activity. Keep it simple, repeatable, and resilient wherever possible. Get an i2.xlarge with an SFTP daemon running on it and you should be fine no matter what. Build it with Puppet/Chef/$trendy-config-management-tool and you're in business. Moving on, however...

I took a look at Elastic Load Balancers, but they seem to permit only specific ports. I have also investigated HAProxy, but I am unsure how secure that solution would be.

HAProxy is exactly the kind of tool you need. Your uncertainty about security is easily dispelled with just a few hours of reading. My desire to self-immolate is rising from this point on. If you're unsure about something, go become sure about it. HAProxy is the choice of many financial institutions, hospitals, and governments.
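
Because SFTP runs inside an encrypted SSH session, a load balancer in front of it can only work at the TCP layer (what HAProxy calls `mode tcp`): pick a backend for each new connection and copy bytes in both directions without ever seeing their contents. The sketch below illustrates that technique with Python's asyncio and plain round-robin selection. It is not a replacement for HAProxy (no health checks, timeouts, or logging), and the helper names `pipe`, `make_handler`, and `serve` are hypothetical.

```python
import asyncio
import itertools
from typing import Iterable, Tuple


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except OSError:  # peer reset or connection lost mid-transfer
        pass
    finally:
        writer.close()


def make_handler(backends: Iterable[Tuple[str, int]]):
    """Build a connection handler that hands each new client to the next backend."""
    rotation = itertools.cycle(list(backends))

    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        host, port = next(rotation)  # simple round-robin, no health checks
        try:
            backend_reader, backend_writer = await asyncio.open_connection(host, port)
        except OSError:
            client_writer.close()
            return
        # Layer-4 forwarding: the SSH/SFTP session stays encrypted end to end;
        # the proxy only copies bytes in both directions.
        await asyncio.gather(
            pipe(client_reader, backend_writer),
            pipe(backend_reader, client_writer),
        )

    return handle


async def serve(listen_port: int, backends: Iterable[Tuple[str, int]]) -> None:
    """Listen on all interfaces and proxy every connection to the backends."""
    server = await asyncio.start_server(make_handler(backends), "0.0.0.0", listen_port)
    async with server:
        await server.serve_forever()
```

Running it would look like `asyncio.run(serve(2222, [("10.0.1.10", 22), ("10.0.2.10", 22)]))`, where the two addresses are hypothetical stand-ins for the SFTP instances' private IPs. Since a client may land on either server, both SFTP servers should present the same SSH host key; otherwise clients will see host-key mismatch warnings depending on which backend they reach.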

I have to take HIPAA compliance into consideration.

Totally understood, but compliance is not primarily the role of tools. You'll need to understand the concepts behind the HIPAA compliance requirements and see how HAProxy can fulfill them. HAProxy is neither HIPAA compliant nor HIPAA non-compliant. No matter which tool you use, you'll need to independently verify the underlying assumptions and requirements of your compliance and regulatory needs. In fact, if anything, S3 and the use of Amazon instances should be inspected more carefully than the use of HAProxy.

The load balancer must have a static IP address, as our vendors do not support DNS hostnames.

This. This did it. Your vendor is bad and should feel bad. Now I want to jump into lava. Not supporting something as basic as DNS resolution is a separate problem, but the requirement itself is like saying "A car must have an engine for me to use it." Well, of course. Of course a load balancer is going to be able to use a static IP address. There are many more considerations you need to be thinking about beyond simple static IP addresses.

TL;DR

Yes, you can load balance SFTP with HAProxy. HIPAA compliance is up to you to discern, and tool choice alone will not check the boxes. You have some Googling to do and documentation to read.

I have some flames to put out.

Wesley
  • Thanks for the honest feedback and suggestions. I will look through them, but this will give me a start. – popopanda Dec 23 '15 at 01:35
  • This is my favorite answer on SF in quite some time. <3 – ceejayoz Dec 23 '15 at 01:53
  • NetScaler also has an SFTP mode for virtual servers; you might want to consider that as well. An EC2-instance-backed NetScaler appliance (with different bandwidth options) is available through the AWS Marketplace. It costs money, though. – Jukka Dec 23 '15 at 07:17
  • After working in healthcare IT for nearly a decade, I'm not particularly surprised by third-party vendors being unable to support DNS lookups, using domains that are not inter-system accessible, and other oddities that fly in the face of accepted standards; these occur much more often than anyone would like. For example, several years ago, a vendor told me they couldn't support SSH public key authentication because it didn't use passwords that could be rotated every 30 days. HIPAA seems to cause a certain amount of hyper-paranoia and confusion. I'm surprised that AWS is willing to be a BA (business associate). – Andrew Domaszek Dec 24 '15 at 06:09

Yes. It is possible by using an AWS Network Load Balancer.

  1. Create a load balancer with a TCP listener on port 22.
  2. Create a target group containing your 2 SFTP instances.
  3. Configure the security groups (SGs) on both the instances and the load balancer so that port 22 traffic is allowed.
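
The three steps above map onto a few Elastic Load Balancing v2 API calls. What follows is a minimal sketch, assuming boto3's `elbv2` client; because the listener is plain TCP on port 22, it assumes a Network Load Balancer (Application Load Balancers only handle HTTP/HTTPS). Giving the NLB one Elastic IP per subnet through `SubnetMappings` is what provides fixed public addresses for vendors that cannot use DNS. The function name `create_sftp_nlb` and all names and IDs passed to it are hypothetical, and step 3 (security groups) is not covered.

```python
from typing import Dict, List


def create_sftp_nlb(elbv2, name: str, vpc_id: str,
                    subnet_eips: Dict[str, str], instance_ids: List[str]) -> str:
    """Create an internet-facing NLB that forwards TCP/22 to the given instances.

    subnet_eips maps each subnet ID to an Elastic IP allocation ID, so the
    load balancer gets one fixed public address per subnet. Returns the
    load balancer's DNS name.
    """
    lb = elbv2.create_load_balancer(
        Name=name,
        Type="network",
        Scheme="internet-facing",
        SubnetMappings=[
            {"SubnetId": subnet, "AllocationId": alloc}
            for subnet, alloc in subnet_eips.items()
        ],
    )["LoadBalancers"][0]

    # Step 2: a TCP target group holding the SFTP instances.
    tg_arn = elbv2.create_target_group(
        Name=f"{name}-sftp",
        Protocol="TCP",
        Port=22,
        VpcId=vpc_id,
        TargetType="instance",
        HealthCheckProtocol="TCP",
    )["TargetGroups"][0]["TargetGroupArn"]
    elbv2.register_targets(
        TargetGroupArn=tg_arn,
        Targets=[{"Id": iid, "Port": 22} for iid in instance_ids],
    )

    # Step 1: the TCP listener on port 22, forwarding to the target group.
    elbv2.create_listener(
        LoadBalancerArn=lb["LoadBalancerArn"],
        Protocol="TCP",
        Port=22,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
    return lb["DNSName"]
```

Against a real account it would be called with `boto3.client("elbv2")` as the first argument. Since an NLB with instance targets preserves the client's source IP by default, the instances' security groups from step 3 need to allow port 22 from the vendors' addresses, not only from the load balancer.
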
Alexander Tolkachev