What would you suggest is the best option to get "enterprise" data from a client into AWS S3 so we can provide a Heroku-hosted SaaS?

The data volumes are tiny - just daily spreadsheet/CSV records. But to provide a SaaS solution we need to get the data from the client machine to S3 for our Heroku app to use. This obviously should be an automated, secure process.

Possible options I know of are:

  1. SFTP the files through a service like cloudgates.net or cyberduck.io - but this needs a scheduling procedure to be set up somehow and seems inflexible.

  2. Use AWS Storage Gateway to move the files from a ringfenced machine to S3. This may be a non-starter as internal IT staff may not be in a position to install the VM, configure the Gateway and whatnot.

  3. "Oracle Secure Backup Cloud Module for Amazon S3" If we could get the client to create an Oracle database in which they place the data we need every day, Oracle RMAN could send it to S3. But we want plain text files in S3, not an Oracle database, so maybe this doesn't make sense (as we have a Heroku app not an EC2 instance with Oracle).

It seems to me the only suitable approach to automated data upload is to write a small uploader with an AWS SDK (Java or .NET) that runs on the client machine. This is problematic if it's not something in-house IT staff are comfortable maintaining, particularly security-wise.
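To illustrate, something like the sketch below (whether built with an SDK or just a shell script around the AWS CLI) is what I have in mind - the bucket name, paths and file naming are made up for the example:

    #!/bin/sh
    # Hypothetical daily upload from the client machine to S3 using the AWS CLI.
    # Credentials come from an IAM user's access key set up with `aws configure`.
    EXPORT_FILE=/data/exports/daily.csv
    BUCKET=s3://example-client-bucket/incoming

    # Upload today's file under a dated key so each day's data is kept separately.
    aws s3 cp "$EXPORT_FILE" "$BUCKET/daily-$(date +%F).csv"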

I thought I would ask here in case I'm missing a simple option that's an improvement over Bob from accounts uploading the files through our Heroku app to S3 every morning?

rigyt

1 Answer

I use SSH (SFTP) for similar tasks. The steps to make this nice and secure are:

  1. Generate an SSH key pair for your uploading needs and install the public key in your .ssh/authorized_keys or similar on the S3 instance.
  2. Use sftp -i <private key> <file> <remote location> to drop your file where you need it.
  3. Put the command from step 2 above in a cron job - then frolic. (A rough sketch of all three steps follows below.)
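
Roughly, with placeholder hostname, user and paths standing in for whatever your receiving server actually is:

    # Step 1 (one-time): generate a key pair on the client machine and append
    # the public half (upload_key.pub) to ~/.ssh/authorized_keys on the server.
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/upload_key -N ""

    # Step 2: push the daily export over SFTP using the private key.
    echo "put /data/exports/daily.csv /incoming/daily.csv" | \
        sftp -i ~/.ssh/upload_key uploaduser@sftp.example.com

    # Step 3: run it every morning via cron (crontab -e on the client machine):
    # 15 6 * * * /usr/local/bin/upload-daily.sh >> /var/log/upload-daily.log 2>&1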

This works very well for me in almost every case where I need an ad-hoc connection to a server. If you're on Windows, you should be able to achieve something similar using PuTTY along with PuTTYgen, Plink (or pscp) and the Windows Task Scheduler.
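
I haven't run this exact setup on Windows, but the equivalent would look roughly like this with PuTTY's pscp and schtasks (key path, host and task name are only examples):

    rem Daily upload using PuTTY's pscp with a key generated by PuTTYgen.
    pscp -i C:\keys\upload_key.ppk C:\exports\daily.csv uploaduser@sftp.example.com:/incoming/

    rem Register the upload script as a daily scheduled task (elevated prompt).
    schtasks /Create /TN "DailyUpload" /TR "C:\scripts\upload-daily.bat" /SC DAILY /ST 06:15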

You do have to use a scheduler for this recipe to work - I actually think that is much cleaner than writing a script or something that does its own timing. Using all these standard bits also makes the thinking easier for others to follow.

EDIT: Amazon has a friendly guide on the certificate bit right here.

MrMajestyk
  • What's the <remote location> in your case? AWS S3 can't receive SFTP directly, so it would need to go to an EC2 server (which I don't have) or to a third party like Cyberduck.io or cloudgates.net, is that correct? – rigyt Mar 09 '15 at 13:20
  • I don't have hands on the setup anymore unfortunately, but it was one of the pricier Amazon options - maybe EC2. If the actual SFTP access is a problem, you might reverse the setup, having the S3 client connect to an SSH server at your premises. You could then put your files on a share that the SSH server can see. If that is not an option either, then yes - probably cyberduck.io. They have manuals [here](https://trac.cyberduck.io/wiki/help/en/howto/s3). – MrMajestyk Mar 09 '15 at 13:30
  • Thinking more about this, I realize you are right - it was EC2 I was using, but with S3 storage (also). Without EC2, my strategy actually sucks. Except for step 3 - cron is still great :-) – MrMajestyk Mar 09 '15 at 13:35