65

"We highly recommend that you never grant any kind of public access to your S3 bucket."

I have set a very granular public policy (s3:GetObject) for one bucket that I use to host a website. Route 53 explicitly supports aliasing a bucket for this purpose. Is this warning just redundant, or am I doing something wrong?

Andrew Johnson
  • @MichaelHampton It will show this in the S3 console, without much additional context. https://businessinsights.bitdefender.com/amazon-stop-s3-buckets-leaking-data – ceejayoz Dec 16 '17 at 22:53
  • Related - can AWS see into a private bucket or does it have to be public for AWS to access files inside? – Criggie Dec 18 '17 at 19:12
  • @Criggie AWS being their support team? Or something else? – ceejayoz Dec 20 '17 at 19:37
  • @ceejayoz yes the AWS support team. – Criggie Dec 21 '17 at 00:18
  • I'd imagine at a certain level of support they might poke around inside, although S3 supports encryption with a non-Amazon key. Their processes should ensure it's not done without explicit permission, I'd think. – ceejayoz Dec 21 '17 at 01:05

2 Answers

80

Yes, if you know what you're doing (edit: and everyone else with access to it does, too...), you can ignore this warning.

It exists because even large organizations that should know better have accidentally placed private data in public buckets. In addition to the in-console warnings, Amazon will also send you heads-up emails if you leave buckets public.

Accenture, Verizon, Viacom, Illinois voter information and military information has all been found inadvertently left open to everyone online due to IT bods misconfiguring their S3 silos.

If you are absolutely, 100% certain that everything in the bucket should be public and that no one's going to accidentally put private data in it - a static HTML site's a good example - then by all means, leave it public.
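
For reference, here is a minimal Python sketch of what a read-only public policy of the kind described in the question looks like, plus a rough check that nothing beyond `s3:GetObject` is exposed to anonymous users. The bucket name `example-static-site`, the `Sid` label, and both helper functions are hypothetical names made up for illustration. The check is deliberately simplified (it ignores conditions, `NotPrincipal`, and wildcard actions), so treat it as a sanity check rather than a real audit.

```python
import json


def public_read_policy(bucket):
    """Build a bucket policy that lets anyone read objects, and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # arbitrary label, assumed here
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def public_actions(policy):
    """Collect every action that an Allow statement grants to the anonymous principal."""
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    found = set()
    for stmt in statements:
        principal = stmt.get("Principal")
        is_public = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*"
        )
        if stmt.get("Effect") == "Allow" and is_public:
            actions = stmt.get("Action", [])
            found.update([actions] if isinstance(actions, str) else actions)
    return found


policy = public_read_policy("example-static-site")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
print(public_actions(policy))
```

If `public_actions` ever returns anything other than `s3:GetObject` for a website bucket, that is a good moment to stop ignoring the warning.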

ceejayoz
  • In practice, you're virtually *never* 100% certain, so best practice is not to. – Shadur Dec 17 '17 at 20:32
  • It was just a few months ago that either the FBI or CIA left private data that was supposed to be secure in a public S3 bucket. I'll see if I can find a link to the news article. – Reactgular Dec 18 '17 at 15:47
  • See here: https://gizmodo.com/thousands-of-job-applicants-citing-top-secret-us-govern-1798733354 – Reactgular Dec 18 '17 at 15:49
  • Good sire, can you guide me how to specifically allow only my website and Android app to be able to access my bucket's objects? Basically I don't want people scraping my bucket contents. But my website and app should be able to load them. – bad_keypoints Dec 29 '18 at 17:04
  • @bad_keypoints what you describe is (basically) impossible in S3. There is no way your website can tell the difference between someone scraping and a regular visitor. Technically you could write server-side logic to look at how much each user is consuming, and stop them at a certain point ... but such logic would be a PITA, and you'd have to use a real host (e.g. EC2 instead of S3) to do it. In short, if you want to worry about scrapers, you need a far more advanced back-end (and it won't be a 100% solution) ... or you can just not worry. – machineghost Sep 21 '20 at 16:55
41

The privacy issue featured in ceejayoz's answer is not the only problem.
Reading objects from an S3 bucket has a price. You will be billed by AWS for each download from this bucket. And if you have a lot of traffic (or if someone who wants to hurt your business starts to heavily download files all day long) it will quickly become expensive.
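
To make the billing point concrete, here is a rough sketch of how such a bill scales, assuming the cost is simply a per-request charge plus a per-GB transfer charge. The function name, its parameters, and the rates in the example call are placeholders I made up, not actual AWS prices; plug in the numbers from the current pricing page for your region.

```python
def estimated_monthly_cost(requests, gb_transferred, price_per_10k_requests, price_per_gb):
    """Estimate a monthly bill as request charges plus data-transfer charges.

    All rates are supplied by the caller; nothing here reflects real AWS pricing.
    """
    request_cost = requests / 10_000 * price_per_10k_requests
    transfer_cost = gb_transferred * price_per_gb
    return request_cost + transfer_cost


# Made-up placeholder rates, for illustration only.
normal = estimated_monthly_cost(1_000_000, 50, price_per_10k_requests=0.01, price_per_gb=0.10)
abused = estimated_monthly_cost(100_000_000, 5_000, price_per_10k_requests=0.01, price_per_gb=0.10)
print(f"normal month: {normal:.2f}, abused month: {abused:.2f}")
```

Both terms grow linearly, so someone who multiplies your traffic by 100 multiplies your bill by roughly the same factor.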

If you want files from your bucket to be publicly accessible, you should create a CloudFront distribution that points to the S3 bucket and is granted access to it.

Now you can use the CloudFront distribution's domain name to serve your files without granting any S3 access to the public.
In this configuration, you pay for CloudFront's data usage instead of S3's, and at higher volumes it's a lot cheaper.
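
As a sketch of the "granted access" part: one way to do this today is a bucket policy that only allows the CloudFront service principal to read objects, scoped to a single distribution. This assumes CloudFront's origin access control (OAC) mechanism, which is newer than this answer; the bucket name, account ID, distribution ID, `Sid`, and the helper function below are all hypothetical placeholders.

```python
import json


def cloudfront_only_policy(bucket, account_id, distribution_id):
    """Build a bucket policy that lets one CloudFront distribution read objects.

    Assumes the distribution uses origin access control (OAC). There is no
    public ("*") principal, so the bucket itself stays private.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",  # arbitrary label
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {
                        "AWS:SourceArn": (
                            f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
                        )
                    }
                },
            }
        ],
    }


# Hypothetical identifiers, for illustration only.
policy = cloudfront_only_policy("example-static-site", "111122223333", "EXAMPLEDISTID")
print(json.dumps(policy, indent=2))
```

With a policy like this in place, the in-console public access warning should no longer apply, since nothing is granted to anonymous users directly.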

Quentin Hayot
  • Cloudflare also works, and is likely to be cheaper still. – chrylis -cautiouslyoptimistic- Dec 18 '17 at 18:17
  • "you pay for Cloudfront's data usage instead of S3's. And at higher volumes it's a lot cheaper." -- HUGE caveat here is that the request pricing on CloudFront is considerably more expensive than S3's, so an attacker could just send a ton of requests (and even cancel them, so as not to actually receive the data or need a lot of bandwidth themselves). – Bruno Reis Jan 31 '21 at 09:34