6

Something happened recently with one of our S3 buckets:

[screenshot of the bucket's storage metrics]

I started looking for where all this extra stuff was coming from, but the metrics I gathered don't seem to match what is going on in CloudWatch (or our bill).

The bucket has a handful of different key prefixes ('folders'), so the first thing I did was to try and work out if any of them was contributing significantly to this number, like so:

aws s3 ls --summarize --human-readable --recursive s3://my-bucket/prefix
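
(i.e. repeating that for each top-level prefix, roughly like this quick sketch:)

# total up each top-level prefix ('folder') in turn
for prefix in $(aws s3 ls s3://my-bucket/ | awk '/PRE/ {print $2}'); do
    echo "== $prefix"
    aws s3 ls --summarize --human-readable --recursive "s3://my-bucket/$prefix"
done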

However none of the prefixes seemed to contain a huge amount of data, nothing more than a few GB.

I finally tried running

aws s3 ls --summarize --human-readable --recursive s3://my-bucket

...and I got a total size of ~25GB. Am I doing the wrong thing to try and find the 'size of a folder', or misunderstanding something? How can I find where all this extra storage is being used (and find out what process is running amok)?

user31415629
  • 301
  • 2
  • 12
  • Object versioning is likely turned on. In the UI you can have it show versions. You can create a lifecycle policy to have S3 delete / archive old versions of objects after x days. – Tim May 03 '19 at 09:20
  • Have you tried checking in [AWS Cloudtrail](https://aws.amazon.com/cloudtrail/) ? – scetoaux May 03 '19 at 09:21

3 Answers

5

It was aborted multipart uploads. S3 keeps every uploaded part of every failed multipart upload indefinitely by default! A process had been failing and retrying multipart uploads without explicitly cleaning up the failed transfers.
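
If you want to confirm this on your own bucket, you can list the outstanding (never-completed) multipart uploads directly. Something like this shows their keys and when they were started (bucket name is a placeholder):

aws s3api list-multipart-uploads --bucket my-bucket \
    --query 'Uploads[].[Key,UploadId,Initiated]' --output table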

We remedied this by temporarily enabling versioning, setting a lifecycle rule to remove aborted multipart upload parts after one day, waiting a day, and then disabling versioning again once the orphaned parts were cleared.
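
For reference, the multipart cleanup rule can also be set from the CLI. A rough sketch of an equivalent configuration (rule ID and filename are arbitrary), saved as lifecycle.json:

{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
    }
  ]
}

...and applied with:

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json

Note that this call replaces any existing lifecycle configuration on the bucket, so merge in any rules you already have.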

user31415629
  • 301
  • 2
  • 12
3

There is extensive support and documentation on how to fix this issue. The most common approach is a lifecycle rule that expires or transitions old object versions after a certain amount of time.
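
If you take the old-versions route, the rule amounts to a noncurrent version expiration. A minimal sketch of such a configuration (the 30 days is just an example, and it only matters if versioning is enabled on the bucket); apply it with aws s3api put-bucket-lifecycle-configuration or through the console:

{
  "Rules": [
    {
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}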

Another option is to remove incomplete multipart uploads, which AWS has supported since 2016 (source). You can find this under S3 lifecycle rules:

[screenshot: S3 lifecycle rule settings for cleaning up incomplete multipart uploads]

1

I suspect someone uploaded a lot of data into your S3 bucket and then deleted it. If you have S3 Versioning enabled you'll probably see a lot of deleted files still kept around as old versions.

Start with aws s3api list-object-versions and parse the output.
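
For example, to see which keys have noncurrent versions and how big they are (the JMESPath query is just one way to slice the output):

aws s3api list-object-versions --bucket my-bucket \
    --query 'Versions[?IsLatest==`false`].[Key, VersionId, Size]' --output table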

If you find the unneeded old versions you can delete them with aws s3api delete-object ... --version-id ... and get rid of them immediately.
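
For instance (key and version id are placeholders taken from the listing above):

aws s3api delete-object --bucket my-bucket --key some/old/object --version-id <version-id-from-the-listing>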

Alternatively you can create an S3 Lifecycle Policy that will permanently remove old versions automatically after a certain time.

Hope that helps :)

MLu
  • 23,798
  • 5
  • 54
  • 81
  • We didn't have versioning enabled, but we suspected it may have something to do with failed multipart uploads. Versioning has now been enabled and we have set a lifecycle rule to clear those up. When the lifecycle rule has taken effect for the first time, we'll know if that was the culprit... – user31415629 May 03 '19 at 10:23