Something happened recently with one of our S3 buckets: the storage usage reported by CloudWatch (and on our bill) has grown well beyond anything I expected. I started looking for where all this extra data is coming from, but the numbers I can gather myself don't match what CloudWatch (or our bill) is showing.
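For reference, the number I'm comparing against is the S3 BucketSizeBytes metric in CloudWatch, which can be pulled with something like this (the bucket name, storage type, and dates here are placeholders for our real values):

aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=StandardStorage \
    --start-time 2024-01-01T00:00:00Z --end-time 2024-01-02T00:00:00Z \
    --period 86400 --statistics Average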
The bucket has a handful of different key prefixes ('folders'), so the first thing I did was try to work out whether any of them was contributing significantly to the total, like so:
aws s3 ls --summarize --human-readable --recursive s3://my-bucket/prefix
However, none of the prefixes seemed to contain a huge amount of data; no single prefix held more than a few GB.
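For completeness, checking every prefix this way can be scripted with something like the sketch below (the bucket name is a placeholder, and it assumes all the data sits under top-level prefixes):

for prefix in $(aws s3 ls s3://my-bucket/ | awk '/PRE/ {print $2}'); do
    echo "== ${prefix}"
    # keep just the "Total Objects" / "Total Size" summary lines
    aws s3 ls --summarize --human-readable --recursive "s3://my-bucket/${prefix}" | tail -n 2
done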
Finally, I tried running the summary over the whole bucket:
aws s3 ls --summarize --human-readable --recursive s3://my-bucket
...and I got a total size of ~25 GB. Am I going about finding the 'size of a folder' the wrong way, or misunderstanding something? How can I find out where all this extra storage is being used (and which process is running amok)?
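In case the high-level aws s3 ls --summarize output is misleading me, is there a better way to cross-check the total, for example by summing object sizes directly through the API? Something like this sketch is what I had in mind (bucket name is a placeholder):

# prints object count and total size in bytes for the current object listing
aws s3api list-objects-v2 --bucket my-bucket \
    --query "[length(Contents[]), sum(Contents[].Size)]" --output text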