
We have an S3 bucket with more than 500,000 objects in it.

I've been assigned a job where I have to delete files that have a specific prefix. There are around 300,000 files with the given prefix in the bucket.

For example, if there are 3 files:

abc_1file.txt
abc_2file.txt
abc_1newfile.txt

I have to delete only the files with the abc_1 prefix. I didn't find much in the AWS documentation related to this.

Any suggestions on how I can automate this?

Axel

2 Answers


You can use the aws s3 rm command with the --include and --exclude parameters to specify a pattern for the files you'd like to delete.

So in your case, the command would be:

aws s3 rm s3://bucket/ --recursive --exclude "*" --include "abc_1*"

which will delete all files that match the "abc_1*" pattern in the bucket. Note that the filters are applied in the order given: --exclude "*" first excludes everything, and --include "abc_1*" then re-includes only the keys matching that pattern.

The behavior of these parameters is documented here.

These instructions assume you have downloaded, installed, and configured the AWS CLI tools.
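
If you'd rather automate this with a script instead of the CLI, below is a minimal boto3 sketch of the same operation (the bucket name mybucket is a placeholder; adjust it and the prefix to your case). Since abc_1 is a plain key prefix, you can pass it directly as Prefix rather than going through include/exclude filters:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# list_objects_v2 returns at most 1,000 keys per page and delete_objects
# accepts at most 1,000 keys per request, so deleting one page at a time
# stays within both limits.
for page in paginator.paginate(Bucket="mybucket", Prefix="abc_1"):
    to_delete = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if to_delete:
        s3.delete_objects(Bucket="mybucket", Delete={"Objects": to_delete})

Batching up to 1,000 deletes per request is considerably faster than issuing one delete per object when you have ~300,000 files.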

sippybear

As a complement to @sippybear's excellent answer, I would recommend the following if somebody has a bucket with a trillion objects and the pattern of the files they want to delete includes "parent directories", e.g. 'my/path/to/topdir/abc_1*':

aws s3 rm --dryrun --recursive --exclude '*' --include 'abc_1*' s3://mybucket/my/path/to/topdir/

Why?

  1. this will restrict the search of objects to delete to the parent directory, thus considerably speeding up the operation;
  2. really, do yourself a favor and start with --dryrun, even if you promptly interrupt it (ctrl-C); typos and other accidents happen, and errors when deleting a large number of files in a bucket can be very regrettable (even if you have proper backups)...

Once you're happy with what you see is about to be deleted, then remove the --dryrun.
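
If you script the deletion instead (as in the boto3 sketch in the previous answer; the bucket name and path here are placeholders), both points carry over: including the parent directory in Prefix scopes the listing, and printing keys instead of deleting them gives you the equivalent of --dryrun:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Including the parent directory in Prefix restricts the listing, just as
# pointing the CLI at s3://mybucket/my/path/to/topdir/ does above.
for page in paginator.paginate(Bucket="mybucket",
                               Prefix="my/path/to/topdir/abc_1"):
    for obj in page.get("Contents", []):
        # "Dry run": print what would be deleted instead of deleting it.
        print("(dryrun) delete:", obj["Key"])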

Pierre D