
We have an S3 bucket with more than 500,000 objects in it.

I've been assigned a job where I have to delete files that have a specific prefix. There are around 300,000 files with the given prefix in the bucket.

For example, if there are 3 files:

abc_1file.txt
abc_2file.txt
abc_1newfile.txt

I have to delete only the files with the abc_1 prefix. I didn't find much in the AWS documentation related to this.

Any suggestions on how I can automate this?

Axel

2 Answers


You can use the aws s3 rm command with the --include and --exclude parameters to specify a pattern for the files you'd like to delete.

So in your case, the command would be:

aws s3 rm s3://bucket/ --recursive --exclude "*" --include "abc_1*"

which will delete all files that match the "abc_1*" pattern in the bucket. Note that the filters are applied in the order given: --exclude "*" first excludes everything, and --include "abc_1*" then re-includes only the keys matching that pattern.

The behavior of these parameters is documented here.

These instructions assume you have downloaded, installed, and configured the AWS CLI tools.
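
If you'd rather automate this with a script instead of the CLI, below is a minimal boto3 sketch of the same operation (the bucket name mybucket is a placeholder; adjust it and the prefix to your case). Since abc_1 is a plain key prefix, you can pass it directly as Prefix rather than going through include/exclude filters:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# list_objects_v2 returns at most 1,000 keys per page and delete_objects
# accepts at most 1,000 keys per request, so deleting one page at a time
# stays within both limits.
for page in paginator.paginate(Bucket="mybucket", Prefix="abc_1"):
    to_delete = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if to_delete:
        s3.delete_objects(Bucket="mybucket", Delete={"Objects": to_delete})

Batching up to 1,000 deletes per request is considerably faster than issuing one delete per object when you have ~300,000 files.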

sippybear

As a complement to @sippybear's excellent answer, I would recommend the following if somebody has a bucket with a trillion objects and the pattern of the files they want to delete includes "parent directories", e.g. 'my/path/to/topdir/abc_1*':

aws s3 rm --dryrun --recursive --exclude '*' --include 'abc_1*' s3://mybucket/my/path/to/topdir/

Why?

  1. this will restrict the search of objects to delete to the parent directory, thus considerably speeding up the operation;
  2. really, do yourself a favor and start with --dryrun, even if you promptly interrupt it (ctrl-C); typos and other accidents happen, and errors when deleting a large number of files in a bucket can be very regrettable (even if you have proper backups)...

Once you're happy with what you see is about to be deleted, then remove the --dryrun.
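
If you script the deletion instead (as in the boto3 sketch in the previous answer; the bucket name and path here are placeholders), both points carry over: including the parent directory in Prefix scopes the listing, and printing keys instead of deleting them gives you the equivalent of --dryrun:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Including the parent directory in Prefix restricts the listing, just as
# pointing the CLI at s3://mybucket/my/path/to/topdir/ does above.
for page in paginator.paginate(Bucket="mybucket",
                               Prefix="my/path/to/topdir/abc_1"):
    for obj in page.get("Contents", []):
        # "Dry run": print what would be deleted instead of deleting it.
        print("(dryrun) delete:", obj["Key"])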

Pierre D