I have set up a Data Pipeline that imports files from an S3 bucket into a DynamoDB table, based on the predefined example. I want to truncate the table (or drop it and create a new one) every time the import job starts. Of course this is possible with the AWS SDK, but I would like to do it using only the Data Pipeline.
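
For reference, the SDK-based approach would be something along these lines: a minimal sketch, assuming the AWS SDK for Java v1, a placeholder table name, and a table without secondary indexes (not my actual code).

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.document.DynamoDB;
    import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
    import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
    import com.amazonaws.services.dynamodbv2.model.TableDescription;

    public class RecreateTable {
        public static void main(String[] args) throws InterruptedException {
            String tableName = "MyImportTable"; // placeholder table name
            AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
            DynamoDB docApi = new DynamoDB(client);

            // Remember the key schema and throughput so the table can be
            // recreated with the same definition (secondary indexes ignored).
            TableDescription desc = client.describeTable(tableName).getTable();

            // Drop the table and wait until it is gone.
            client.deleteTable(tableName);
            docApi.getTable(tableName).waitForDelete();

            // Recreate it with the saved definition and wait until it is usable.
            client.createTable(new CreateTableRequest()
                    .withTableName(tableName)
                    .withKeySchema(desc.getKeySchema())
                    .withAttributeDefinitions(desc.getAttributeDefinitions())
                    .withProvisionedThroughput(new ProvisionedThroughput(
                            desc.getProvisionedThroughput().getReadCapacityUnits(),
                            desc.getProvisionedThroughput().getWriteCapacityUnits())));
            docApi.getTable(tableName).waitForActive();
        }
    }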

Is it possible to do this using only the Data Pipeline?

Thanks for any help

1 Answer

I'm not sure whether you still need to perform this operation, since you asked many months ago, but because there is so little information on the internet about this subject, I've decided to write a tutorial and post it here to help other people who are facing the same situation.

This is what worked for me.

Basically you'll need the following:

  • An S3 bucket (where you'll upload a shell script to be executed)
  • An EC2 AMI (that will execute the script above)
  • A pipeline (that already imports DynamoDB data to an S3 bucket)

If you already have all of them, then we're good to go!

Follow these steps:

  1. Add an activity and name it 'CleanTableJob'.

[screenshot: the new CleanTableJob activity]

  2. On CleanTableJob, configure the settings as follows (under 'Runs On', select 'New Resource' and name it 'CleanDynamodbTableResource'):

[screenshot: CleanTableJob settings]

  3. On CleanDynamodbTableResource, configure the settings as follows:

[screenshot: CleanDynamodbTableResource settings]

  4. In your S3 bucket, provide whatever handles deleting the data in DynamoDB, for example (a sketch of what such a tool might contain follows after these steps):

    java -jar /home/ec2-user/downloads/dynamodb_truncate_table-1.0-SNAPSHOT.jar

  5. That's it:

[screenshot: the finished pipeline]
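
The answer does not include the source of the dynamodb_truncate_table jar, but based on its name it presumably scans the table and deletes every item through the AWS SDK. A minimal sketch of such a tool, assuming the AWS SDK for Java v1, the table name passed as the first argument, and one-by-one deletes without batching, might look like this:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
    import com.amazonaws.services.dynamodbv2.model.ScanRequest;
    import com.amazonaws.services.dynamodbv2.model.ScanResult;

    public class DynamodbTruncateTable {
        public static void main(String[] args) {
            String tableName = args[0]; // table to truncate, e.g. "MyImportTable"
            AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

            // Find out which attributes form the table's primary key.
            List<KeySchemaElement> keySchema =
                    client.describeTable(tableName).getTable().getKeySchema();

            Map<String, AttributeValue> lastKey = null;
            do {
                // Scan one page of items, then delete each item by its key.
                ScanResult page = client.scan(new ScanRequest()
                        .withTableName(tableName)
                        .withExclusiveStartKey(lastKey));
                for (Map<String, AttributeValue> item : page.getItems()) {
                    Map<String, AttributeValue> key = new HashMap<>();
                    for (KeySchemaElement k : keySchema) {
                        key.put(k.getAttributeName(), item.get(k.getAttributeName()));
                    }
                    client.deleteItem(tableName, key);
                }
                lastKey = page.getLastEvaluatedKey();
            } while (lastKey != null && !lastKey.isEmpty());
        }
    }

Note that for a large table this is slow and consumes write capacity for every delete, so dropping and recreating the table (as mentioned in the question) is usually cheaper.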

Hope this helps you out.