I have 200 JSONL (JSON Lines) files in an S3 bucket. Each file contains 100,000 JSON objects to be written into a DynamoDB table.
I want to use Lambda to download each file from S3 and batch-write its contents into the DynamoDB table (the records already match the table schema exactly).
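The per-file work itself is simple; here's a minimal sketch of the handler I have in mind (boto3, with a placeholder table name, assuming each invocation gets the bucket and key in its event):

```python
import json
from decimal import Decimal

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("my-table")  # placeholder table name

def handler(event, context):
    # One invocation processes one S3 object, named in the event payload.
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    with table.batch_writer() as batch:
        # batch_writer groups puts into 25-item BatchWriteItem calls
        # and retries any unprocessed items for you.
        for line in obj["Body"].iter_lines():
            # The DynamoDB resource API requires Decimal, not float, for numbers.
            batch.put_item(Item=json.loads(line, parse_float=Decimal))
```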
I have 200 files, but I can't run 200 Lambdas concurrently: the table is provisioned at 10,000 WCUs, so I can only write about 10,000 rows per second. That's 20 million rows in total, so the load takes at least ~2,000 seconds of write time no matter how I parallelize it. And a Lambda can only run for 300 seconds before it times out.
What's the best way to do this?
My current thinking was to asynchronously invoke 5 Lambdas at once, monitor the logs to see how many are done, and invoke the next one only after one completes.
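If I went this route, I'd probably skip the log-watching and instead hold 5 synchronous invocations open from a small driver script, so the worker pool itself enforces "start the next one only after one completes". A sketch, with placeholder function/bucket/key names:

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3
from botocore.config import Config

# Raise the read timeout above the 300 s Lambda limit, since
# RequestResponse invokes block until the function returns.
lam = boto3.client("lambda", config=Config(read_timeout=360))

def run_one(key):
    # Synchronous invoke: blocks until the Lambda finishes, so the
    # pool naturally keeps exactly 5 invocations in flight.
    return lam.invoke(
        FunctionName="ingest-file",  # placeholder function name
        InvocationType="RequestResponse",
        Payload=json.dumps({"bucket": "my-bucket", "key": key}),
    )

keys = [f"part-{i:03d}.jsonl" for i in range(200)]  # placeholder keys
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(run_one, keys))
```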
OR...
Can I set the reserved concurrency limit to 5 on the Lambda function, and then asynchronously invoke it 200 times (once per file)? Will AWS automatically start the next invocation when one completes?
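In other words, something like this (a sketch; `ingest-file` and `my-bucket` are placeholders):

```python
import json

import boto3

lam = boto3.client("lambda")
s3 = boto3.client("s3")

# Cap the function at 5 concurrent executions so the table's
# write capacity isn't overwhelmed.
lam.put_function_concurrency(
    FunctionName="ingest-file",  # placeholder function name
    ReservedConcurrentExecutions=5,
)

# Queue one async invocation per file in the bucket.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket"):
    for obj in page.get("Contents", []):
        lam.invoke(
            FunctionName="ingest-file",
            InvocationType="Event",  # async: returns immediately
            Payload=json.dumps({"bucket": "my-bucket", "key": obj["Key"]}),
        )
```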