
We are currently using EMR for easy job submission for our Spark jobs. Recently I came across the "FSx for Lustre + S3" solution, which is advertised as ideal for HPC workloads. However, EMRFS is also said to be optimized for this particular scenario, making S3 look like a local Hadoop filesystem.

So I am wondering: why would anyone choose one of these two over the other, in terms of cost and performance?

This question could be a follow-up to "AWS S3 costs for when AWS EMR uses it", but unfortunately I don't have the reputation to post a comment there.

Thanks in advance for the help.

dimisjim

1 Answer


As you are using EMR for your compute and S3 for your storage:

FSx for Lustre, when linked to S3, would give your jobs higher throughput because of its high IOPS. This would indeed help your execution times, but it also brings a higher cost.
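One way to frame that trade-off: FSx only pays off if the cluster time it saves is worth more than what the file system itself costs while it exists. The sketch below is a hypothetical back-of-envelope model, not an AWS pricing tool. Every rate and runtime is a placeholder input you would take from your own pricing and benchmarks, and S3 storage is left out because the data lives in S3 in both setups (S3 request charges are assumed to be similar and are also ignored).

```python
from dataclasses import dataclass


@dataclass
class JobCostInputs:
    # All values are hypothetical placeholders, not real AWS prices.
    cluster_rate_per_hour: float  # EMR + EC2 cost of the whole cluster per hour
    runtime_hours_emrfs: float    # job runtime reading directly from S3 via EMRFS
    runtime_hours_fsx: float      # job runtime reading from an S3-linked FSx for Lustre file system
    fsx_cost_per_hour: float      # cost of the FSx file system per hour it exists
    fsx_lifetime_hours: float     # how long the FSx file system is kept alive


def emrfs_cost(i: JobCostInputs) -> float:
    """Cluster cost when the job reads straight from S3."""
    return i.cluster_rate_per_hour * i.runtime_hours_emrfs


def fsx_cost(i: JobCostInputs) -> float:
    """Cluster cost for the (hopefully shorter) run plus the FSx file system cost."""
    return (i.cluster_rate_per_hour * i.runtime_hours_fsx
            + i.fsx_cost_per_hour * i.fsx_lifetime_hours)


def breakeven_fsx_rate(i: JobCostInputs) -> float:
    """Highest FSx hourly cost at which FSx is still no more expensive than EMRFS."""
    saved_cluster_cost = i.cluster_rate_per_hour * (i.runtime_hours_emrfs - i.runtime_hours_fsx)
    return saved_cluster_cost / i.fsx_lifetime_hours


if __name__ == "__main__":
    # Made-up example numbers, only to show how the model is used.
    inputs = JobCostInputs(
        cluster_rate_per_hour=10.0,
        runtime_hours_emrfs=2.0,
        runtime_hours_fsx=1.5,
        fsx_cost_per_hour=2.0,
        fsx_lifetime_hours=2.0,
    )
    print("EMRFS:", emrfs_cost(inputs))              # EMRFS: 20.0
    print("FSx:", fsx_cost(inputs))                  # FSx: 19.0
    print("Break-even FSx rate:", breakeven_fsx_rate(inputs))  # Break-even FSx rate: 2.5
```

The point of the break-even function is that the answer depends on how long the file system stays up relative to the runtime it saves: a file system created just for one job and deleted afterwards is much easier to justify than one kept running around the clock for occasional jobs.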

https://www.youtube.com/watch?v=ZADHiZa3Hjo&list=WL&index=21&t=2752s

The re:Invent talk linked above is one of the finest on this topic.