AWS FSx for Lustre with S3 vs EMR (with EMRFS) for Spark jobs

Question

We are currently using EMR for easy submission of our Spark jobs. Recently I came across the "FSx for Lustre + S3" solution, which is advertised as ideal for HPC workloads. However, EMRFS is also said to be optimized for this scenario, since it makes S3 look like a local Hadoop filesystem.

So I am wondering: in terms of cost and performance, why would someone choose one of these two over the other?

This question could be a follow-up to "AWS S3 costs for when AWS EMR uses it", but unfortunately I don't have the reputation to post a comment there.

Thanks in advance for the help.

Sameer Rao · Answer 1 · 2020-08-15T02:41:30.377

0

You are already using EMR for compute and S3 for storage.

FSx for Lustre, when linked to S3, would give your jobs higher throughput because of its high IOPS. That should shorten your execution times, but it also comes at a higher cost.

https://www.youtube.com/watch?v=ZADHiZa3Hjo&list=WL&index=21&t=2752s

The link above is one of the best re:Invent talks on this topic.
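To make the trade-off concrete (shorter run time versus an extra storage bill), here is a rough break-even sketch. It is only arithmetic under simplifying assumptions: the job cost is treated as runtime multiplied by the cluster's hourly rate, and FSx is treated as a flat extra charge per run. The function names and all numbers are hypothetical placeholders, not AWS prices.

```python
def job_cost(runtime_hours, cluster_rate_per_hour, storage_cost=0.0):
    """Cost of one job run: compute time plus any extra storage charge."""
    return runtime_hours * cluster_rate_per_hour + storage_cost


def break_even_speedup(baseline_hours, cluster_rate_per_hour, fsx_cost):
    """Fraction of runtime FSx must save for both options to cost the same.

    Solves (1 - s) * hours * rate + fsx_cost == hours * rate for s.
    A result above 1.0 means no speedup can pay for the file system.
    """
    return fsx_cost / job_cost(baseline_hours, cluster_rate_per_hour)


# Hypothetical placeholder inputs, NOT real AWS prices.
baseline_hours = 4.0   # runtime reading through EMRFS
cluster_rate = 10.0    # cluster cost per hour
fsx_cost = 8.0         # extra FSx charge attributed to this run

needed = break_even_speedup(baseline_hours, cluster_rate, fsx_cost)
print(f"FSx must cut runtime by at least {needed:.0%} to break even")
```

Plug in your own measured runtime and your actual bill; if a test run with FSx does not save at least that fraction of the runtime, staying with EMRFS is the cheaper choice under these assumptions.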

edited Aug 15 '20 at 02:41

answered Aug 15 '20 at 02:26

Sameer Rao
