Questions tagged [amazon-emr]

8 questions
2
votes
1 answer

AWS S3 costs for when AWS EMR uses it

When I run an AWS EMR cluster and it reads from and writes to an AWS S3 bucket (or multiple buckets), what are the costs for that data transfer? Is that data transfer free, because it stays inside the AWS cloud? The normal S3 costs, with…
Marco
  • 123
  • 1
  • 6
1
vote
1 answer

Spark YARN capacity scheduler

I am trying to set up the capacity scheduler in Amazon EMR with 2 queues in addition to the default queue. I have successfully created the queues user1 and user2; however, when I use spark-submit to run a script on a new queue it will get stuck in…
sjensen85
  • 11
  • 1
1
vote
0 answers

Fastest way to import files in Spark?

I’m playing around with Spark 3.0.1 and I’m really impressed by the performance of Spark SQL on gigabytes of data. I’m trying to understand the best way to import multiple JSON files into a Spark DataFrame before running the analysis…
int 2Eh
  • 183
  • 1
  • 2
  • 6
0
votes
1 answer

AWS FSx for Lustre with S3 vs EMR (with EMRFS) for Spark jobs

We are currently using EMR for easy job submission for our Spark jobs. Recently I came across the "FSx Lustre + S3" solution that is being advertised as ideal for HPC situations. EMRFS, however, is also said to be optimized for this particular…
dimisjim
  • 215
  • 2
  • 10
0
votes
1 answer

psql installation on Linux requires systemd

I am installing psql on my AWS EMR cluster (EC2 instance), which runs Amazon Linux (not Amazon Linux 2). I am getting an error after running the command sudo yum install -y postgresql10: Error: Package: postgresql10-10.7-2PGDG.rhel7.x86_64 (pgdg10) …
0
votes
1 answer

My EMR cluster is terminated with an error after its status is set to Starting

When I create an EMR cluster, the status says it is being created, but after 58 minutes it throws an error saying Master - 1: Error provisioning instances. Error message (screenshot of error attached). I tried multiple times but all attempts were…
rgzv
  • 3
  • 1
0
votes
0 answers

AWS EC2 with and without EMR and Spark will not SSH to Port 22

I'm on a MacBook Pro. I tried starting an EC2 instance with EMR, RStudio, and Spark. I got a port 22 timeout. I asked AWS for help, took down my firewall, and restarted my modem. Still, nothing but a port 22 timeout. AWS managed to get one EC2…
Ryan
  • 101
  • 1
0
votes
1 answer

Installing packages in AWS EMR

I'm trying to install Google Tink in AWS EMR 5.28.0 without much luck. It looks like the AWS EMR image is rather strange in nature. Any ideas? sc.install_pypi_package("tink") error: Could not find bazel executable. Please install bazel to compile…
Koenig Lear
  • 101
  • 2