Questions tagged [apache-spark]

29 questions
6 votes, 2 answers

How can I run Spark on a cluster using Slurm?

I have written a program example.jar which uses a SparkContext. How can I run this on a cluster which uses Slurm? This is related to https://stackoverflow.com/questions/29308202/running-spark-on-top-of-slurm but the answers are not very detailed…
mxmlnkn (395 reputation)
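The Slurm question can be made concrete with a minimal Python sketch of one common pattern. It assumes a Spark standalone cluster is started inside the Slurm allocation, with the first allocated host running the standalone master and every allocated host running a worker. The helper names `expand_nodelist` and `plan_standalone_cluster` are hypothetical. `scontrol show hostnames` is the Slurm command that expands a compressed nodelist, and 7077 is the standalone master's default port.

```python
import os
import subprocess
from typing import List, Tuple

STANDALONE_MASTER_PORT = 7077  # default port of the Spark standalone master


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand a compressed Slurm nodelist (e.g. "node[01-03]") via scontrol."""
    out = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def plan_standalone_cluster(hosts: List[str],
                            port: int = STANDALONE_MASTER_PORT) -> Tuple[str, List[str]]:
    """Hypothetical layout: the first host is the master, every host runs a worker."""
    if not hosts:
        raise ValueError("the Slurm allocation contains no hosts")
    master_url = f"spark://{hosts[0]}:{port}"
    return master_url, list(hosts)


if __name__ == "__main__" and "SLURM_JOB_NODELIST" in os.environ:
    # Only meaningful inside a Slurm job, where this variable is set.
    url, workers = plan_standalone_cluster(expand_nodelist(os.environ["SLURM_JOB_NODELIST"]))
    print(url, workers)
```

With that layout, the master would be started on the first host with `sbin/start-master.sh`, the workers on each host with the worker start script under `sbin/` pointed at the master URL, and the job submitted with `spark-submit --master spark://<first-host>:7077 example.jar`.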
5 votes, 0 answers

Spark Error: Failed to Send RPC to Datanode

We have quite a few issues with our Spark Thrift server. It is a new Ambari cluster and no Spark jobs are running now. From the log we can see an error message: Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149. Please advise why this…
shalom (451 reputation)
5 votes, 0 answers

How many disks for spark_local_dirs?

I'm looking for a way to improve my Spark cluster's performance. I have read in http://spark.apache.org/docs/latest/hardware-provisioning.html: "We recommend having 4-8 disks per node". I have tried both with one and two disks but I have…
LucaGuerra (151 reputation)
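For the disk question: the Spark configuration documentation describes `spark.local.dir` as a scratch directory that can also be a comma-separated list of directories on different disks, and notes that it is overridden by `SPARK_LOCAL_DIRS` (standalone, Mesos) or `LOCAL_DIRS` (YARN) set by the cluster manager. The sketch below only builds that comma-separated value, one scratch directory per disk; the mount points and the `spark-tmp` subdirectory name are hypothetical.

```python
from typing import Iterable


def spark_local_dirs(mount_points: Iterable[str], subdir: str = "spark-tmp") -> str:
    """Join one scratch directory per disk into the comma-separated form
    accepted by spark.local.dir / SPARK_LOCAL_DIRS."""
    dirs = []
    for mount in mount_points:
        if "," in mount:
            raise ValueError(f"comma not allowed in path: {mount!r}")
        path = mount.rstrip("/") + "/" + subdir
        if path not in dirs:  # ignore duplicate disks, keep the given order
            dirs.append(path)
    if not dirs:
        raise ValueError("at least one mount point is required")
    return ",".join(dirs)


if __name__ == "__main__":
    # Hypothetical mount points, one per physical disk; the output line is
    # what would go into conf/spark-env.sh on each worker.
    print("export SPARK_LOCAL_DIRS=" + spark_local_dirs(["/mnt/disk1", "/mnt/disk2"]))
```

Whether extra disks help at all depends on how much of the job's time goes to shuffle and spill I/O, which is what these directories hold.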
4 votes, 1 answer

How to configure Spark client running in a Docker container for two-way communication with a remote Spark cluster?

spark-submit seems to require two-way communication with a remote Spark cluster in order to run jobs. This is easy to configure between machines (10.x.x.x to 10.x.x.x and back) but becomes confusing when Docker adds an extra layer of networking…
Leo (973 reputation)
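For the Docker question: in client mode the driver runs inside the container and the executors connect back to it, so the driver has to advertise an address the cluster can reach while binding to an address that exists inside the container. Spark exposes this split as `spark.driver.host` (advertised) and `spark.driver.bindAddress` (local bind, Spark 2.1+), and `spark.driver.port` and `spark.blockManager.port` can be fixed instead of random so they can be published. The sketch below only assembles the arguments; the Docker-host IP `10.0.0.5` and ports 40000/40001 are hypothetical, and publishing each port 1:1 is an assumption that keeps the advertised port equal to the reachable one.

```python
from typing import Dict, List


def driver_conf(advertised_host: str, driver_port: int,
                block_manager_port: int) -> Dict[str, str]:
    """Spark properties for a client-mode driver running inside a container."""
    return {
        "spark.driver.host": advertised_host,   # address executors connect back to
        "spark.driver.bindAddress": "0.0.0.0",  # listen on all container interfaces
        "spark.driver.port": str(driver_port),  # fixed, so it can be published
        "spark.blockManager.port": str(block_manager_port),
    }


def spark_submit_conf_args(conf: Dict[str, str]) -> List[str]:
    """Turn a property dict into repeated --conf key=value arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def docker_publish_args(ports: List[int]) -> List[str]:
    """Publish each port 1:1 so the advertised port is reachable on the host."""
    args: List[str] = []
    for port in ports:
        args += ["-p", f"{port}:{port}"]
    return args


if __name__ == "__main__":
    conf = driver_conf("10.0.0.5", 40000, 40001)  # hypothetical host IP and ports
    print("docker run", " ".join(docker_publish_args([40000, 40001])), "...")
    print("spark-submit", " ".join(spark_submit_conf_args(conf)), "...")
```

Running the container with host networking avoids the port mapping entirely, at the cost of less isolation.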
3 votes, 0 answers

Spark: Pi sample throws SocketTimeoutException in cluster k8s mode

I set up a Spark 2.3.1 cluster on Kubernetes; however, I have trouble sending a sample SparkPi job to it. The submit script I'm using: bin/spark-submit \ --master k8s://https://10.0.15.7:7077 \ --deploy-mode cluster \ --name spark-pi \ …
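In Kubernetes mode, the Spark documentation describes the master URL as `k8s://` followed by the Kubernetes API server address, with the port always given explicitly and `https` assumed when no scheme is present. The script above points it at port 7077, which is the default port of a Spark standalone master; unless the API server was deliberately exposed on that port, that is worth checking against the address printed by `kubectl cluster-info`. A minimal sketch of such a check, with the function name `check_k8s_master` being hypothetical:

```python
from typing import List
from urllib.parse import urlparse

STANDALONE_MASTER_PORT = 7077  # default port of a Spark *standalone* master


def check_k8s_master(master: str) -> List[str]:
    """Return warnings about a --master value intended for Kubernetes mode."""
    prefix = "k8s://"
    if not master.startswith(prefix):
        return ["Kubernetes mode expects a master URL starting with k8s://"]
    api = master[len(prefix):]
    if "://" not in api:
        api = "https://" + api  # Spark assumes https when no scheme is given
    port = urlparse(api).port
    warnings: List[str] = []
    if port is None:
        warnings.append("give the API server port explicitly, even if it is 443")
    elif port == STANDALONE_MASTER_PORT:
        warnings.append("7077 is the standalone master's default port; "
                        "k8s:// should point at the Kubernetes API server")
    return warnings


if __name__ == "__main__":
    print(check_k8s_master("k8s://https://10.0.15.7:7077"))
```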
3 votes, 1 answer

S3 cross-account bucket permissions

Similar to what is described in this article[0], the company I work for uses a bastion AWS account to store IAM users and other AWS accounts to separate different running environments (prod, dev, etc.). The reason this is important is that we have…
c4urself (5,270 reputation)
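For cross-account S3 access, one common building block is a bucket policy in the bucket-owning account that names the other account as the principal. The sketch below only generates such a policy document; the account ID `111122223333`, the bucket name and the chosen actions are hypothetical. The IAM users or roles in the other account still need their own identity policy allowing the same actions, because cross-account access has to be granted on both sides.

```python
import json
from typing import Dict


def cross_account_bucket_policy(bucket: str, trusted_account_id: str) -> Dict:
    """Bucket policy letting principals of another AWS account list the
    bucket and read/write its objects."""
    principal = {"AWS": f"arn:aws:iam::{trusted_account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # listing is a bucket-level action, so it targets the bucket ARN
                "Sid": "AllowListFromTrustedAccount",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {   # object actions target the objects inside the bucket
                "Sid": "AllowObjectAccessFromTrustedAccount",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(cross_account_bucket_policy("example-data-bucket", "111122223333"), indent=2))
```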
3 votes, 0 answers

Memory problems with Spark application on AWS EMR

I've been trying to get to the bottom of a memory issue for some time now and I simply can't fathom what the problem is. Any help is greatly appreciated. The error is: OpenJDK 64-Bit Server VM warning: INFO:…
null (139 reputation)
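One quantity worth checking when Spark on YARN (as on EMR) runs into memory errors is that each executor's YARN container is larger than `spark.executor.memory`: Spark adds an off-heap overhead which, in Spark 2.x, defaults to 10% of the executor memory with a 384 MB minimum (`spark.yarn.executor.memoryOverhead`, renamed `spark.executor.memoryOverhead` in 2.3). The arithmetic below is a sketch under those defaults; rounding up to a multiple of `yarn.scheduler.minimum-allocation-mb` is an assumption about how the scheduler normalizes requests, so it is optional.

```python
import math
from typing import Optional

OVERHEAD_FACTOR = 0.10  # Spark 2.x default fraction for executor memory overhead
OVERHEAD_MIN_MB = 384   # minimum overhead Spark adds


def yarn_container_mb(executor_memory_mb: int,
                      min_allocation_mb: Optional[int] = None) -> int:
    """Approximate YARN container size requested for one Spark executor."""
    overhead_mb = max(int(OVERHEAD_FACTOR * executor_memory_mb), OVERHEAD_MIN_MB)
    total_mb = executor_memory_mb + overhead_mb
    if min_allocation_mb:
        # Assumed normalization: round up to a multiple of the minimum allocation.
        total_mb = math.ceil(total_mb / min_allocation_mb) * min_allocation_mb
    return total_mb


if __name__ == "__main__":
    for mem in (2048, 4096, 8192):
        print(mem, "MB executor ->", yarn_container_mb(mem), "MB container")
```

For a 4096 MB executor this gives 4096 + 409 = 4505 MB, so the node's YARN memory must cover that figure, not 4096 MB, for every executor placed on it.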
2 votes, 0 answers

Can Spark be configured to listen on multiple network interfaces/ip addresses?

The nodes in my Spark cluster have two network interfaces each, one public and one private. Using the SPARK_MASTER_IP environment variable, I can configure Spark to listen on port 7077 on one or the other IP address. For example: netstat…
Leo (973 reputation)
1 vote, 1 answer

Spark YARN capacity scheduler

I am trying to set up the capacity scheduler in Amazon EMR with 2 queues in addition to the default queue. I have successfully created the queues user1 and user2; however, when I use spark-submit to run a script on a new queue, it gets stuck in…
sjensen85 (11 reputation)
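For the capacity scheduler question, the queues are declared with `yarn.scheduler.capacity.root.queues`, and each child queue gets `yarn.scheduler.capacity.root.<queue>.capacity` as a percentage; sibling capacities must add up to 100. On EMR these properties go into the `capacity-scheduler` configuration classification, and a job is sent to a queue with `spark-submit --queue user1`. The sketch below builds that classification; the 50/25/25 split is hypothetical. If applications on a new queue wait indefinitely, the queue's capacity and `yarn.scheduler.capacity.maximum-am-resource-percent` (the share of resources ApplicationMasters may use) are worth checking.

```python
from typing import Dict, List


def capacity_scheduler_config(queues: Dict[str, int]) -> List[Dict]:
    """EMR configuration for the capacity-scheduler classification, with one
    child queue under root per entry; capacities are percentages."""
    if sum(queues.values()) != 100:
        raise ValueError("capacities of sibling queues must add up to 100")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, capacity in queues.items():
        props[f"yarn.scheduler.capacity.root.{name}.capacity"] = str(capacity)
    return [{"Classification": "capacity-scheduler", "Properties": props}]


if __name__ == "__main__":
    import json
    # Hypothetical split between the default queue and the two new queues.
    print(json.dumps(capacity_scheduler_config({"default": 50, "user1": 25, "user2": 25}), indent=2))
```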
1 vote, 1 answer

Open a randomized port on a cluster of machines

I'm using Apache Spark, a Java application, to create a cluster of machines. The processes that are launched try to communicate with each other across randomized ports. Is there a way to script the opening of a random port in the cluster? This is a…
activedecay (205 reputation)
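Spark's "randomized" ports can usually be pinned instead of opened blindly: properties such as `spark.driver.port` and `spark.blockManager.port` pick a random port when left at their default, but accept fixed values. When a fixed port is taken, Spark retries successive ports, up to `spark.port.maxRetries` attempts (default 16), so a firewall only needs the range from each base port to base + maxRetries. The sketch computes those ranges; the base ports 40000 and 40100 are hypothetical.

```python
from typing import Dict, Tuple

DEFAULT_MAX_RETRIES = 16  # Spark's default for spark.port.maxRetries


def port_ranges(base_ports: Dict[str, int],
                max_retries: int = DEFAULT_MAX_RETRIES) -> Dict[str, Tuple[int, int]]:
    """Inclusive port range Spark may try for each fixed base port."""
    ranges = {}
    for prop, base in base_ports.items():
        if base <= 0:
            # 0 means "pick a random port", which is what we want to avoid here.
            raise ValueError(f"{prop}: a fixed, non-zero port is required")
        ranges[prop] = (base, base + max_retries)
    return ranges


if __name__ == "__main__":
    # Hypothetical base ports; each range is what the firewall would open.
    for prop, (lo, hi) in port_ranges({"spark.driver.port": 40000,
                                       "spark.blockManager.port": 40100}).items():
        print(f"{prop}: {lo}-{hi}")
```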
1 vote, 0 answers

How to use Cassandra with Spark in a Docker image?

(I hope this question is a fit for Server Fault; if not, comment and I'll delete it.) I'm trying to create a Docker image where Cassandra and Spark would be installed and configured to work together. I have never used Spark (and have never created a Dockerfile),…
HypeWolf (113 reputation)
1 vote, 0 answers

Spark: Pi sample throws NoSuchFileException in cluster mode

I set up a Spark 2.3.1 cluster; however, I have trouble sending a sample SparkPi job to it: Running Spark using the REST application submission protocol. 2018-09-06 13:45:53 INFO RestSubmissionClient:54 - Submitting a request to launch an…
1 vote, 2 answers

Equivalent to 'top' command on an EMR cluster?

I have a 3-instance EMR cluster running on AWS, and it's responding very slowly at the moment. When checking the Hadoop dashboard on port 8088 with my browser, I see "Memory used: 203.5GB", and "Memory available: 214GB". I assume the problem is…
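The figures quoted on port 8088 come from the YARN ResourceManager, which exposes the same numbers through its REST API at `/ws/v1/cluster/metrics`; the `clusterMetrics` object includes `allocatedMB`, `totalMB` and `appsRunning`. These count memory *allocated* to containers rather than memory the processes actually touch, so a nearly full figure means YARN has handed out almost all of its capacity; per-process usage still needs `top` or similar on each node. A minimal sketch, where the ResourceManager address passed to `fetch_cluster_metrics` is an assumption:

```python
import json
import urllib.request
from typing import Dict


def fetch_cluster_metrics(rm_address: str) -> Dict:
    """Read /ws/v1/cluster/metrics from the YARN ResourceManager REST API."""
    url = f"http://{rm_address}/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def summarize(metrics: Dict) -> Dict[str, float]:
    """Allocated vs. total container memory, in GB and as a fraction."""
    allocated_gb = metrics["allocatedMB"] / 1024
    total_gb = metrics["totalMB"] / 1024
    return {
        "allocated_gb": allocated_gb,
        "total_gb": total_gb,
        "utilization": allocated_gb / total_gb if total_gb else 0.0,
        "apps_running": metrics.get("appsRunning", 0),
    }


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:  # e.g. <master-dns>:8088 (assumed address)
        print(summarize(fetch_cluster_metrics(sys.argv[1])))
```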
1 vote, 0 answers

Solr Spark indexing failure due to an error accessing the collection

I am using Solr with Spark in Java to index documents (Ubuntu 16.04). I have ZooKeeper running on port 2181, and my collection test has two shards. When I launch my code I get a java.lang.NullPointerException. Here is my code (the class for…
Dilak (11 reputation)
1 vote, 0 answers

spark.dynamicAllocation + setting the spark parameters according to ambari cluster

We want to find values for the following Spark parameters based on inputs such as the memory on each DataNode machine, the CPU cores on each DataNode machine, the number of DataNode machines, etc.: spark.dynamicAllocation.initialExecutors =…
shalom (451 reputation)
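The inputs named in the question (memory and cores per DataNode, number of DataNodes) can be turned into executor settings with a widely used rule of thumb, which is not a formula from Spark itself: reserve 1 core and 1 GB per node for the OS and Hadoop daemons, use about 5 cores per executor, take roughly 10% off for the memory overhead YARN adds, and leave one executor's worth of room for the YARN ApplicationMaster. All of these numbers are heuristic assumptions. When `spark.dynamicAllocation.initialExecutors` is not set, it defaults to `spark.dynamicAllocation.minExecutors`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExecutorPlan:
    executor_cores: int
    executor_memory_gb: int
    max_executors: int


def plan_executors(nodes: int, cores_per_node: int, mem_gb_per_node: int,
                   cores_per_executor: int = 5, reserved_cores: int = 1,
                   reserved_mem_gb: int = 1, overhead_fraction: float = 0.10) -> ExecutorPlan:
    """Heuristic executor sizing from per-node hardware (rule of thumb only)."""
    executors_per_node = (cores_per_node - reserved_cores) // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    mem_per_executor = (mem_gb_per_node - reserved_mem_gb) / executors_per_node
    # Leave room for the off-heap overhead YARN adds on top of spark.executor.memory.
    heap_gb = int(mem_per_executor / (1 + overhead_fraction))
    max_executors = nodes * executors_per_node - 1  # one slot for the ApplicationMaster
    return ExecutorPlan(cores_per_executor, heap_gb, max_executors)


def to_spark_conf(plan: ExecutorPlan) -> Dict[str, str]:
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",  # external shuffle service for YARN
        "spark.executor.cores": str(plan.executor_cores),
        "spark.executor.memory": f"{plan.executor_memory_gb}g",
        "spark.dynamicAllocation.maxExecutors": str(plan.max_executors),
    }


if __name__ == "__main__":
    # Hypothetical cluster: 10 DataNodes with 16 cores and 64 GB each.
    print(to_spark_conf(plan_executors(10, 16, 64)))
```

For that hypothetical cluster the heuristic gives 3 executors per node with 5 cores and 19 GB each, and at most 29 executors; `minExecutors` and `initialExecutors` remain a choice about how much of the cluster a job should hold while idle.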