Questions tagged [apache-spark]
29 questions
6
votes
2 answers
How can I run Spark on a cluster using Slurm?
I have written a program example.jar which uses a spark context. How can I run this on a cluster which uses Slurm? This is related to https://stackoverflow.com/questions/29308202/running-spark-on-top-of-slurm but the answers are not very detailed…
mxmlnkn
- 395
- 3
- 11
5
votes
0 answers
Spark Error: Failed to Send RPC to Datanode
We have quite a few issues with our Spark Thrift server. It is a new Ambari cluster and no Spark jobs are running now.
From the log we can see an error message:
Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149
Please advise why this…
shalom
- 451
- 12
- 26
5
votes
0 answers
How many disks for spark_local_dirs?
I'm looking for a way to improve my Spark cluster's performance. I have read at http://spark.apache.org/docs/latest/hardware-provisioning.html:
We recommend having 4-8 disks per node
I have tried with both one and two disks, but I have…
LucaGuerra
- 151
- 1
4
votes
1 answer
How to configure Spark client running in a Docker container for two-way communication with a remote Spark cluster?
spark-submit seems to require two-way communication with a remote Spark cluster in order to run jobs.
This is easy to configure between machines (10.x.x.x to 10.x.x.x and back) but becomes confusing when Docker adds an extra layer of networking…
Leo
- 973
- 6
- 21
- 38
3
votes
0 answers
Spark: Pi sample throws SocketTimeoutException in cluster k8s mode
I set up a Spark 2.3.1 cluster on Kubernetes; however, I have trouble sending a sample SparkPi job to it:
The submit script I'm using:
bin/spark-submit \
--master k8s://https://10.0.15.7:7077 \
--deploy-mode cluster \
--name spark-pi \
…
Dzmitry Lazerka
- 151
- 9
3
votes
1 answer
S3 cross-account bucket permissions
Similar to what is described in this article[0], the company I work for uses a bastion AWS account to store IAM users and other AWS accounts to separate different running environments (prod, dev, etc.). The reason this is important is that we have…
c4urself
- 5,270
- 3
- 25
- 39
3
votes
0 answers
Memory problems with Spark application on AWS EMR
I've been trying to get to the bottom of a memory issue for some time now and I simply can't work out what the problem is. Any help is greatly appreciated.
The error is:
OpenJDK 64-Bit Server VM warning: INFO:…
null
- 139
- 2
- 10
2
votes
0 answers
Can Spark be configured to listen on multiple network interfaces/IP addresses?
The nodes in my Spark cluster have two network interfaces each, one public and one private. Using the SPARK_MASTER_IP environment variable, I can configure Spark to listen on port 7077 on one or the other IP address.
For example:
netstat…
Leo
- 973
- 6
- 21
- 38
1
vote
1 answer
Spark YARN capacity scheduler
I am trying to set up the capacity scheduler in Amazon EMR with 2 queues in addition to the default queue. I have successfully created the queues user1 and user2; however, when I use spark-submit to run a script on a new queue it will get stuck in…
sjensen85
- 11
- 1
1
vote
1 answer
Open a randomized port on a cluster of machines
I'm using Apache Spark, a Java application, to create a cluster of machines. The processes that are launched try to communicate with each other across randomized ports. Is there a way to script the opening of a random port in the cluster?
This is a…
activedecay
- 205
- 1
- 2
- 6
1
vote
0 answers
How to use Cassandra with Spark in a Docker image?
(I hope this question is a fit for Server Fault; if not, comment and I'll delete it.)
I'm trying to create a Docker image in which Cassandra and Spark are installed and configured to work together.
I have never used Spark (and have never created a Dockerfile),…
HypeWolf
- 113
- 5
1
vote
0 answers
Spark: Pi sample throws NoSuchFileException in cluster mode
I set up a Spark 2.3.1 cluster; however, I have trouble sending a sample SparkPi job to it:
Running Spark using the REST application submission protocol.
2018-09-06 13:45:53 INFO RestSubmissionClient:54 - Submitting a request to launch an…
Dzmitry Lazerka
- 151
- 9
1
vote
2 answers
Equivalent to 'top' command on an EMR cluster?
I have a 3-instance EMR cluster running on AWS, and it's responding very slowly at the moment.
When checking the Hadoop dashboard on port 8088 with my browser, I see "Memory used: 203.5GB", and "Memory available: 214GB". I assume the problem is…
Alexander Engelhardt
- 113
- 4
1
vote
0 answers
Solr Spark indexing failure due to an error accessing the collection
I am using Solr with Spark in Java to index documents. (Ubuntu 16.0.4)
I have ZooKeeper running on port 2181, and my collection test has two shards.
When I launch my code, I get a java.lang.NullPointerException.
Here is my code (the class for…
Dilak
- 11
- 2
1
vote
0 answers
spark.dynamicAllocation + setting the Spark parameters according to the Ambari cluster
We want to determine values for the following Spark parameters based on inputs such as the memory on each datanode machine, the CPU cores on each datanode machine, the number of datanode machines, etc.
spark.dynamicAllocation.initialExecutors =…
shalom
- 451
- 12
- 26