Questions tagged [apache-spark]

29 questions
1 vote, 0 answers

Ambari cluster: when do we need to set block replication to 1?

We get the following in Spark logs: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage\ The current…
shalom • 451 • 12 • 26
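
For the question above, a minimal Scala sketch of one way to lower HDFS block replication for a single Spark job's output, assuming it is acceptable to write those files with replication 1 (the path and data below are placeholders):

    import org.apache.spark.sql.SparkSession

    object ReplicationOneSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("replication-one-sketch").getOrCreate()

        // dfs.replication is a standard HDFS client property; setting it on the job's
        // Hadoop configuration affects only files this job writes, not the cluster default.
        spark.sparkContext.hadoopConfiguration.set("dfs.replication", "1")

        val df = spark.range(1000).toDF("id")                                   // placeholder data
        df.write.mode("overwrite").parquet("hdfs:///tmp/replication_one_demo")  // placeholder path

        spark.stop()
      }
    }
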
1 vote, 1 answer

How is the number of RDD partitions decided in Apache Spark?

Question How is the number of partitions decided by Spark? Do I need to specify the number of available CPU cores somewhere explicitly so that the number of partitions will be the same (such as numPartition arg of parallelize method, but then need…
mon • 225 • 3 • 9
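
A short Scala sketch illustrating the defaults the question above asks about: sc.defaultParallelism, the optional numSlices argument of parallelize, and getNumPartitions for inspecting the result. Not an authoritative answer, just a way to observe the behaviour on a given cluster:

    import org.apache.spark.sql.SparkSession

    object PartitionCountSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("partition-count-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Without an explicit argument, parallelize() uses spark.default.parallelism,
        // which on most cluster managers defaults to the total number of executor cores.
        println(s"defaultParallelism = ${sc.defaultParallelism}")

        val implicitRdd = sc.parallelize(1 to 100)
        println(s"implicit partitions = ${implicitRdd.getNumPartitions}")

        // The second argument (numSlices) overrides the default explicitly.
        val explicitRdd = sc.parallelize(1 to 100, 8)
        println(s"explicit partitions = ${explicitRdd.getNumPartitions}")

        spark.stop()
      }
    }
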
1 vote, 0 answers

Throttle Spark Cassandra Connector Reads on a Production Cluster

We're currently running a 24-node Cassandra cluster in production that holds 30 TB of data and handles an average live load of 100k requests per minute, 24/7. We support multiple partners. One of our partners is leaving our org, so we have to filter…
Mano • 11 • 1
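
A hedged Scala sketch of read throttling with the spark-cassandra-connector for the question above. The property names (spark.cassandra.input.readsPerSec, spark.cassandra.input.fetch.sizeInRows) are taken from recent connector releases and should be verified against the reference.conf of the version actually deployed; host, keyspace, table, and limits are placeholders:

    import org.apache.spark.sql.SparkSession

    object ThrottledCassandraRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("throttled-cassandra-read")
          .config("spark.cassandra.connection.host", "cassandra-host")  // placeholder host
          .config("spark.cassandra.input.readsPerSec", "500")           // cap on read requests per core per second
          .config("spark.cassandra.input.fetch.sizeInRows", "500")      // smaller pages per request
          .getOrCreate()

        val df = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))  // placeholder names
          .load()

        println(df.count())
        spark.stop()
      }
    }
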
1 vote, 0 answers

Fastest way to import files in Spark?

I’m playing around with Spark 3.0.1 and I’m really impressed by the performance of Spark SQL on GBs of data. I’m trying to understand what’s the best way to import multiple JSON files into a Spark dataframe before running the analysis…
int 2Eh • 183 • 1 • 2 • 6
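
A minimal Scala sketch of one common approach to the question above: read all the JSON files in a single call with a glob and supply the schema explicitly so Spark skips the extra inference pass. The path and schema below are assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    object MultiJsonImport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("multi-json-import").getOrCreate()

        // Hypothetical schema; supplying it up front avoids a second scan of the files
        // that Spark would otherwise make to infer it.
        val schema = StructType(Seq(
          StructField("id", LongType),
          StructField("ts", StringType),
          StructField("value", DoubleType)
        ))

        // One read call with a glob picks up every matching file in parallel.
        val df = spark.read
          .schema(schema)
          .json("/data/events/*.json")   // placeholder path

        df.createOrReplaceTempView("events")
        spark.sql("SELECT COUNT(*) FROM events").show()

        spark.stop()
      }
    }
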
1 vote, 1 answer

Zstd parquet decompression

I have a parquet file compressed with zstd. Is it possible to decompress it somehow? I tried to use the zstd command, but without any luck: [x@xyz tmp]# zstd -d part-00016-303a375a-e443-4f86-a59e-b5d82d15bd26.c000.zstd.parquet -o test.parquet zstd:…
Jacfal • 21 • 3
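
In a zstd-compressed parquet file the compression is applied per column chunk inside the parquet container, which is why the standalone zstd CLI rejects the file; a parquet-aware reader such as Spark decompresses it transparently. A minimal sketch, with the paths as placeholders:

    import org.apache.spark.sql.SparkSession

    object ZstdParquetRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("zstd-parquet-read").getOrCreate()

        // Spark's parquet reader handles the per-chunk zstd decompression itself;
        // no external zstd step is needed.
        val df = spark.read.parquet("/tmp/part-00016-*.zstd.parquet")   // placeholder path

        // Optionally rewrite without compression if a plain parquet copy is wanted.
        df.write.option("compression", "none").parquet("/tmp/decompressed_copy")

        spark.stop()
      }
    }
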
1 vote, 0 answers

How to install cosmosdb spark connector in databricks init script

I tried to install the cosmosdb spark connector (https://docs.microsoft.com/en-us/azure/cosmos-db/spark-connector) in Azure Databricks on a cluster via an init script, but got errors and a non-working cluster (one of the uber libraries has a different…
0 votes, 1 answer

Is it possible to mix different RHEL OS versions in a hadoop cluster?

We are using the following HDP cluster with Ambari. List of nodes and their RHEL versions: 3 master machines (with namenode & resource manager), installed on RHEL 7.2; 312 data-node machines, installed on RHEL 7.2; 5 kafka machines, installed…
shalom • 451 • 12 • 26
0 votes, 0 answers

Any benefits of ZFS over EXT4 for data stream processing on top of HDFS?

I'm working on a data stream processing project in which I will be using Apache Flink and Apache Spark, and I want to use HDFS for storage. The development and testing will be done on a single-node cluster with multiple physical disks. I have already…
HUSMEN • 1 • 2
0 votes, 1 answer

Unable to run Spark Cluster on Google DataProc

I am running a 6-node spark cluster on Google Dataproc, and within a few minutes of launching spark and performing basic operations, I get the below error: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000fbe00000, 24641536, 0)…
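
The os::commit_memory warning generally means the JVM asked for more memory than the node could provide. A hedged Scala sketch of the relevant knobs, with placeholder sizes to be adjusted to the Dataproc machine type (executor memory plus overhead must stay under what YARN can grant; driver memory normally has to be set at submit time and is shown only for completeness):

    import org.apache.spark.sql.SparkSession

    object ConservativeMemorySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("conservative-memory-sketch")
          .config("spark.executor.memory", "4g")           // placeholder: keep below the YARN container limit
          .config("spark.executor.memoryOverhead", "512m") // placeholder: off-heap headroom per executor
          .config("spark.driver.memory", "2g")             // effective only when passed at submit time
          .getOrCreate()

        spark.range(1000000L).selectExpr("sum(id)").show()
        spark.stop()
      }
    }
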
0 votes, 1 answer

How to install hadoop 2.4.1 on Windows with spark 2.0.0

I want to set up a cluster using Hadoop in YARN mode. I want to use the Spark API for map-reduce and will use spark-submit to deploy my applications. I want to work on a cluster. Can anyone help me with how to install Hadoop on a cluster using Windows?
0 votes, 0 answers

Apache Spark Web UI on kubernetes not working as expected

Hi, I'm having a problem. I'm deploying the Apache Spark helm chart on Kubernetes (Bitnami chart): helm repo add bitnami https://charts.bitnami.com/bitnami. Normally the Apache Spark web UI is on port 8080; when I access the web UI, here is what I get: what…
0 votes, 0 answers

How to read files from a directory having name "/" in S3 bucket?

Code: val df = spark.read.csv("s3a://sample_bucket//csvFiles/file.csv"); Error: 22/06/23 20:02:57 WARN impl.MetricsConfig: Cannot locate configuration: tried…
0 votes, 0 answers

Suggestion for Non Analytical Distributed Processing Frameworks

Can someone please suggest a tool, framework, or service to perform the below task faster. Input: a CSV file consisting of an identifier and several image columns, with over a million rows. Objective: to check if any…
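
The objective above is truncated, so as an illustration only: a Scala/Spark sketch that distributes a hypothetical per-row check (here, flagging rows whose image columns are missing or blank) over a large CSV. The column names and the check itself are assumptions, not the asker's actual requirement:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object CsvImageColumnCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("csv-image-column-check").getOrCreate()

        // Hypothetical layout: an "id" column plus image columns img1..img3.
        val df = spark.read.option("header", "true").csv("/data/images.csv")  // placeholder path

        // Stand-in for whatever per-cell check the task actually needs:
        // flag rows where any image column is null or blank.
        val imageCols = Seq("img1", "img2", "img3")
        val anyMissing = imageCols
          .map(c => col(c).isNull || trim(col(c)) === "")
          .reduce(_ || _)

        df.withColumn("has_missing_image", anyMissing)
          .filter(col("has_missing_image"))
          .select("id")
          .show(20, truncate = false)

        spark.stop()
      }
    }
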
0 votes, 1 answer

Spark-Cassandra-Connector Issue Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/rdd/reader/RowReaderFactory

What is going wrong with the Spark Cassandra Connector? Could you please help to solve this? Scala file: import com.datastax.spark.connector._ import org.apache.spark.sql.SparkSession import org.apache.spark.{SparkConf, SparkContext} object…
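
A NoClassDefFoundError for com/datastax/spark/connector classes at runtime usually means the connector jar is missing from the driver/executor classpath rather than a bug in the code. A build.sbt sketch, with versions as assumptions to be aligned with the cluster's Spark and Scala versions:

    // build.sbt (sketch) -- the versions below are assumptions
    scalaVersion := "2.12.15"

    libraryDependencies ++= Seq(
      // Spark itself is provided by the cluster at runtime.
      "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided",
      // The connector must be packaged into the application jar (e.g. with sbt-assembly)
      // or supplied at submit time via:
      //   --packages com.datastax.spark:spark-cassandra-connector_2.12:3.2.0
      "com.datastax.spark" %% "spark-cassandra-connector" % "3.2.0"
    )
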