What is going wrong with the Spark Cassandra Connector? Could you please help me solve this?

Scala File:

import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}


object SparkTest {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.appName("SparkTest").getOrCreate()
    println("Hello World2")

    val conf = new SparkConf().set("spark.cassandra.connection.host","localhost")
    val sc = new SparkContext(conf)
    val rdd1 = sc.cassandraTable("spark_parallalism_test", "test_table1")
    println(rdd1.first)
    sc.stop()
    spark.stop()
  }
}

The build.sbt file looks like this:

name := "SparkTest"

version := "1.0"

scalaVersion := "2.12.10"


libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1"

libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.12" % "3.0.0"

Commands to execute:

sbt package

spark-submit --class "SparkTest" --master local[4] target/scala-2.12/sparktest_2.12-1.0.jar

Error Log:

20/12/28 23:41:57 WARN Utils: Your hostname, subhrangshu-Lenovo-V110-15ISK resolves to a loopback address: 127.0.1.1; using 192.168.43.14 instead (on interface wlp2s0)
20/12/28 23:41:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/12/28 23:41:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/rdd/reader/RowReaderFactory
        at SparkTest.main(SparkTest.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.reader.RowReaderFactory
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 13 more
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

What is my mistake in this case?

1 Answer

This question is more for StackOverflow, but I'll answer here.

The main reason is that sbt package only compiles your code; it doesn't put the dependencies into the resulting jar file. Either start spark-submit with --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0, or build a fat jar with sbt assembly, in which case spark-sql needs to be declared as provided. Both options are sketched below.
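
For the --packages route, the full command looks like this (the coordinates are the ones already in your build.sbt; spark-submit resolves the connector and its transitive dependencies from Maven Central at launch time):

spark-submit --class "SparkTest" --master local[4] \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 \
  target/scala-2.12/sparktest_2.12-1.0.jar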

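For the sbt assembly route, here is a minimal sketch. The plugin version and the assembled jar name below are assumptions based on sbt-assembly defaults; check the plugin's documentation for current values.

// project/plugins.sbt: pull in the sbt-assembly plugin
// (version 0.15.0 is an assumption; use whatever is current)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

// build.sbt: Spark itself is supplied by spark-submit at runtime,
// so mark it provided to keep it out of the fat jar; the connector
// stays a compile dependency and gets bundled in
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1" % "provided"
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.12" % "3.0.0"

Then build and submit the fat jar (with default settings it lands in target/scala-2.12/ under a name like <project>-assembly-<version>.jar):

sbt assembly
spark-submit --class "SparkTest" --master local[4] target/scala-2.12/SparkTest-assembly-1.0.jar
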
See the Spark Cassandra Connector documentation for more details.

Alex Ott