Spark Error: Failed to Send RPC to Datanode

Question

We have quite a few issues with our Spark Thrift server. It is a new Ambari cluster, and no Spark jobs are currently running.

From the log we can see an error message:

Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149

Please advise why this happens and what the solution is.

Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149: java.nio.channels.ClosedChannelException
more spark-hive-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master03.sys67.com.out

Spark Command: /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.0.3-8 -cp /usr/hdp/current/spark2-thriftserver/conf/:/usr/hdp/current/spark2-thriftserver/jars/*:/usr/hdp/current/hadoop-client/conf/ -Xmx10000m org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=15g --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server --executor-cores 7 spark-internal
========================================
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
18/02/07 17:55:21 ERROR TransportClient: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(2,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:514)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:488)
        at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
        at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:438)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR SparkContext: Error initializing SparkContext.

We have also looked at the suggestions in the following article:

https://thebipalace.com/2017/08/23/spark-error-failed-to-send-rpc-to-datanode/

However, since this is a new Ambari cluster, we don't think that article applies to this particular issue.

I am getting the same error running Spark on YARN. Does anyone have any ideas about this? I am doing some wrangling on a fairly large table (130MM rows x ~1k columns), and when I try to write the new version of the table with df.SaveAsTable() I get this error. - seth127, May 08 '18 at 13:53
In your question, the first line of the error message, `Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149`, does not match the lines of the stack trace, `Failed to send RPC 9053901149358924945 to /12.87.2.64:50149`. Did you change it while writing the post, or is this the real hostname? If it is the real hostname, I would start by debugging DNS and name resolution, since it looks incorrect to me. Resolution between IP and hostname should work correctly in both directions on all involved nodes. - U880D, May 15 '18 at 07:26
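
The name-resolution check suggested in the comment above could be sketched roughly as follows. This is only a minimal illustration, not a confirmed fix for the error: the function name `check_resolution` and its result fields are made up for this example, and it assumes Python is available on each node. It resolves a hostname to an IP, resolves that IP back to a name, and reports whether the two agree.

```python
import socket


def check_resolution(host, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve host -> IP -> hostname and report whether the round trip agrees.

    `forward` and `reverse` default to the standard socket lookups; they are
    parameters only so the check can be exercised without a real DNS server.
    """
    result = {"host": host, "ip": None, "reverse_names": [],
              "consistent": False, "error": None}
    try:
        ip = forward(host)
        result["ip"] = ip
        name, aliases, _addresses = reverse(ip)
        names = [name] + list(aliases)
        result["reverse_names"] = names
        # Compare case-insensitively and accept a short-name match,
        # since reverse lookups may return either the FQDN or the short name.
        wanted_short = host.lower().split(".")[0]
        result["consistent"] = any(
            n.lower() == host.lower() or n.lower().split(".")[0] == wanted_short
            for n in names
        )
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError
        result["error"] = str(exc)
    return result
```

Running something like `check_resolution("master03.sys67.com")` on every node, and again for each worker hostname, would show whether forward and reverse lookups agree everywhere. A result with `consistent` set to `False` or a non-empty `error` on any node would point toward DNS or `/etc/hosts` as a place to look first.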
