I have a working HBase cluster and am trying to add some new servers to it, but the region servers on the new machines keep logging "SocketException: Invalid argument" and "FailedServerException: This server is in the failed servers list" errors:
2014-07-02 22:28:01,140 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.SocketException: Invalid argument
    at sun.nio.ch.Net.connect(Native Method)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:534)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:193)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:392)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:438)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1141)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:988)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
    at com.sun.proxy.$Proxy10.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:141)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2040)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2086)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748)
    at java.lang.Thread.run(Thread.java:701)
2014-07-02 22:28:31,764 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: <MY_MASTER_SERVER>/<MY_MASTER_NAME>:<MY_MASTER_PORT>
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:427)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1141)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:988)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
    at com.sun.proxy.$Proxy10.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:141)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2040)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2086)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748)
    at java.lang.Thread.run(Thread.java:701)
So far I can't find any differences between the old and new servers:
- both run Ubuntu 12.04 with all the latest updates, and both use Cloudera's CDH4 distribution of HBase
- neither has an /etc/hosts entry for the HBase master (I tried adding one on the new servers, but the issue persisted)
- the firewalls should be configured identically, with no local network restrictions (note: from the new servers I can telnet to port 60000, the HBase master's port, without any errors)
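Since telnet succeeds while the JVM's connect fails inside `sun.nio.ch.Net.connect`, one way to narrow this down might be to repeat the telnet test from inside a JVM on a new server. The class below is a hypothetical helper (the name `NioConnectCheck` and its arguments are made up for illustration; it is not part of HBase or Hadoop). It resolves the master's hostname through the JVM, which may not pick the same address telnet ends up using, then attempts an NIO `SocketChannel` connect to each resulting address and prints whether that address is IPv4 or IPv6 and whether the connect succeeded. It is a sketch under those assumptions and does not reproduce HBase's RPC client.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;

/**
 * Hypothetical debugging helper: tries an NIO connect (the same kind of call
 * that fails in the stack trace) to every address a hostname resolves to.
 * Written without try-with-resources so it also compiles on Java 6.
 */
class NioConnectCheck {

    static String family(InetAddress addr) {
        return (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    /** Attempts one connect and returns a single-line result; never throws. */
    static String tryConnect(InetAddress addr, int port, int timeoutMs) {
        String label = family(addr) + " " + addr.getHostAddress() + ":" + port;
        SocketChannel ch = null;
        try {
            ch = SocketChannel.open();
            // The channel's socket adaptor lets us apply a connect timeout
            // while still connecting through the NIO SocketChannel.
            ch.socket().connect(new InetSocketAddress(addr, port), timeoutMs);
            return label + " OK";
        } catch (IOException e) {
            // e.g. "java.net.SocketException: Invalid argument"
            return label + " FAILED: " + e;
        } finally {
            if (ch != null) {
                try { ch.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args[0];                 // the master's hostname
        int port = Integer.parseInt(args[1]);  // e.g. 60000
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            System.out.println(tryConnect(addr, port, 5000));
        }
    }
}
```

For example, after compiling it with `javac`, run `java NioConnectCheck <MY_MASTER_NAME> 60000` on a new server, then again as `java -Djava.net.preferIPv4Stack=true NioConnectCheck <MY_MASTER_NAME> 60000`. If only the IPv6 attempts fail, or the result changes with that flag, that would point toward the IPv6 theory.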
While debugging this, I saw a mention online that IPv6 configuration can cause this kind of issue, but as far as I know, both the old and new servers use Ubuntu's default IPv6 settings.
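To test the IPv6 theory more directly, it may help to compare what a JVM actually sees on an old server and on a new one. The class below is a hypothetical helper (not part of HBase, CDH, or Ubuntu) that prints the two standard JVM networking properties `java.net.preferIPv4Stack` and `java.net.preferIPv6Addresses`, the addresses of the local hostname, and the addresses and families of any hostnames passed as arguments. The HBase daemons may be started with different JVM options (for example via `HBASE_OPTS` in `hbase-env.sh`), so the property values it prints only describe the JVM it runs in.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical diagnostic: prints the JVM's IP-stack preferences and the
 * address family of each resolved address, for side-by-side comparison
 * between an old and a new server.
 */
class JvmIpv6Report {

    static String family(InetAddress a) {
        return (a instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    /** Resolves host and lists every address with its family; never throws. */
    static String describe(String host) {
        StringBuilder sb = new StringBuilder(host).append(" ->");
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                sb.append(' ').append(a.getHostAddress())
                  .append(" (").append(family(a)).append(')');
            }
        } catch (UnknownHostException e) {
            sb.append(" unresolved (").append(e.getMessage()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "null" means the property was not set for this JVM.
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.println("java.net.preferIPv6Addresses = "
                + System.getProperty("java.net.preferIPv6Addresses"));
        try {
            System.out.println("local host: "
                    + describe(InetAddress.getLocalHost().getHostName()));
        } catch (UnknownHostException e) {
            System.out.println("local host: unresolved (" + e.getMessage() + ")");
        }
        for (String host : args) {
            System.out.println(describe(host));
        }
    }
}
```

For example, run `java JvmIpv6Report <MY_MASTER_NAME>` on one old and one new server; a difference in the listed address families would be a lead worth following.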
Any ideas on how I can debug further and/or what the issue may be?