1

I'm trying to add nodes to a Cloudera cluster. When the agent starts I get a python stacktrace saying it can't heartbeat to master-host:7182, however I can connect to that port just fine.

The stacktrace is from Python and ends saying the connection timed out.

nc -z 1 -w master-host 7182 returns "connection successful"

Firewalls are off, SELinux is in Permissive.

Each box has 2 IPs, one in the 4 space and one in the 8 space. The DNS resolves the 8 address and the hosts file resolves the 4 address.

EDIT: To add some more info, based on this post:

  • OS Versions are the same, agent/manager versions are the same
  • I can connect from the CM host to the 4 address, port 9000. The 4 address is the one that shows up in the Host page on Cloudera Manager
  • The large ping command fails on the 4 address: ping -c 3 -s 1800 4-address, MTU for this interface is set to 9000.
  • The large ping command passes on the 8 address, MTU is set to 1500.
shearn89
  • 3,143
  • 2
  • 14
  • 39
  • I fixed it with synchronization of clocks. Here's a step by step guide to do it: [http://www.yourtechchick.com/hadoop/failed-receive-heartbeat-agent-cloudera-hadoop/](http://www.yourtechchick.com/hadoop/failed-receive-heartbeat-agent-cloudera-hadoop/) – gags Nov 05 '16 at 18:11

1 Answers1

1

Turns out the MTU appeared to be the issue - the infrastructure we're using wasn't supporting jumbo frames end-to-end (in this case, Cisco c240m4s with fibre interconnects needed the QoS settings updating via UCS).

shearn89
  • 3,143
  • 2
  • 14
  • 39