When adding new Cassandra nodes to the cluster, we also start up the DataStax agent. After some time, the agent is shown as no longer connected. Whenever we restart the agent, the following error is logged:
ERROR [Initialization] 2015-12-15 10:42:25,309 Can't connect to Cassandra, retrying soon.
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /192.168.10.1:9042 (com.datastax.driver.core.TransportException: [/192.168.10.1:9042] Cannot connect))
The IP address 192.168.10.1 is the broadcast_address of the Cassandra node. The rpc_address is different, which is why the agent is unable to connect to the node. This wrong IP is being sent from OpsCenter to the agent as shown in the logfile (reformatted for better readability):
INFO [StompConnection receiver] 2015-12-15 10:42:23,492 Got new config from OpsCenter: {
:cassandra_port 9042,
:rollups300_ttl 7776000,
:destinations [],
:restore_req_update_period 1,
:cassandra_rpc_interface "192.168.10.1",
:rollups60_ttl 7776000,
:thrift_port 9160,
:ec2_metadata_api_host "169.254.169.254",
:metrics_enabled 1,
:backup_staging_dir "",
:rollups7200_ttl 7776000,
:ssl_keystore nil,
:metrics_ignored_column_families "",
:cassandra_log_location "/var/log/cassandra/system.log",
:config_md5 "49a3234ff4e1eca80f3b2c2027ae5d9c",
:jmx_port 7199,
:provisioning 0,
:use_ssl 1,
:max_pending_repairs 5,
:rollups86400_ttl -1,
:api_port "61621",
:storage_keyspace "OpsCenter",
:hosts ["192.168.10.1"],
:metrics_ignored_solr_cores "",
:metrics_ignored_keyspaces "system, system_traces, system_auth, dse_auth, OpsCenter",
:rollup_subscriptions [],
:cassandra_install_location ""}
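To check this on several nodes without reading the whole config dump, the two relevant keys can be pulled out of the agent log. This is only a minimal Python sketch that assumes the log format shown above; extract_hosts is a hypothetical helper name, not part of the agent or OpsCenter.

```python
import re


def extract_hosts(log_text):
    """Return (hosts, rpc_interface) from a 'Got new config from OpsCenter' entry.

    Assumption: the entry looks like the agent log output quoted in this post,
    i.e. it contains ':hosts ["..."]' and ':cassandra_rpc_interface "..."'.
    """
    hosts_match = re.search(r':hosts\s+\[([^\]]*)\]', log_text)
    iface_match = re.search(r':cassandra_rpc_interface\s+"([^"]*)"', log_text)
    # Collect every quoted address inside the :hosts vector.
    hosts = re.findall(r'"([^"]+)"', hosts_match.group(1)) if hosts_match else []
    iface = iface_match.group(1) if iface_match else None
    return hosts, iface


def config_uses_address(log_text, expected_address):
    """True if the address pushed by OpsCenter in :hosts includes expected_address."""
    hosts, _ = extract_hosts(log_text)
    return expected_address in hosts
```

For the entry above, config_uses_address(entry, "192.168.8.1") would be False, because OpsCenter sent the broadcast_address instead of the rpc_address.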
After restarting OpsCenter and then the agent, the correct IP is sent to the agent and no more errors are logged.
While the node is joining, the system.peers table does not contain an entry for the new node. Once it has joined, the table contains the correct addresses.
How can we make new agents use the correct address (the rpc_address) without restarting OpsCenter every time new nodes are added?
Update: Setting hosts in address.yaml doesn't work
Just tried setting hosts: ["192.168.8.1"] (which is the rpc_address of the node) in /var/lib/datastax-agent/conf/address.yaml. The behavior is exactly the same. It seems this host is overwritten by what OpsCenter provides:
INFO [main] 2015-12-22 08:55:15,207 Loading conf files: /var/lib/datastax-agent/conf/address.yaml
INFO [main] 2015-12-22 08:55:15,258 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.8.0_45
INFO [main] 2015-12-22 08:55:15,258 DataStax Agent version: 5.1.3
INFO [main] 2015-12-22 08:55:15,282 Default config values: {... :agent_rpc_broadcast_address "192.168.10.1", ... :hosts ["192.168.8.1"]}
...
INFO [StompConnection receiver] 2015-12-22 08:55:21,015 Got new config from OpsCenter: {... :cassandra_rpc_interface "192.168.10.1", ... :hosts ["192.168.10.1"] ...}
...
ERROR [Initialization] 2015-12-22 08:55:22,926 Can't connect to Cassandra, retrying soon.
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /192.168.10.1:9042 (com.datastax.driver.core.TransportException: [/192.168.10.1:9042] Cannot connect))
...
WARN [Initialization] 2015-12-22 08:55:32,652 Resetting cluster because {:hosts ["192.168.8.1"]} changed to {:hosts ["192.168.10.1"], :local_interface "192.168.10.1"}
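To confirm from the agent's host which of the two addresses actually accepts connections on the native transport port, a simple TCP probe could be used. This is a minimal sketch, not something the agent does itself: can_connect is a name made up for illustration, and it only checks that the port accepts a TCP connection, not that Cassandra answers CQL requests.

```python
import socket


def can_connect(host, port=9042, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    9042 is the cassandra_port from the agent config quoted above.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In the setup described here, can_connect("192.168.8.1") should succeed and can_connect("192.168.10.1") should fail, matching the NoHostAvailableException in the log.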