I'm setting up a percona-xtradb-cluster-57 (3 nodes) on Ubuntu 16.04. They are to communicate using a private network, 10.254.10.101 through 10.254.10.103.

When I follow the instructions on Percona's site exactly as written, I bootstrap the first node and then bring up one of the other two normally. However, SHOW STATUS LIKE 'wsrep%'; on the second node reports a cluster size of 1 and a different cluster ID than the bootstrapped node.

I've checked the firewall, turned off the firewall, attempted to connect to each node on ports 3306 and 4567, and double-checked that each machine sees its neighbors as a single hop. All of this is as expected. Ports 4444 and 4568 are also open, though netstat doesn't show them listening. FWIW, 4444 and 4568 are also not listening on a working cluster in the same software environment (that one is spread over multiple data centers).
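
For reference, the connectivity check described above can be sketched roughly as follows. This is a minimal Python sketch and not the exact commands I ran; port_open and check_cluster are hypothetical helper names, and only the node addresses and the ports 3306 and 4567 come from the setup above. Ports 4444 and 4568 are left out because nothing was listening on them.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_cluster(nodes, ports, timeout=2.0):
    """Return {(host, port): bool} for every node/port combination."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in nodes
        for port in ports
    }
```

Running check_cluster(["10.254.10.101", "10.254.10.102", "10.254.10.103"], [3306, 4567]) from each machine in turn shows whether every node can reach the others on the MySQL and group communication ports.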

I'm using SST for replication. I've tried it with the SST user created on each node, and also with the user created only on the bootstrapped node.

Here's the config:

[mysqld]
wsrep_provider=/usr/lib/libgalera_smm.so

wsrep_cluster_name=dbcluster
wsrep_cluster_address=gcomm://10.254.10.101,10.254.10.102,10.254.10.103

wsrep_node_name=pxc1
wsrep_node_address=10.254.10.101

wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst-user:sst-pass

pxc_strict_mode=ENFORCING

binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

What diagnostic data would you like to see to help trace this issue? First thing that hits me is a network issue, but I've done all the usual stuff there as described above.

Watching the logs at startup, the two nodes that weren't bootstrapped don't appear to even look for the other nodes.

2017-11-15T15:23:20.255455Z mysqld_safe Logging to '/var/log/mysqld.log'.
2017-11-15T15:23:20.272183Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2017-11-15T15:23:20.279082Z mysqld_safe Skipping wsrep-recover for 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14 pair
2017-11-15T15:23:20.280168Z mysqld_safe Assigning 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14 to wsrep_start_position

2017-11-15T15:23:20.481916Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-15T15:23:20.483755Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57-log) starting as process 29150 ...
2017-11-15T15:23:20.487158Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=app3-bin' to avoid this problem.
2017-11-15T15:23:20.487910Z 0 [Note] WSREP: Setting wsrep_ready to false
2017-11-15T15:23:20.488055Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2017-11-15T15:23:20.488173Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-11-15T15:23:20.491810Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <info@codership.com> loaded successfully.
2017-11-15T15:23:20.491980Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-11-15T15:23:20.492510Z 0 [Note] WSREP: Found saved state: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14, safe_to_bootsrap: 1
2017-11-15T15:23:20.493950Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.254.10.103; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-11-15T15:23:20.513602Z 0 [Note] WSREP: GCache history reset: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:0 -> 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14
2017-11-15T15:23:20.514428Z 0 [Note] WSREP: Assign initial position for certification: 14, protocol version: -1
2017-11-15T15:23:20.514546Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-11-15T15:23:20.514663Z 0 [Note] WSREP: Starting replication
2017-11-15T15:23:20.514791Z 0 [Note] WSREP: Setting initial position to 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14
2017-11-15T15:23:20.515118Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-11-15T15:23:20.515346Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-11-15T15:23:20.515543Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-15T15:23:20.515662Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-11-15T15:23:20.516044Z 0 [Note] WSREP: GMCast version 0
2017-11-15T15:23:20.516252Z 0 [Note] WSREP: (e997d882, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-11-15T15:23:20.516316Z 0 [Note] WSREP: (e997d882, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-11-15T15:23:20.516739Z 0 [Note] WSREP: EVS version 0
2017-11-15T15:23:20.516889Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer ''
2017-11-15T15:23:20.516986Z 0 [Note] WSREP: start_prim is enabled, turn off pc_recovery
2017-11-15T15:23:20.517230Z 0 [Note] WSREP: Node e997d882 state primary
2017-11-15T15:23:20.517313Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,e997d882,1)
memb {
        e997d882,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2017-11-15T15:23:20.517372Z 0 [Note] WSREP: Save the discovered primary-component to disk
2017-11-15T15:23:20.517555Z 0 [Note] WSREP: gcomm: connected
2017-11-15T15:23:20.517672Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-11-15T15:23:20.517836Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-11-15T15:23:20.517987Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: e9982b85-ca18-11e7-bcf3-22cba208212d

2017-11-15T15:23:20.518021Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: e9982b85-ca18-11e7-bcf3-22cba208212d
2017-11-15T15:23:20.518038Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: e9982b85-ca18-11e7-bcf3-22cba208212d from 0 (pxc-cluster-node-1)
2017-11-15T15:23:20.518054Z 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 0,
        members    = 1/1 (primary/total),
        act_id     = 14,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 71bf4cd8-ca02-11e7-8b84-630bd10b8205
2017-11-15T15:23:20.518071Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2017-11-15T15:23:20.518084Z 0 [Note] WSREP: Trying to continue unpaused monitor
2017-11-15T15:23:20.518097Z 0 [Note] WSREP: Restored state OPEN -> JOINED (14)
2017-11-15T15:23:20.518141Z 0 [Note] WSREP: Member 0.0 (pxc-cluster-node-1) synced with group.
2017-11-15T15:23:20.517998Z 0 [Note] WSREP: Waiting for SST/IST to complete.
2017-11-15T15:23:20.518154Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 14)
2017-11-15T15:23:20.518501Z 1 [Note] WSREP: New cluster view: global state: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-11-15T15:23:20.518541Z 1 [Note] WSREP: Setting wsrep_ready to true
2017-11-15T15:23:20.518594Z 0 [Note] WSREP: SST complete, seqno: 14
2017-11-15T15:23:20.520410Z 0 [Note] InnoDB: PUNCH HOLE support available
2017-11-15T15:23:20.520445Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-11-15T15:23:20.520460Z 0 [Note] InnoDB: Uses event mutexes
2017-11-15T15:23:20.520467Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-11-15T15:23:20.520474Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-11-15T15:23:20.520481Z 0 [Note] InnoDB: Using Linux native AIO
2017-11-15T15:23:20.520824Z 0 [Note] InnoDB: Number of pools: 1
2017-11-15T15:23:20.520978Z 0 [Note] InnoDB: Using CPU crc32 instructions
2017-11-15T15:23:20.522918Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2017-11-15T15:23:20.528901Z 0 [Note] InnoDB: Completed initialization of buffer pool
2017-11-15T15:23:20.531671Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-11-15T15:23:20.543934Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2017-11-15T15:23:20.544844Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2017-11-15T15:23:20.560093Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 3932160 bytes
2017-11-15T15:23:20.566433Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-11-15T15:23:20.566788Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2017-11-15T15:23:20.588535Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2017-11-15T15:23:20.589701Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2017-11-15T15:23:20.589810Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2017-11-15T15:23:20.590375Z 0 [Note] InnoDB: Waiting for purge to start
2017-11-15T15:23:20.640641Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.19-17 started; log sequence number 2548109
2017-11-15T15:23:20.641178Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2017-11-15T15:23:20.641384Z 0 [Note] Plugin 'FEDERATED' is disabled.
2017-11-15T15:23:20.643176Z 0 [Note] InnoDB: Buffer pool(s) load completed at 171115  9:23:20
2017-11-15T15:23:20.650925Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2017-11-15T15:23:20.650959Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2017-11-15T15:23:20.651471Z 0 [Warning] CA certificate ca.pem is self signed.
2017-11-15T15:23:20.651529Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2017-11-15T15:23:20.651638Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2017-11-15T15:23:20.651688Z 0 [Note] IPv6 is available.
2017-11-15T15:23:20.651706Z 0 [Note]   - '::' resolves to '::';
2017-11-15T15:23:20.651733Z 0 [Note] Server socket created on IP: '::'.
2017-11-15T15:23:20.661454Z 0 [Note] Event Scheduler: Loaded 0 events
2017-11-15T15:23:20.662904Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.19-17-57-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22
2017-11-15T15:23:20.662945Z 0 [Note] Executing 'SELECT * FROM INFORMATION_SCHEMA.TABLES;' to get a list of tables using the deprecated partition engine. You may use the startup option '--disable-partition-engine-check' to skip this check.
2017-11-15T15:23:20.663009Z 0 [Note] Beginning of list of non-natively partitioned tables
2017-11-15T15:23:20.663172Z 1 [Note] WSREP: Initialized wsrep sidno 2
2017-11-15T15:23:20.663199Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-11-15T15:23:20.663233Z 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-11-15T15:23:20.663248Z 1 [Note] WSREP: Assign initial position for certification: 14, protocol version: 3
2017-11-15T15:23:20.663291Z 0 [Note] WSREP: Service thread queue flushed.
2017-11-15T15:23:20.663350Z 1 [Note] WSREP: GCache history reset: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:0 -> 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14
2017-11-15T15:23:20.663818Z 1 [Note] WSREP: Synchronized with group, ready for connections
2017-11-15T15:23:20.663839Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-11-15T15:23:20.679843Z 0 [Note] End of list of non-natively partitioned tables

1 Answer

I think your problem might be:

2017-11-15T15:23:20.516889Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer ''

Assuming this is your second node and your first node is configured with:

wsrep_cluster_name=dbcluster

With different cluster names, the nodes will not see each other. Make sure every node has the same wsrep_cluster_name.
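
One way to verify this on each node is to compare the group name the node reports at startup with the name you expect. The following is a minimal Python sketch, assuming the log line format shown in the question; find_group_and_peers and check_node are hypothetical helper names. It also flags an empty peer list, on the assumption that a joining node would normally list the addresses from wsrep_cluster_address there.

```python
import re

# Matches: gcomm: connecting to group '<name>', peer '<peers>'
GROUP_RE = re.compile(r"gcomm: connecting to group '([^']*)', peer '([^']*)'")


def find_group_and_peers(log_text):
    """Return (group, peers) from the first gcomm connect line, or None."""
    match = GROUP_RE.search(log_text)
    return (match.group(1), match.group(2)) if match else None


def check_node(log_text, expected_name):
    """Return a list of human-readable problems found in the log text."""
    found = find_group_and_peers(log_text)
    if found is None:
        return ["no 'gcomm: connecting to group' line found"]
    group, peers = found
    problems = []
    if group != expected_name:
        problems.append(f"cluster name is {group!r}, expected {expected_name!r}")
    if peers == "":
        # Assumption: an empty peer list means no other node is being contacted.
        problems.append("peer list is empty")
    return problems


# Log line taken from the question; 'dbcluster' is the name in the posted config.
sample = (
    "2017-11-15T15:23:20.516889Z 0 [Note] WSREP: gcomm: connecting to "
    "group 'pxc-cluster', peer ''"
)
for problem in check_node(sample, "dbcluster"):
    print(problem)
```

If the name in the log differs from the one in the config you posted, the node is probably reading its settings from a different file than the one you edited.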

  • I thought I had copied and pasted it correctly between the servers, but I certainly could have missed that. I've implemented MariaDB instead, and have it working. Thank you for the suggestion! – user396356 Nov 17 '17 at 13:46