
I'm trying to create an automatic deployment of Galera Cluster with MariaDB, running inside Docker containers on CoreOS.

Software used:
- MariaDB official Docker image, version 10.1.10
- CoreOS 899.5.0, with Docker 1.9.1

Everything is running on 2 separate VMs: 10.2.0.4 and 10.2.0.5

I can start the first node (10.2.0.4) successfully and bootstrap the cluster.

However, when I start the second node I get a lot of errors while replicating the mysql.time_zone_transition_type and mysql.time_zone_name tables. After that, the mysqld daemon does NOT crash, so my Docker container keeps running (for minutes without problems), but the node does not appear to have joined the cluster: querying the status on the first node shows that only 1 node has joined, and connections to the second node fail. Weirdly enough, if I restart the Docker container (keeping the data folder), it then joins the cluster and works flawlessly.
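One way to check cluster membership from the first node is the wsrep_cluster_size status variable. The sketch below assumes the container name (some-mariadb) and root password from the commands in this question; the helper name cluster_size_from_status is hypothetical. It only parses the batch-mode output of the mysql client, so the parsing can be tested without a running server.

```shell
#!/bin/sh
# Hypothetical helper: read the tab-separated output of
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
# from stdin (-N: no column names, -B: batch/tab-separated mode)
# and print only the numeric value.
cluster_size_from_status() {
  awk -F '\t' '$1 == "wsrep_cluster_size" { print $2 }'
}

# Usage against the first node (container name and password assumed
# from the docker run commands in the question):
#   docker exec some-mariadb \
#     mysql -uroot -pmy-secret-pw -N -B \
#     -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'" \
#     | cluster_size_from_status
```

With both nodes joined, this should print 2; in the failing state described above it prints 1.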

Here's the MySQL configuration file (added to /etc/mysql/conf.d in the Docker container):

# this is read by the standalone daemon and embedded servers
[server]

# this is only for the mysqld standalone daemon
[mysqld]

#
# * Galera-related settings
#
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider="/usr/lib/galera/libgalera_smm.so"
wsrep_cluster_address="gcomm://10.2.0.4,10.2.0.5"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
wsrep-sst-method=rsync

#
# Allow server to accept connections on all interfaces.
#
bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0

# this is only for embedded server
[embedded]

# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]

# This group is only read by MariaDB-10.1 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.1]

I start the first node with:

$ docker rm -f some-mariadb
$ rm -rf /mnt/resource/data/*
# Note: we need to pass the IP of the VM or mysqld will get the IP from the Docker container
$ docker run \
  --name some-mariadb \
  -v /mnt/resource/mysql.conf.d:/etc/mysql/conf.d \
  -v /mnt/resource/data:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=my-secret-pw \
  -d \
  -p 3306:3306 \
  -p 4567:4567/udp \
  -p 4567-4568:4567-4568 \
  -p 4444:4444 \
  mariadb:10.1 \
  --wsrep-new-cluster \
  --wsrep_node_address=10.2.0.4

And the second one with:

$ rm -rf /mnt/resource/data/*
$ docker rm -f some-mariadb
# Create a "/var/lib/mysql/mysql" folder so the Docker container won't initialize the db again (won't re-execute mysql_install_db)
$ mkdir -p /mnt/resource/data/mysql
$ docker run \
  --name some-mariadb \
  -v /mnt/resource/mysql.conf.d:/etc/mysql/conf.d \
  -v /mnt/resource/data:/var/lib/mysql \
  -d \
  -p 3306:3306 \
  -p 4567:4567/udp \
  -p 4567-4568:4567-4568 \
  -p 4444:4444 \
  mariadb:10.1 \
  --wsrep_node_address=10.2.0.5

My replication errors all look like:

2016-01-23 23:57:52 140131133560576 [ERROR] Slave SQL: Error 'Column 'Time_zone_id' cannot be null' on query. Default database: 'mysql'. Query: 'INSERT INTO time_zone_name (Name, Time_zone_id) VALUES ('Etc/GMT', @time_zone_id)', Internal MariaDB error code: 1048
2016-01-23 23:57:52 140131133560576 [Warning] WSREP: RBR event 1 Query apply warning: 1, 1536
2016-01-23 23:57:52 140131133560576 [Warning] WSREP: Ignoring error for TO isolated action: source: 09357a0e-c22d-11e5-963a-0a6f9b6b61c4 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 5 trx_id: -1 seqnos (l: 1147, g: 1536, s: 1535, d: 1535, ts: 73713003335315)

The full log can be found at this link (it's 4.2MB! too large for the post)
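To get an overview of a log that large, a small pipeline can count the failing statements per table. This is a sketch that assumes the failing statements are INSERTs like the one quoted above and that the log is read with docker logs; the function name summarize_apply_errors is hypothetical, and grep -o is a GNU/BusyBox extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: count "Slave SQL: Error" lines per target table.
# Reads a MariaDB error log on stdin and prints "<count> INSERT INTO <table>",
# most frequent first.
summarize_apply_errors() {
  grep 'Slave SQL: Error' \
    | grep -o 'INSERT INTO [A-Za-z_]*' \
    | sort | uniq -c | sort -rn
}

# Usage on the second node (the image logs to stderr, hence 2>&1):
#   docker logs some-mariadb 2>&1 | summarize_apply_errors
```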

Again, please note that once I restart the container on the second node (preserving the data), replication works, and works well! But this "weird" startup process isn't normal and I can't rely on it, because I need to script the entire setup later by creating fleet.d units.

ItalyPaleAle

1 Answer


After days of fighting with this, I was finally able to get this to work.

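The key issue is that the default Docker image (probably mimicking what the MySQL image does) loads time zone data into the database during initialization, and that causes huge problems with this setup. I'm not sure of the exact reason (maybe because those tables use MyISAM?), but the errors above show the pattern: the replicated INSERT into time_zone_name references the session variable @time_zone_id, which is NULL when the statement is applied on the joining node, hence "Column 'Time_zone_id' cannot be null".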

Solution: when launching the first node, pass the MYSQL_INITDB_SKIP_TZINFO=yes environment variable to the Docker container. So, the command to launch the first Docker container is:

docker run \
  --name some-mariadb \
  -v /mnt/resource/mysql.conf.d:/etc/mysql/conf.d \
  -v /mnt/resource/data:/var/lib/mysql \
  -e MYSQL_INITDB_SKIP_TZINFO=yes \
  -e MYSQL_ROOT_PASSWORD=my-secret-pw \
  -d \
  -p 3306:3306 \
  -p 4567:4567/udp \
  -p 4567-4568:4567-4568 \
  -p 4444:4444 \
  mariadb:10.1 \
  --wsrep-new-cluster \
  --wsrep_node_address=10.2.0.4
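Since the goal is to script both nodes (e.g. from fleet units), the fixed command can be wrapped in one parameterized function. This is a hypothetical sketch: start_galera_node, its "bootstrap" argument and the DRY_RUN switch are illustrative names, and the data-directory preparation from the question (rm -rf, plus mkdir -p /mnt/resource/data/mysql on the second node) is left out.

```shell
#!/bin/sh
# Hypothetical wrapper around the docker run commands above.
#   $1 = this node's IP (passed as --wsrep_node_address)
#   $2 = "bootstrap" on the first node only
# Set DRY_RUN=1 to print the command instead of running it.
start_galera_node() {
  node_ip=$1
  role=$2

  # Options shared by both nodes (copied from the question).
  set -- docker run --name some-mariadb \
    -v /mnt/resource/mysql.conf.d:/etc/mysql/conf.d \
    -v /mnt/resource/data:/var/lib/mysql \
    -d -p 3306:3306 -p 4567:4567/udp -p 4567-4568:4567-4568 -p 4444:4444

  if [ "$role" = "bootstrap" ]; then
    # Only the first node initializes the database, so only it gets the
    # root password and the flag that skips loading time zone tables.
    set -- "$@" -e MYSQL_INITDB_SKIP_TZINFO=yes -e MYSQL_ROOT_PASSWORD=my-secret-pw \
      mariadb:10.1 --wsrep-new-cluster
  else
    set -- "$@" mariadb:10.1
  fi
  set -- "$@" "--wsrep_node_address=$node_ip"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, `start_galera_node 10.2.0.4 bootstrap` on the first VM and `start_galera_node 10.2.0.5` on the second reproduce the two commands, with the time zone fix applied only where the database is initialized.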
ItalyPaleAle