Galera Node won't start after recovery

Question

I have a Galera "Cluster" with one working node. I added a second node and start the synchronization via rsync. The State transfer succeeded but when starting the database it tells me that the innodb shall be currupt. But as far as I understand Galera this should be the exact same data then on node1.

Here is the Log output:

    Feb 19 17:05:53 node2.example.com rsyncd[27285]: connect from node1.example.com (1.2.3.4)
Feb 19 17:05:53 node2.example.com rsyncd[27285]: rsync to rsync_sst/ from node1.example.com (1.2.3.4)
Feb 19 17:05:53 node2.example.com rsyncd[27285]: receiving file list
Feb 19 17:05:53 node2.example.com mysqld[38851]: 2017-02-19 17:05:53 140273959036672 [Note] WSREP: 1.0 (node1): State transfer to 0.0 (node2) complete.
Feb 19 17:05:53 node2.example.com mysqld[38851]: 2017-02-19 17:05:53 140273959036672 [Note] WSREP: Member 1.0 (node1) synced with group.
Feb 19 17:05:53 node2.example.com mysqld[38851]: WSREP_SST: [INFO] Joiner cleanup. rsync PID: 38894 (20170219 17:05:53.972)
Feb 19 17:05:54 node2.example.com mysqld[38851]: WSREP_SST: [INFO] Joiner cleanup done. (20170219 17:05:54.481)
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] WSREP: SST complete, seqno: 2744639
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 7f9425e368c0 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future     releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html.
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: Using mutexes to ref count buffer pool pages
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: The InnoDB memory heap is disabled
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: Compressed tables use zlib 1.2.7
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: Using Linux native AIO
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: Using SSE crc32 instructions
Feb 19 17:05:54 node2.example.com mysqld[38851]: 2017-02-19 17:05:54 140274267547840 [Note] InnoDB: Initializing buffer pool, size = 16.0G
Feb 19 17:05:55 node2.example.com mysqld[38851]: 2017-02-19 17:05:55 140274267547840 [Note] InnoDB: Completed initialization of buffer pool
Feb 19 17:05:55 node2.example.com mysqld[38851]: 2017-02-19 17:05:55 140274267547840 [Note] InnoDB: Highest supported file format is Barracuda.
Feb 19 17:05:56 node2.example.com mysqld[38851]: 2017-02-19 17:05:56 140274267547840 [Note] InnoDB: The log sequence numbers 1017291030265 and 1017291030265 in ibdata files do not match     the log sequence number 1017291030429 in the ib_logfiles!
Feb 19 17:05:56 node2.example.com mysqld[38851]: 2017-02-19 17:05:56 140274267547840 [Note] InnoDB: Database was not shutdown normally!
Feb 19 17:05:56 node2.example.com mysqld[38851]: 2017-02-19 17:05:56 140274267547840 [Note] InnoDB: Starting crash recovery.
Feb 19 17:05:56 node2.example.com mysqld[38851]: 2017-02-19 17:05:56 140274267547840 [Note] InnoDB: Reading tablespace information from the .ibd files...
Feb 19 17:05:56 node2.example.com mysqld[38851]: 2017-02-19 17:05:56 140274267547840 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace mysql/    innodb_index_stats uses space ID: 2 at filepath: ./mysql/innodb_index_stats.ibd. Cannot open tablespace tde/SOURCES which uses space ID: 2 at filepath: ./tde/SOURCES.ibd
Feb 19 17:05:56 node2.example.com mysqld[38851]: 2017-02-19 17:05:56 7f9425e368c0  InnoDB: Operating system error number 2 in a file operation.
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: The error means the system cannot find the path specified.
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: If you are installing InnoDB, remember that you must create
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: directories yourself, InnoDB does not create them.
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: Error: could not open single-table tablespace file ./tde/SOURCES.ibd
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: We do not continue the crash recovery, because the table may become
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: To fix the problem and start mysqld:
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: 1) If there is a permission problem in the file and mysqld cannot
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: open the file, you should modify the permissions.
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: 2) If the table is not needed, or you can restore it from a backup,
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: crash recovery and ignore that table.
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: 3) If the file system or the disk is broken, and you cannot remove
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
Feb 19 17:05:56 node2.example.com mysqld[38851]: InnoDB: and force InnoDB to continue crash recovery here.
Feb 19 17:05:56 node2.example.com mysqld[38851]: 170219 17:05:56 [ERROR] mysqld got signal 6 ;
Feb 19 17:05:56 node2.example.com mysqld[38851]: This could be because you hit a bug. It is also possible that this binary
Feb 19 17:05:56 node2.example.com mysqld[38851]: or one of the libraries it was linked against is corrupt, improperly built,
Feb 19 17:05:56 node2.example.com mysqld[38851]: or misconfigured. This error can also be caused by malfunctioning hardware.
Feb 19 17:05:56 node2.example.com mysqld[38851]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Feb 19 17:05:56 node2.example.com mysqld[38851]: We will try our best to scrape up some info that will hopefully help
Feb 19 17:05:56 node2.example.com mysqld[38851]: diagnose the problem, but since we have already crashed,
Feb 19 17:05:56 node2.example.com mysqld[38851]: something is definitely wrong and this may fail.
Feb 19 17:05:56 node2.example.com mysqld[38851]: Server version: 10.1.21-MariaDB
Feb 19 17:05:56 node2.example.com mysqld[38851]: key_buffer_size=134217728
Feb 19 17:05:56 node2.example.com mysqld[38851]: read_buffer_size=131072
Feb 19 17:05:56 node2.example.com mysqld[38851]: max_used_connections=0
Feb 19 17:05:56 node2.example.com mysqld[38851]: max_threads=153
Feb 19 17:05:56 node2.example.com mysqld[38851]: thread_count=2
Feb 19 17:05:56 node2.example.com mysqld[38851]: It is possible that mysqld could use up to
Feb 19 17:05:56 node2.example.com mysqld[38851]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467121 K  bytes of memory
Feb 19 17:05:56 node2.example.com mysqld[38851]: Hope that's ok; if not, decrease some variables in the equation.
Feb 19 17:05:56 node2.example.com mysqld[38851]: Thread pointer: 0x0
Feb 19 17:05:56 node2.example.com mysqld[38851]: Attempting backtrace. You can use the following information to find out
Feb 19 17:05:56 node2.example.com mysqld[38851]: where mysqld died. If you see no messages after this, something went
Feb 19 17:05:56 node2.example.com mysqld[38851]: terribly wrong...
Feb 19 17:05:56 node2.example.com mysqld[38851]: stack_bottom = 0x0 thread_stack 0x48400
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f94269089ce]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7f942642e355]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /lib64/libpthread.so.0(+0xf100)[0x7f9425a48100]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /lib64/libc.so.6(gsignal+0x37)[0x7f9423da35f7]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /lib64/libc.so.6(abort+0x148)[0x7f9423da4ce8]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(+0x8ef240)[0x7f9426768240]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(+0x789948)[0x7f9426602948]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(+0x819b59)[0x7f9426692b59]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(+0x7393f4)[0x7f94265b23f4]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x64)[0x7f9426430714]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(+0x445625)[0x7f94262be625]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(_Z11plugin_initPiPPci+0x9ea)[0x7f94262bff3a]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(+0x39ed18)[0x7f9426217d18]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x9f5)[0x7f942621be95]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9423d8fb15]
Feb 19 17:05:56 node2.example.com mysqld[38851]: /usr/sbin/mysqld(+0x3967ed)[0x7f942620f7ed]
Feb 19 17:05:56 node2.example.com mysqld[38851]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Feb 19 17:05:56 node2.example.com mysqld[38851]: information that should help you find out what is causing the crash.

What I also see: on both servers there is no ./tde/SOURCES.idb file.

Have you tried proceeding with the 3 suggestions you're quoting in your logs? — SYN, Feb 19 '17 at 17:17
AFAIR, with Galera, 3 nodes is the recommended minimum. Now if you can't recover your table from node1, you may have more luck on node2. — SYN, Feb 19 '17 at 17:19
Node 1 is up and running - and it can be restartet. I can also query the SOURCES table. Node 2 was a blank Galera node where I now enabled the sync. So I don't understand, why node 2 could even fail. If there is no solution I'll try dumping the table from Node 1 and reimport it. — PascalTurbo, Feb 19 '17 at 18:01
The very first suggestion tells about a permission issue. Are you sure permissions are properly set, on your new instance? — SYN, Feb 19 '17 at 18:42
Can't see any wrong permissions. User "mysql" is allowed to write to the data directory and all of its subfolders. — PascalTurbo, Feb 20 '17 at 07:34

score 1 · Answer 1 · answered Feb 20 '17 at 02:53

Sorry but I am confused. you have 3 nodes say node-1, node-2, node-3.

As per your information node-1 is up and running and you are trying to add node-2. (Log that you have shared is for node-1 ? Is node-1 too facing issue ? )

Also, which SST method did you used for xfer.

May be if possible you can also share complete log file since this is confusing and not expected to happen.

Feb 19 17:05:56 node1.example.com mysqld[38851]: 2017-02-19 17:05:56 140274267547840 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace mysql/innodb_index_stats uses space ID: 2 at filepath: ./mysql/innodb_index_stats.ibd. Cannot open tablespace tde/SOURCES which uses space ID: 2 at filepath: ./tde/SOURCES.ibd

Given space-id was allocated for some other table that was opened by MySQL and post SST same space-id is allocated to some other table.

I changed my question a bit: - The cluster contains one working node and now I want to add a second node (we are in an Migration Scenario) - SST: rsync - I've added the whole log output to my question — PascalTurbo, Feb 20 '17 at 07:35

score 1 · Answer 2 · answered Feb 21 '17 at 16:36

You should be clearer in your question. As per your question here, but not stated above, you have 1 node which appears to be working and are in the process of adding a second node. You are getting this error on the second node.

This jumped out at me when I first skimmed over your error log and with the error appears that you're starting off with a corrupted database on node 2.

Feb 19 17:05:56 node2.example.com mysqld[38851]: 2017-02-19 17:05:56 140274267547840 [Note] InnoDB: Database was not shutdown normally!

Are all your tables InnoDB with Primary keys?

I hope this is a testing setup for you and not production. As such, I suggest wiping the data directory on node 2 and trying to start the service again. This will ensure a clean start when MySQL starts back up and is what Xtrabackup does for SST. I don't recall rsync clobbering the entire data directory, so this should eliminate any issue due to rsync not removing anything which may be causing you issues.

Can you share your configurations for nodes 1 and 2? At least the wsrep_* related configurations.

Galera Node won't start after recovery

2 Answers2