I am trying to set up a small GlusterFS (for NFS) cluster with the CentOS High Availability add-on, and I really have no idea why it fails. I have started from scratch multiple times. My problem is that I can't see any error like "DNS unresolvable" or "connection failed". Nothing.
Basically, the underlying layer is a Proxmox infrastructure. I have created 3 VMs with CentOS 7.2, each with:
- 1 x vNIC (connected to the bridge)
- 8 GB of RAM
- 30 GB disk for the OS
- 50 GB disk for storage, mounted at /data/disk01/archive
- The GlusterFS systems have their own /29 subnet.
- The local firewalld is disabled and stopped (iptables allows everything).
- SELinux is disabled.
- The search domain is "staging.mydomain.ending".
- The DNS server can resolve the hostnames.
- I also added both the short hostname and the FQDN of each node to /etc/hosts, so I can guarantee that name resolution works.
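For reference, the /etc/hosts entries look roughly like this (the addresses below are placeholders, not our real /29 subnet):

```
# /etc/hosts (placeholder addresses)
192.0.2.1  fra1-glusterfs-m01.staging.mydomain.ending  fra1-glusterfs-m01
192.0.2.2  fra1-glusterfs-m02.staging.mydomain.ending  fra1-glusterfs-m02
192.0.2.3  fra1-glusterfs-m03.staging.mydomain.ending  fra1-glusterfs-m03
```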
In the end, I followed this tutorial: https://jamesnbr.wordpress.com/2017/01/26/glusterfs-and-nfs-with-high-availability-on-centos-7/ I also tried a few others, but they are all essentially the same and don't differ in any meaningful way.
First Issue: GlusterFS NFS
I could create the volume successfully as described in the tutorial:
Volume Name: rdxarchive
Type: Replicate
Volume ID: dfe190c8-b4fd-413e-9b58-214c4f295cba
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fra1-glusterfs-m01:/data/disk01/archive
Brick2: fra1-glusterfs-m02:/data/disk01/archive
Brick3: fra1-glusterfs-m03:/data/disk01/archive
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
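Since nfs.disable is off, Gluster's built-in NFS server should be exporting the volume. These commands (they run against the live cluster, so this is only a sketch) should show whether that server actually came up:

```shell
# Per-volume process status; a working setup should list an
# "NFS Server on localhost" line marked Online
gluster volume status rdxarchive

# NFS-specific view of the same information
gluster volume status rdxarchive nfs
```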
But I was not able to mount the NFS export.
[18:48:44 root@fra1-glusterfs-m01]{~}>showmount
clnt_create: RPC: Program not registered
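For debugging, showmount can also be pointed at the host explicitly, and rpcinfo shows which RPC programs rpcbind actually knows about (a diagnostic sketch, run on the node itself):

```shell
# List exports explicitly
showmount -e localhost

# Dump registered RPC programs; a working NFS server registers
# nfs (program 100003) and mountd (program 100005) here
rpcinfo -p localhost
```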
rpcbind is running:
[18:49:10 root@fra1-glusterfs-m01]{~}>systemctl status rpcbind
● rpcbind.service - RPC bind service
Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-11-01 17:42:11 CET; 1h 7min ago
Process: 637 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS)
Main PID: 646 (rpcbind)
CGroup: /system.slice/rpcbind.service
└─646 /sbin/rpcbind -w
Nov 01 17:42:11 fra1-glusterfs-m01.staging.mydomain.ending systemd[1]: Starting RPC bind service...
Nov 01 17:42:11 fra1-glusterfs-m01.staging.mydomain.ending systemd[1]: Started RPC bind service.
And to be clear, there is no local kernel NFS server running (just the nfs-utils package).
Any idea what's going wrong?
Second Issue: Pacemaker & Corosync
I created the cluster as described in the tutorial and got no errors: authentication, connection, and creation all went fine.
The cluster status is green:
[18:55:22 root@fra1-glusterfs-m01]{~}>pcs cluster status
Cluster Status:
Stack: corosync
Current DC: fra1-glusterfs-m01 (version 1.1.21-4.el7-f14e36fd43) - partition WITHOUT quorum
Last updated: Sun Nov 1 18:55:49 2020
Last change: Sun Nov 1 17:23:13 2020 by root via cibadmin on fra1-glusterfs-m01
3 nodes configured
1 resource configured
PCSD Status:
fra1-glusterfs-m01: Online
fra1-glusterfs-m02: Online
fra1-glusterfs-m03: Online
But the full status shows two of the three nodes as offline; only the local node is online:
[18:55:50 root@fra1-glusterfs-m01]{~}>pcs status
Cluster name: rdxfs
Stack: corosync
Current DC: fra1-glusterfs-m01 (version 1.1.21-4.el7-f14e36fd43) - partition WITHOUT quorum
Last updated: Sun Nov 1 18:56:24 2020
Last change: Sun Nov 1 17:23:13 2020 by root via cibadmin on fra1-glusterfs-m01
3 nodes configured
1 resource configured
Online: [ fra1-glusterfs-m01 ]
OFFLINE: [ fra1-glusterfs-m02 fra1-glusterfs-m03 ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started fra1-glusterfs-m01
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Corosync logs: https://pastebin.com/TBC8yxHK
Any idea what's going wrong there? Why are two nodes offline? How can I debug this? Where is the error in the logs?
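A few corosync-level checks that might narrow this down (only a sketch; they need the live cluster):

```shell
# Ring status from the local node's point of view
corosync-cfgtool -s

# Membership list as corosync sees it
corosync-cmapctl | grep members

# Corosync node status as reported by pcs
pcs status corosync
```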
When I create the cluster with IP addresses instead of DNS names, corosync and pacemaker seem to work, but "pcs status" then warns that there are nodes with both IPs and DNS names. When I log in to the "Red Hat High Availability" GUI, I see 6 nodes instead of three: 3 nodes by IP and 3 nodes by DNS name. On the IP nodes, corosync is connected and green (Pacemaker is not successful). On the DNS nodes, corosync has failed and Pacemaker is connected and successful.
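For completeness, the cluster was created roughly like this, following the tutorial (pcs 0.9 / CentOS 7 syntax, node names as above):

```shell
# Authenticate the nodes against pcsd (prompts for the hacluster password)
pcs cluster auth fra1-glusterfs-m01 fra1-glusterfs-m02 fra1-glusterfs-m03 -u hacluster

# Create the cluster with DNS names and start it on all nodes
pcs cluster setup --name rdxfs fra1-glusterfs-m01 fra1-glusterfs-m02 fra1-glusterfs-m03
pcs cluster start --all
```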
Versions:
[19:01:36 root@fra1-glusterfs-m01]{~}>gluster --version
glusterfs 8.1
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[19:01:41 root@fra1-glusterfs-m01]{~}>pcs --version
0.9.168
[19:02:20 root@fra1-glusterfs-m01]{~}>corosync -v
Corosync Cluster Engine, version '2.4.5'
Copyright (c) 2006-2009 Red Hat, Inc.
Linux fra1-glusterfs-m01.staging.mydomain.ending 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I hope someone can help.
Thanks.