
I am trying to set up a small GlusterFS (for NFS) cluster with CentOS High Availability, and I really have no idea why it fails. I have started from scratch multiple times. My problem is that I can't see any error like "DNS unresolvable" or "connection failed". Nothing.

Basically, I have a Proxmox infrastructure as the underlying layer. I have created 3 VMs with CentOS 7.2, each with:

  • 1 x vNIC (connected to the bridge)
  • 8 GB of RAM
  • 30 GB disk for the OS
  • 50 GB disk for storage, mounted at /data/disk01/archive

  • I have my own /29 subnet for the GlusterFS systems.
  • Local firewalld is disabled and stopped (iptables allows everything)
  • SELinux is disabled
  • Search domain is "staging.mydomain.ending"
  • The DNS server can resolve the hostnames
  • I also added the hostname and hostname+domain to /etc/hosts, so I can guarantee that name resolution works (see the sketch right after this list)
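
A minimal sketch of what the /etc/hosts entries look like on each node (the addresses here are just placeholders for my /29 subnet, not the real ones):

192.0.2.1   fra1-glusterfs-m01.staging.mydomain.ending   fra1-glusterfs-m01
192.0.2.2   fra1-glusterfs-m02.staging.mydomain.ending   fra1-glusterfs-m02
192.0.2.3   fra1-glusterfs-m03.staging.mydomain.ending   fra1-glusterfs-m03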

Mainly, I followed this tutorial: https://jamesnbr.wordpress.com/2017/01/26/glusterfs-and-nfs-with-high-availability-on-centos-7/ But I also tried a few others; they are all essentially the same and don't really differ.

First Issue: GlusterFS NFS

I could create the volume successfully as described in the tutorial:

Volume Name: rdxarchive
Type: Replicate
Volume ID: dfe190c8-b4fd-413e-9b58-214c4f295cba
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fra1-glusterfs-m01:/data/disk01/archive
Brick2: fra1-glusterfs-m02:/data/disk01/archive
Brick3: fra1-glusterfs-m03:/data/disk01/archive
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
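
For reference, the volume was created roughly with this command sequence (reconstructed from the tutorial, so the exact flags may have differed slightly, e.g. "force" because the bricks sit on mount points; per the tutorial the volume is also started afterwards):

gluster peer probe fra1-glusterfs-m02
gluster peer probe fra1-glusterfs-m03
gluster volume create rdxarchive replica 3 \
    fra1-glusterfs-m01:/data/disk01/archive \
    fra1-glusterfs-m02:/data/disk01/archive \
    fra1-glusterfs-m03:/data/disk01/archive
gluster volume set rdxarchive nfs.disable off
gluster volume start rdxarchive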

But I was not able to mount it via NFS.

[18:48:44 root@fra1-glusterfs-m01]{~}>showmount
clnt_create: RPC: Program not registered
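
The mount attempt itself looks roughly like this (the target directory is just an example; vers=3 because the built-in Gluster NFS server only speaks NFSv3) and does not succeed either:

mount -t nfs -o vers=3 fra1-glusterfs-m01:/rdxarchive /mnt/archive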

rpcbind is running:

[18:49:10 root@fra1-glusterfs-m01]{~}>systemctl status rpcbind
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2020-11-01 17:42:11 CET; 1h 7min ago
  Process: 637 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 646 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─646 /sbin/rpcbind -w

Nov 01 17:42:11 fra1-glusterfs-m01.staging.mydomain.ending systemd[1]: Starting RPC bind service...
Nov 01 17:42:11 fra1-glusterfs-m01.staging.mydomain.ending systemd[1]: Started RPC bind service.

And I definitely have no local NFS server running (just the nfs-utils package installed).
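
If it helps, this is how I check on each node that the kernel NFS server is neither enabled nor active:

systemctl is-enabled nfs-server
systemctl is-active nfs-server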

Any idea what's going wrong?

Second Issue: Pacemaker & Corosync

I created the cluster as described in the tutorial and got no errors. Authentication, connection, creation: everything went fine.
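
The setup was roughly this sequence of pcs commands (reconstructed from the tutorial; the VIP address is a placeholder from my /29 subnet):

pcs cluster auth fra1-glusterfs-m01 fra1-glusterfs-m02 fra1-glusterfs-m03 -u hacluster
pcs cluster setup --name rdxfs fra1-glusterfs-m01 fra1-glusterfs-m02 fra1-glusterfs-m03
pcs cluster start --all
pcs property set stonith-enabled=false
pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.0.2.6 cidr_netmask=29 op monitor interval=30s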

Cluster Status is green:

[18:55:22 root@fra1-glusterfs-m01]{~}>pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: fra1-glusterfs-m01 (version 1.1.21-4.el7-f14e36fd43) - partition WITHOUT quorum
 Last updated: Sun Nov  1 18:55:49 2020
 Last change: Sun Nov  1 17:23:13 2020 by root via cibadmin on fra1-glusterfs-m01
 3 nodes configured
 1 resource configured

PCSD Status:
  fra1-glusterfs-m01: Online
  fra1-glusterfs-m02: Online
  fra1-glusterfs-m03: Online

The normal status request, however, shows that two of the three nodes are offline. Only the local node is online:

[18:55:50 root@fra1-glusterfs-m01]{~}>pcs status
Cluster name: rdxfs
Stack: corosync
Current DC: fra1-glusterfs-m01 (version 1.1.21-4.el7-f14e36fd43) - partition WITHOUT quorum
Last updated: Sun Nov  1 18:56:24 2020
Last change: Sun Nov  1 17:23:13 2020 by root via cibadmin on fra1-glusterfs-m01

3 nodes configured
1 resource configured

Online: [ fra1-glusterfs-m01 ]
OFFLINE: [ fra1-glusterfs-m02 fra1-glusterfs-m03 ]

Full list of resources:

 virtual_ip     (ocf::heartbeat:IPaddr2):       Started fra1-glusterfs-m01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Corosync logs: https://pastebin.com/TBC8yxHK

Any idea what's going wrong there? Why are two nodes offline? How can I debug this? Where is the error in the logs?

When I create the cluster with IPs instead of DNS names, corosync and pacemaker seem to work. But "pcs status" then shows a warning that there are nodes with IPs and nodes with DNS names. When I log in to the "Red Hat High Availability" GUI, I see 6 nodes instead of three: 3 nodes with IPs and 3 nodes with DNS names. On the nodes with IPs, corosync is connected and green (pacemaker is not successful). On the nodes with DNS names, corosync has failed and pacemaker is connected and successful.
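
By "with IPs instead of DNS names" I mean the setup command was given the addresses directly, roughly like this (placeholder IPs again):

pcs cluster setup --name rdxfs 192.0.2.1 192.0.2.2 192.0.2.3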

Versions:

[19:01:36 root@fra1-glusterfs-m01]{~}>gluster --version
glusterfs 8.1
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.


[19:01:41 root@fra1-glusterfs-m01]{~}>pcs --version
0.9.168


[19:02:20 root@fra1-glusterfs-m01]{~}>corosync -v
Corosync Cluster Engine, version '2.4.5'
Copyright (c) 2006-2009 Red Hat, Inc.


Linux fra1-glusterfs-m01.staging.mydomain.ending 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Hope I can get some help.

Thanks.

  • What are pacemaker and corosync for here? They aren't necessary for glusterfs which manages its own availability. – Michael Hampton Nov 01 '20 at 21:08
  • @MichaelHampton good question. But a lot of GlusterFS tutorials use Pacemaker. Maybe for the VIP? I have no idea. Basically I could use my existing load balancer for that. But even if I don't need Pacemaker, there is still my GlusterFS NFS issue. Any idea? Thanks! – eiskaltereistee Nov 01 '20 at 22:47

0 Answers