
I have two hosts set up as a very simple master/slave pair. All servers connect to the floating IP 10.1.1.10.

Host A:

Designation: Master
Floating IP - 10.1.1.10
Standard IP - 10.1.1.11

Relevant pg_hba.conf (identical on both hosts so I can fail back and forth):
    host    replication     replica_user 10.1.1.11/32  md5
    host    replication     replica_user 10.1.1.12/32  md5
    host    replication     replica_user 10.1.1.10/32   md5

Relevant postgresql.conf (identical on both hosts so I can fail back and forth):
    checkpoint_timeout = 30min              # range 30s-1d
    max_wal_size = 2GB
    min_wal_size = 1GB
    checkpoint_completion_target = 0.7      # checkpoint target duration, 0.0 - 1.0

Host B:

Designation: Slave
Floating IP - N/A
Standard IP - 10.1.1.12

Relevant pg_hba.conf (identical on both hosts so I can fail back and forth):
    host    replication     replica_user 10.1.1.11/32  md5
    host    replication     replica_user 10.1.1.12/32  md5
    host    replication     replica_user 10.1.1.10/32   md5

Relevant postgresql.conf (identical on both hosts so I can fail back and forth):
    checkpoint_timeout = 30min              # range 30s-1d
    max_wal_size = 2GB
    min_wal_size = 1GB
    checkpoint_completion_target = 0.7      # checkpoint target duration, 0.0 - 1.0

The slave was effectively provisioned using:

systemctl stop postgresql
rm -rf /var/lib/pgsql/12/data/
pg_basebackup -h 10.1.1.11 -D /var/lib/pgsql/12/data -U replica_user -P -v -R -Xs

I know this works because I see the slave 'pulling' from the master. The slave also has a postgresql.auto.conf which tells it to 'pull'.
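For reference, on PostgreSQL 12 the -R flag makes pg_basebackup create a standby.signal file and append a primary_conninfo line to postgresql.auto.conf in the data directory. A minimal sketch for confirming that a host is set up and running as a slave is below. The function name check_standby is illustrative, and it assumes the data directory from above and that psql can connect locally as the postgres user.

```shell
# Sketch: report whether this host is configured and running as a standby.
# Assumptions: PostgreSQL 12 layout, the data directory from the post,
# and a local psql connection as the postgres user.
PGDATA="${PGDATA:-/var/lib/pgsql/12/data}"

check_standby() {
  # standby.signal is what puts a PostgreSQL 12 server into standby mode
  if [ -f "$PGDATA/standby.signal" ]; then
    echo "standby.signal: present"
  else
    echo "standby.signal: missing"
  fi
  # primary_conninfo tells the standby which host to stream WAL from
  grep -h '^primary_conninfo' "$PGDATA/postgresql.auto.conf" 2>/dev/null
  # prints t while the server is in recovery (a standby), f on a master
  psql -At -U postgres -c "SELECT pg_is_in_recovery();"
}
```

Running check_standby on the slave should show the signal file, the connection string pointing at the master, and t on the last line.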

My failover procedure is as follows. Once complete the servers begin writing to Host B, the new master.

Host A:
systemctl stop postgresql
Drop interface 10.1.1.10

Host B:
pg_ctl promote -D /var/lib/pgsql/12/data
Bring interface up
Restart postgresql
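The failover steps above could be sketched as two shell functions, one per host. This is only an illustration: the interface name eth0, the /24 prefix, the use of ip addr to move the floating IP, and the function names are all assumptions, not my exact setup.

```shell
# Sketch of the failover steps, one function per host.
# Assumptions: IFACE and PREFIX are hypothetical, the floating IP is moved
# with "ip addr", and the service is named postgresql as in the post.
PGDATA="${PGDATA:-/var/lib/pgsql/12/data}"
FLOATING_IP="10.1.1.10"
IFACE="${IFACE:-eth0}"     # hypothetical interface name
PREFIX="${PREFIX:-24}"     # hypothetical prefix length

demote_old_master() {      # run on Host A
  # stop the old master first, then release the floating IP
  systemctl stop postgresql &&
  ip addr del "$FLOATING_IP/$PREFIX" dev "$IFACE"
}

promote_new_master() {     # run on Host B
  # pg_ctl has to run as the user that owns the data directory
  sudo -u postgres pg_ctl promote -D "$PGDATA" &&
  ip addr add "$FLOATING_IP/$PREFIX" dev "$IFACE" &&
  systemctl restart postgresql
}
```

Stopping Host A before Host B takes over the address keeps both hosts from holding 10.1.1.10 at the same time.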

The problem comes when I want to fail back. The failback seems very clunky, because I now have to go onto Host A and run:

Host A:
systemctl stop postgresql
rm -rf /var/lib/pgsql/12/data/
pg_basebackup -h 10.1.1.12 -D /var/lib/pgsql/12/data -U replica_user -P -v -R -Xs

Now Host A is a proper slave. I wait for all the changes to propagate from Host B to Host A (the inserts/updates made while I was doing maintenance on A). Now it's time to make B a slave again.
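One way to check that the changes have propagated is to look at pg_stat_replication on the current master (Host B at this point). A minimal sketch, assuming psql can connect locally as the postgres user; the function name replication_lag_bytes is illustrative:

```shell
# Sketch: show how far each connected standby is behind, in bytes.
# Run on the current master. The postgres user is an assumption.
replication_lag_bytes() {
  # pg_wal_lsn_diff gives the byte distance between the master's current
  # WAL position and the position the standby has replayed
  psql -At -U postgres -c \
    "SELECT client_addr, state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes FROM pg_stat_replication;"
}
```

Each output row has the form client_addr|state|lag_bytes. A row for 10.1.1.11 in state streaming with a lag of 0 suggests Host A has replayed everything written so far.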

Host A:
pg_ctl promote -D /var/lib/pgsql/12/data
Flip IP

At this point 'A' is the master and receiving writes. There is no slave. Now it's time to make B a slave.

Host B:
systemctl stop postgresql
rm -rf /var/lib/pgsql/12/data/
pg_basebackup -h 10.1.1.11 -D /var/lib/pgsql/12/data -U replica_user -P -v -R -Xs

Now A is the master again and B is pulling. I don't see why I have to copy an entire database twice just to fail back. Host B is effectively Host A with a couple of diffs (during the maintenance there are some updates/inserts from the servers). Why can't I just copy the diffs over from Host B to A, flip it, then push the additional changes from A to B?

It seems like this master/slave flip should be a single command.
