I have two hosts set up as a very simple master/slave pair. All servers connect to the floating IP 10.1.1.10.
Host A:
Designation: Master
Floating IP - 10.1.1.10
Standard IP - 10.1.1.11
Relevant pg_hba.conf (identical on both hosts so I can fail back and forth):
host replication replica_user 10.1.1.11/32 md5
host replication replica_user 10.1.1.12/32 md5
host replication replica_user 10.1.1.10/32 md5
Relevant postgresql.conf (identical on both hosts so I can fail back and forth):
checkpoint_timeout = 30min # range 30s-1d
max_wal_size = 2GB
min_wal_size = 1GB
checkpoint_completion_target = 0.7 # checkpoint target duration, 0.0 - 1.0
Host B:
Designation: Slave
Floating IP - N/A
Standard IP - 10.1.1.12
Relevant pg_hba.conf (identical on both hosts so I can fail back and forth):
host replication replica_user 10.1.1.11/32 md5
host replication replica_user 10.1.1.12/32 md5
host replication replica_user 10.1.1.10/32 md5
Relevant postgresql.conf (identical on both hosts so I can fail back and forth):
checkpoint_timeout = 30min # range 30s-1d
max_wal_size = 2GB
min_wal_size = 1GB
checkpoint_completion_target = 0.7 # checkpoint target duration, 0.0 - 1.0
The slave was provisioned with:
systemctl stop postgresql
rm -rf /var/lib/pgsql/12/data/
pg_basebackup -h 10.1.1.11 -D /var/lib/pgsql/12/data -U replica_user -P -v -R -Xs
I know this works because I can see the slave 'pulling' from the master. The slave also has a postgresql.auto.conf that tells it to 'pull'.
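For reference, here is a minimal sketch of how I check that a data directory is set up as a standby. It assumes PostgreSQL 12 behaviour, where `pg_basebackup -R` creates a `standby.signal` file and writes a `primary_conninfo` line into postgresql.auto.conf. The function name `is_standby_dir` is just something made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a data directory looks like a PG12 standby.
# Assumes pg_basebackup -R left standby.signal and a primary_conninfo line behind.
is_standby_dir() {
    datadir="$1"
    # standby.signal tells PostgreSQL 12+ to start in standby mode
    [ -f "$datadir/standby.signal" ] || return 1
    # primary_conninfo tells the standby which host to pull WAL from
    grep -q '^primary_conninfo' "$datadir/postgresql.auto.conf" 2>/dev/null
}
```

On the slave I would call it as `is_standby_dir /var/lib/pgsql/12/data && echo standby`.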
My failover procedure is as follows. Once it is complete, the servers begin writing to Host B, the new master.
Host A:
systemctl stop postgresql
Drop interface 10.1.1.10
Host B:
pg_ctl promote -D /var/lib/pgsql/12/data
Bring interface 10.1.1.10 up
Restart postgresql
The problem comes when I want to fail back. The failback seems very clunky, in that I now have to go onto Host A and rebuild it as a slave:
Host A:
systemctl stop postgresql
rm -rf /var/lib/pgsql/12/data/
pg_basebackup -h 10.1.1.12 -D /var/lib/pgsql/12/data -U replica_user -P -v -R -Xs
Now Host A is a proper slave. I wait for all the changes to propagate from Host B to Host A (the inserts/updates made while I was doing maintenance on A). Now it's time to make A the master again.
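To make "wait for the changes to propagate" concrete, this is roughly how I compare WAL positions. It is only a sketch: the helper names `lsn_to_bytes` and `lsn_lag_bytes` are made up, and it assumes LSNs in the usual `X/Y` hexadecimal form.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two WAL positions (LSNs) of the form "X/Y",
# where X and Y are hexadecimal, as printed by pg_current_wal_lsn().

# Convert an LSN such as 0/3000060 to a byte offset.
lsn_to_bytes() {
    hi=${1%/*}   # part before the slash (high 32 bits)
    lo=${1#*/}   # part after the slash (low 32 bits)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Print how many bytes the replayed LSN ($2) is behind the master's LSN ($1).
lsn_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

I get the first value from Host B with `psql -Atc "SELECT pg_current_wal_lsn()"` and the second from Host A with `psql -Atc "SELECT pg_last_wal_replay_lsn()"`. Once writes to B have stopped and the lag is 0, I treat A as caught up.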
Host A:
pg_ctl promote -D /var/lib/pgsql/12/data
Flip IP
At this point 'A' is the master and receiving writes. There is no slave. Now it's time to make B a slave.
Host B:
systemctl stop postgresql
rm -rf /var/lib/pgsql/12/data/
pg_basebackup -h 10.1.1.11 -D /var/lib/pgsql/12/data -U replica_user -P -v -R -Xs
Now A is the master again and B is pulling. I don't see why I have to copy an entire database just to fail back. Host B is effectively Host A with a couple of diffs (the updates/inserts that came in from the servers during the maintenance). Why can't I just copy the diffs over from Host B to A, flip it, then push the subsequent changes from A to B?
It seems like this master\slave flip should be a single command.