I have a Pulsar cluster of 3 machines. Each one runs a Pulsar broker, ZooKeeper and BookKeeper. I have the following in my broker.conf:

managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2

So I should be able to take any one of the 3 machines down for a while without any disruption in service, right? And when I bring it back up, will it get copies of all the messages it missed? I just want to make sure I am understanding things correctly before I do this to our live cluster. I don’t want to have a very bad weekend!

David Tinker

1 Answer

Sorry, I missed the (EnsembleSize, WriteQuorum, AckQuorum) setting of (2,2,2) in my previous answer. With only 3 bookies, a quorum setting of (3,3,2) would not tolerate one machine being down.
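As a rough illustration of the difference between (2,2,2) and (3,3,2) on 3 bookies, here is a minimal sketch that checks only one condition: whether enough bookies remain to form a full ensemble after one goes down. The function name can_lose_one is hypothetical, and the check ignores write and ack quorum details.

```shell
# Hypothetical helper: succeeds if, after losing one bookie, the
# remaining bookies can still form a full ensemble of the given size.
# Usage: can_lose_one <total_bookies> <ensemble_size>
can_lose_one() {
  bookies=$1
  ensemble=$2
  [ $((bookies - 1)) -ge "$ensemble" ]
}
```

With 3 bookies, can_lose_one 3 2 succeeds (2 remaining bookies cover an ensemble of 2), while can_lose_one 3 3 fails.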

But even with quorum (2,2,2), before taking one machine offline, be sure to turn BookKeeper auto-recovery off with the command bin/bookkeeper shell autorecovery -disable, and turn it back on once the machine has come back with bin/bookkeeper shell autorecovery -enable.

If it is not turned off, BookKeeper will start auto-recovery as soon as a machine goes offline, because it expects to have 3 copies of the data but now has only 2. Since it cannot find a third available machine on which to place the recovered copy, auto-recovery will fail.
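The maintenance sequence described above could be wrapped in two small shell functions. This is only a sketch: the function names and the BK variable are assumptions for illustration, and BK defaults to bin/bookkeeper, which assumes the script is run from the BookKeeper (or Pulsar) installation directory. The two autorecovery commands themselves are the ones quoted in this answer.

```shell
# Path to the bookkeeper CLI (assumed; override BK if it lives elsewhere).
BK="${BK:-bin/bookkeeper}"

# Run before taking a bookie down for maintenance.
disable_autorecovery() {
  "$BK" shell autorecovery -disable
}

# Run after the bookie is back up and has rejoined the cluster.
enable_autorecovery() {
  "$BK" shell autorecovery -enable
}
```

A typical run would call disable_autorecovery, stop and service the machine, bring it back up, and then call enable_autorecovery.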

For more information on BookKeeper auto-recovery, you can check this link. Here is part of the content:

You can disable AutoRecovery at any time, for example during maintenance. Disabling AutoRecovery ensures that bookies' data isn't unnecessarily rereplicated when the bookie is only taken down for a short period of time, for example when the bookie is being updated or the configuration is being changed.

Jia Zhai