Can I ensure data redundancy on the server when I use RSYNC?

Let's say I have a server which provides mission-critical services, and these services rely on data which is synced from another server. The question is whether I can assure data redundancy on the server if I use rsync for data syncing.

Does rsync provide a way to check whether the syncing has finished without any kind of data corruption (for whatever reason)?

Is there a reliable way to resume syncing a file (from the point where it was interrupted) in case of a dropped connection or similar? Is there perhaps a better alternative to rsync which should be used in scenarios where data redundancy is critical?

Marek Szanyi

Posted 2012-06-11T18:25:53.423

Reputation: 123

Answers

Your question is somewhat confusing to me. If by "redundancy" you mean "integrity"--i.e. "if I Rsync something to my server, can I be guaranteed that it's bit-for-bit identical to the source material?"--the answer is: probably, but the integrity checking is only as good as the checksumming/comparison methods Rsync employs. For more information on those, I'd refer you to the man page for Rsync (check the -c and -B options) and the wiki page for Rsync, which discusses the comparison algorithms Rsync uses.
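As an illustration of that kind of integrity check, a dry-run invocation can report content differences without changing either side. This is a hypothetical helper of my own (the paths in the usage note are placeholders, not from the question); -n is --dry-run, -c forces full-content checksums instead of the default size-and-mtime quick check, and -i itemizes what differs:

```shell
#!/bin/sh
# Hypothetical verification helper: a dry-run rsync with --checksum
# (-c) and --itemize-changes (-i) lists any files whose contents
# differ, without modifying either side.
verify_sync() {
    rsync -a -n -c -i "$1" "$2"
}

# Usage (placeholder paths): any output lines mean the trees differ.
# verify_sync /srv/data/current/ backup@example.com:/srv/data/mirror/
```

Because -c reads every byte of every file on both sides, this pass is much slower than a normal rsync, so it makes sense only as a final check rather than for routine syncs.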

With regards to interrupted Rsyncs: in addition to checking Rsync's return code to determine whether it finished correctly (a list of Rsync exit codes can be found here), running Rsync again is the best way to verify that it concluded without corruption or interruption the first time. Rsync will compare and skip any files it has already copied successfully, and will copy anything new or different from the source (this accomplishes the same thing as "resuming" an interrupted transfer). If the source material changes so rapidly that you can't guarantee consistency between the two runs, Rsync may not be the best tool to ensure synchronization.
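A minimal sketch of that check-and-re-run approach, assuming placeholder paths and an arbitrarily chosen retry count (the wrapper function is hypothetical, not part of rsync itself); --partial keeps partially transferred files so an interrupted copy can pick up where it left off:

```shell
#!/bin/sh
# Hypothetical wrapper: run rsync repeatedly until a pass exits 0
# (success), up to a fixed number of attempts.
sync_with_retry() {
    src=$1; dest=$2; max=${3:-3}
    n=1
    while [ "$n" -le "$max" ]; do
        if rsync -a --partial "$src" "$dest"; then
            echo "clean sync on attempt $n"
            return 0
        fi
        echo "rsync failed (attempt $n); retrying" >&2
        n=$((n + 1))
    done
    return 1
}

# Usage (placeholder paths):
# sync_with_retry /data/source/ backup@example.com:/data/mirror/
```

A pass that exits 0 while reporting no transferred files is the signal that the previous pass completed intact.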

If by "redundancy", you really do mean "redundancy": i.e. "I only want to overwrite on the destination server if I'm sure I have an intact copy of the data to use, even if my transmission drops", then the solution would be to do multiple Rsyncs, like this:

  1. Rsync the old copy of the data from local location 1 to local location 2--both on the same server, or internal network locations that have a low risk of connection failure.
  2. Run step 1 again and check for errors. This verifies that you have two copies of the existing data set that are identical.
  3. Rsync the remote (new) copy of the data down to location 1.
  4. Run step 3 again, to ensure that location 1 contains an intact copy of the new data.

If step 3 or 4 is interrupted, you can a) try to Rsync from the remote source again, or b) simply reverse the direction of the Rsync in step 1 and run it again, replacing the (presumably corrupt) copy of the remote data with the most recent "known good" copy from the backup location on the local machine. This guarantees "redundancy" in the sense that you are never at risk of having only a corrupt copy of your data with no way back to an intact data set.
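The four steps above, plus the fallback restore, could be sketched like this; the function name and paths are placeholders of my own, not from the answer:

```shell
#!/bin/sh
# Hypothetical sketch of the procedure: loc1 is the working copy,
# loc2 a same-machine backup, remote the source of the new data.
redundant_pull() {
    loc1=$1; loc2=$2; remote=$3
    # Steps 1-2: back up the old data locally, then re-run to verify.
    rsync -a "$loc1" "$loc2" && rsync -a "$loc1" "$loc2" || return 1
    # Steps 3-4: pull the new data into loc1, then re-run to verify.
    if rsync -a "$remote" "$loc1" && rsync -a "$remote" "$loc1"; then
        return 0
    fi
    # Interrupted/failed transfer: restore the known-good backup copy.
    rsync -a --delete "$loc2" "$loc1"
    return 1
}

# Usage (placeholder paths):
# redundant_pull /srv/data/current/ /srv/data/backup/ src@example.com:/export/
```

Note the restore step uses --delete so that files left behind by a partial transfer are removed, making loc1 match the known-good backup exactly.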

Zac B

Posted 2012-06-11T18:25:53.423

Reputation: 2 653

Further to @ZacB's comment about re-running rsync: if you want to be extra sure, then after the last rsync comes back saying no changes were made, run it once more with --checksum. This will take a lot longer, but will eliminate the chance of bitrot having taken place on one of the drives. How it will resolve things (choose which file to keep) in the case of bitrot I am not sure, and it may end up overwriting the good data with the bad. – flungo – 2015-05-24T09:29:39.650

yes I did mean integrity – Marek Szanyi – 2012-06-11T20:16:22.210

Ok. Running rsync again until it doesn't find any differences is the best way to ensure integrity (within the constraints of rsync's checksumming pattern) then. – Zac B – 2012-06-11T20:43:26.557