
Here is what I do:

rsync everything
enter maintenance
rsync changes since first rsync
leave maintenance

The first rsync syncs the major changes without locking the system. It can run for a long time, and that is fine.

But the second rsync should finish as quickly as possible, and it normally finds no or only a few changes. Yet because there are so many files, it still takes quite long.

Is there a trick I can use, given that I know I synced everything just before?

Here are the rsync flags I use:

rsync --partial --progress --delete --archive --verbose --compress --links --times
Torge

3 Answers


Some tips:

  • when syncing many files, the uid-to-username mapping process can take a significant amount of time, so try adding the --numeric-ids option to your rsync invocation
  • on both sides, schedule a find <path> > /dev/null some time before the maintenance rsync; this will preload the metadata caches on both nodes, greatly speeding up rsync execution
shodanshok
  • Thanks. I will try --numeric-ids, but shouldn't the second hint already be taken care of by the first rsync? – Torge May 25 '17 at 17:06
  • Yes and no... The first rsync will eventually transfer data, which will evict metadata from cache, slowing down the second transfer. – shodanshok May 25 '17 at 21:34
  • Ok, but in my subsequent tests it was not syncing any data (because of the quick succession of test runs), so I fear I have to live with the current downtime. For now this is the closest to what I wanted. – Torge May 25 '17 at 22:23

Speeding up rsync over ssh

You can use a multi-threaded client that breaks the job up into as many concurrent connections as you specify (up to what ssh on the remote end will allow).

Take a look at Rsync over SFTP using the LFTP client and its mirror subsystem.

You can use the --loop option to continue the sync (re-sync) until nothing has changed.

I use this on multiple systems that replicate database backups and logs to remote destinations hourly. It is extremely fast. Your only limits will be bandwidth, the maximum number of connections allowed to authenticate on the remote end at once, and the maximum number of files allowed to be open at once.
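A hypothetical lftp invocation for this approach might look like the following; the user, host, and paths are placeholders, and the exact mirror options (--reverse for upload, --parallel for concurrent transfers, --loop to repeat until a pass finds no changes) should be checked against your lftp version:

```shell
#!/bin/sh
# Sketch of an lftp mirror run over SFTP (placeholders throughout).
# --reverse  : mirror local -> remote (upload)
# --parallel : number of concurrent transfer connections
# --loop     : repeat the mirror until nothing has changed
LFTP_SCRIPT='mirror --reverse --parallel=8 --loop /data /data; quit'
# Uncomment to run against a real SFTP endpoint:
# lftp -e "$LFTP_SCRIPT" sftp://backupuser@backup.example.com
echo "$LFTP_SCRIPT"
```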

This method can also be more secure than rsync+ssh, as chrooted SFTP is supported. There is no need to provide a shell to the client, if you prefer.

Aaron
  • Thanks for the effort. But I fear this is not what I am looking for. `--loop` will not work, as I also need a DB backup matching the files, hence the maintenance mode. Also I don't see how parallelizing helps speed up scanning for differences. I had hoped for an option to specify a minimum date (the last sync) so it can skip most files based on that without checking against the other side. But I guess that is not going to work out. – Torge May 25 '17 at 22:21
  • I have found this method to determine differences substantially faster than rsync can calculate. What results did you get when you tested it? – Aaron May 25 '17 at 22:43

I prefer to add a second rsync pass before putting the system into maintenance. If the first pass takes ages, the second can take minutes/hours while the system is still live. The final pass, during maintenance, then takes seconds/minutes instead of hours.

Dom
  • As I said, I normally have no or only a few changes. I want to speed up the scan that verifies this. – Torge May 25 '17 at 16:53