
I am using BackupPC to back up some workstations at the office. One workstation in particular has a sizable amount of data compared to the rest, although it's not that large in absolute terms (roughly 250 GB of actual data).

BackupPC seems to take forever to back this system up (several days, i.e., more than 72 hours). All workstations are backed up via rsync over an autofs mount of the remote share.

Basically, autofs mounts the workstation's administrative C$ share, then BackupPC treats it as a local directory, cd's into the automount directory, and rsyncs all the data.

Backups are slow in general, and I had attributed that to the BackupPC box having slow hardware, but performance is more or less acceptable on every workstation except this one with the larger amount of data.

rsync flags:

/usr/bin/rsync --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive

These are the default arguments that BackupPC is set up with.

I read some stuff online indicating that atime updates on the mounts may be slowing things down, so I changed my autofs config to mount the directories with the noatime flag... no difference.
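For reference, the autofs setup looks roughly like this (a sketch; the map file names, share name, and credentials path are placeholders, not my exact config):

# /etc/auto.master
/backuppc  /etc/auto.backuppc  --timeout=300

# /etc/auto.backuppc -- mounts the workstation's C$ share read-only with noatime
problem-ws  -fstype=cifs,ro,noatime,credentials=/etc/backuppc/ws.cred  ://problem-ws/C$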

I also read some stuff indicating that rsync may be the culprit because of how it checks files, and suggesting a switch to tar instead. So I did... but no difference.

tar flags:

tarPath="/bin/gtar"
env LC_ALL=C $tarPath -c -v -f - -C $shareName+ --totals

# where $shareName is the autofs mounted directory

Again, these are the default arguments that BackupPC is set up with.

No change.

Monitoring the BackupPC box's network activity with iftop, I see throughput spike for a while (sometimes up to 90 Mbps) and then drop back down into the Kbps or even Bps range. While it's in the slow phase, top shows activity from the BackupPC_dump command, which is the backup job... so it's doing something and isn't stuck.
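For the record, the monitoring is nothing fancy, just something like the following (the interface name is whatever your backup NIC happens to be):

# watch throughput on the backup interface (eth0 is an assumption)
iftop -i eth0

# confirm the dump job is still doing work
top -c
ps aux | grep BackupPC_dump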

The current backup has now been running for over 24 hours, yet it has only transferred 75.9 GB according to iftop.

SnakeDoc
  • Tell us about the files on the slow system. Is it a huge number of small files, perhaps? Large quantities of small files = slow file transfers. – Zoredache Jul 23 '14 at 17:18
  • @Zoredache it's a mix. This box is used as a sort of low-priority file share for some of the users... they tend to fill it with all sorts of stuff, lots of directories and directories within directories, etc., containing everything from Excel files, Word docs, and Access DBs (varying in size) up to program installer EXEs, ISO images, etc. I would guess there are more small files than large, though. The small-file issue was one of the reasons the internet recommended switching to `tar`, that it may help... is there a flag or something I should modify? – SnakeDoc Jul 23 '14 at 17:20
  • Well, you might want to run find, ls, or some kind of recursive listing to get a count of the files. Are you talking 50k? 500k? How long does it take to simply get a count? I don't know of any great ways to speed up backups of filesystems with a huge number of small files. The answer sometimes is to just do a backup at the device level. – Zoredache Jul 23 '14 at 17:25
  • @Zoredache hmm, one of the drives on the problem box has `392,756` files, and the other has `615,490` files. So, yeah... a lot of files! It did take a while for `find . -type f | wc -l` to run... – SnakeDoc Jul 23 '14 at 18:05

2 Answers


It may be faster to run rsync directly on the workstation being backed up. You have about a million files to access over the network. There are a couple of minimal rsync installs that you can run. I've set up BackupPC on Windows this way. You can run a full Cygwin install, or the minimal cygwin-rsyncd install available in the BackupPC project.
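Roughly, the per-host BackupPC config then switches the transfer method to rsyncd, something along these lines (the config file name, module name, and credentials below are placeholders you would adjust, not a drop-in config):

# pc/problem-ws.pl on the BackupPC server -- sketch only
$Conf{XferMethod}     = 'rsyncd';
$Conf{RsyncShareName} = ['cDrive'];   # rsyncd module exported on the workstation
$Conf{RsyncdUserName} = 'backuppc';
$Conf{RsyncdPasswd}   = 'secret';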

BillThor
  • You are talking about setting up rsync on the box to be backed up, and then switching to rsyncd in BackupPC so that box pushes its data to the backup machine? Or are you saying I should bypass BackupPC altogether and just rsync the data manually? – SnakeDoc Jul 24 '14 at 15:08
  • He is saying to install an rsync server on Windows and point BackupPC at that rsyncd as the source. This makes the local rsyncd check the files on the desktop and send only the name and MD5 over the network so BackupPC can compare against its version. With autofs, every file must be sent over the network just so BackupPC can compute the MD5 and compare. With many files, the network latency takes a huge performance hit. Also add checksum caching with `'--checksum-seed=32761',` (see the BackupPC docs). – higuita Jul 24 '14 at 21:45

You should check everything on both sides of your BackupPC setup. First, check the server and try to improve its performance, but since you have other machines that perform better, let's skip that here.

Next, check the network: the link speed negotiated by the desktop, packet size, cable quality. Do some benchmarks, such as an rsync (rsync to rsyncd) transfer of a single big file. Test from another desktop and to another desktop. That should tell you whether you have a problem there or not.
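A quick sketch of that kind of test (paths and host names are placeholders):

# push one large file to the BackupPC box to measure raw rsync throughput
rsync -av --progress /path/to/bigfile.iso backuppc-server:/tmp/

# repeat between two desktops to rule out the server's NIC or switch port
rsync -av --progress /path/to/bigfile.iso other-desktop:/tmp/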

Finally, the desktop. CIFS on the machine might not be in the best shape, and as I said above, rsync over a network filesystem downloads every file over the network again and again, because rsync thinks the filesystem is local and checks the MD5 of each file... but the file has to be fetched over the network just to do that check. So, as BillThor points out, an rsyncd on the desktop will be a lot more efficient. Also, checksum caching will spare the BackupPC server from re-checking its own files, reducing its load. Defrag the desktop and remove (or exclude) any files that are not needed (Windows leaves many useless files all over the place).
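A rough sketch of both ideas (the module name, path, and secrets file are placeholders; the checksum caching option assumes the rsync build BackupPC talks to supports it, see the BackupPC docs):

# rsyncd.conf on the desktop -- exports C: as a read-only module
[cDrive]
    path = /cygdrive/c
    read only = true
    auth users = backuppc
    secrets file = /etc/rsyncd.secrets

# config.pl on the BackupPC server -- enable checksum caching (per the BackupPC docs)
push @{$Conf{RsyncArgs}},        '--checksum-seed=32761';
push @{$Conf{RsyncRestoreArgs}}, '--checksum-seed=32761';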

Finally, the number of files... many files make any backup over the network take ages, so work around that: instead of one big backup, break it into smaller parts. Some files change more often than others, so group directories by probability of change. Instead of one big backup every X days, have three backups: one every X days, another every 2X days, and the rarely updated files every, say, 3X days. This way you avoid having to walk every file every time. If you have "archive" directories, consider compressing them. Even if the contents aren't compressible (use zip's store mode then), it turns 10,000 files into just one... a big saving at backup time.
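One way to do that split in BackupPC is to define several "hosts" that all alias to the same workstation, each with its own share list and schedule. A sketch (host names, module names, and periods are placeholders):

# pc/ws-active.pl -- frequently changing data, full backup roughly every week
$Conf{ClientNameAlias} = 'problem-ws';
$Conf{RsyncShareName}  = ['users'];
$Conf{FullPeriod}      = 6.97;

# pc/ws-archive.pl -- rarely changing data, full backup roughly every three weeks
$Conf{ClientNameAlias} = 'problem-ws';
$Conf{RsyncShareName}  = ['archive'];
$Conf{FullPeriod}      = 20.97;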

If you can't do that, you may consider changing the backup method. On a huge machine with many files I used Drive Snapshot to take a disk image and then periodic incremental snapshots. It may look like overkill, but that program is fast at block-level incrementals and bypasses the many-files problem. For me, it reduced a 48-hour filesystem backup to a 3-hour block-level backup. It is not as flexible as BackupPC, but it works. Just don't forget that after a defrag you must do a full backup again, or the incremental will be as big as the full. :)

Check this blog post for how to back up from Windows (with a bonus covering shadow copy). Read all the comments, as I added a lot of important info and updates there.

higuita