Kinder, gentler backups on linux

Question

Earlier this week I had a 'perfect storm' moment on my servers: Two backup jobs (one for each RAID10 array on the system) had been humming along for 18 hours, and then we had a sustained spike in traffic on my I/O intensive application. The result was unacceptably slow performance, and I had to force our administrator to cancel the backup. (He was not happy about this...not at all. "I'm not responsible if...")

The end result was lots of stress, unhappy customers, and a very grouchy Stu.

The bottleneck was disk utilization. Once the jobs were canceled, everything was working just fine. What can I suggest to my administrators to lessen the impact on my servers?

Here are some of the gory details:

The backup command itself (I got this out of ps, but really don't know what it means.)

bpbkar -r 1209600 -ru root -dt 0 -to 0 -clnt xtx-le00 -class F_Full_on_Thursday
-sched Incr_Fri_to_Wed -st INCR -bpstart_to 300 -bpend_to 300 -read_to 300 
-blks_per_buffer 127 -stream_count 8 -stream_number 8 -jobgrpid 223932 -tir -tir_plus 
-use_otm -use_ofb -b svr_1259183136 -kl 28 -fso

The system

RHEL4 64-bit
4GB RAM (~half used by applications)
DL380G5 with two attached SAS RAID10 partitions, ~550GB and ~825GB

The data

1TB
~10 million files

The application

busy from 0900 to 2300 on weekdays
I/O intensive (99% read) mostly focused on a few hundred MB of files

What percentage of your content is being read by these 99% reads? If you could get higher cache hits, that would lighten your disk loads. How much memory is on the box, and how big is your disk cache? — Chopper3, Nov 26 '09 at 12:09
Disk caching is very important, yes. 4GB RAM, applications use up <=2GB. 99% of reads are probably a mere few hundred megabytes of files. The server runs fine normally...disk utilization has never been a problem, until this backup issue, where the backup process sucked up 33-55% (and more when the server was not busy.) — Stu Thompson, Nov 26 '09 at 12:35
Any backup process is going to bust your cache all over the place and also play hell with your io localization. — chris, Nov 27 '09 at 03:52

score 8 · Answer 1 · answered Nov 26 '09 at 14:41

We have a system where we rsync live servers to backup servers (which are built out of cheap 1TB SATA discs) then take full tape backups of the backup servers. It's excellent:

Belt and braces - all the advantages of both backups
reduces the IO load on the live servers considerably
faster restores if you just want one or two files
full set of tapes for the offsite archive

gekkz · Accepted Answer · 2009-11-26T15:27:31.623

4

I'm not sure how bpbkar works really, but I would use rsync to backup all the files offsite and then keep them in sync, which would consume very little resources, as only changed files are updated. Naturally, this means it would take quite some time for the initial backup, but you already say you've been 'humming for 18 hours'.

You would then simply manage the backed up data from the other machine however you wanted to.

Small edit: If you choose to step away from tape backups on to disk backups you may want to use RAID6 which will offer dual parity.
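
To illustrate why a sync-style backup moves so little data after the first run, here is a minimal Python sketch of the size-and-modification-time comparison that rsync applies by default. It is not rsync itself: the function name `changed_files` and the directory layout are made up for illustration, and real rsync has many more options. Note that even this cheap check still has to stat every file, which matters with millions of files.

```python
import os


def changed_files(src_root, dst_root):
    """Return relative paths under src_root that a size+mtime check would copy.

    Hypothetical helper: mimics the idea of rsync's default quick check,
    where a file is skipped if size and modification time both match.
    """
    to_copy = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            s = os.stat(src)
            try:
                d = os.stat(dst)
            except FileNotFoundError:
                # Not on the backup side yet, so it must be copied.
                to_copy.append(rel)
                continue
            # Compare whole seconds so sub-second rounding does not matter.
            if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
                to_copy.append(rel)
    return sorted(to_copy)
```

On a mostly static data set the returned list stays short, so little file content is read or sent, although the directory walk itself still touches the metadata of every file.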

edited Nov 26 '09 at 15:27

answered Nov 26 '09 at 12:48

gekkz


It is a full, proper backup system. Tape backup (with robot arms and other sexy stuff), occasional snapshots, nightly differentials, etc. `rsync` is...cruder, no? Not sure if I can get my administrators to effectively downgrade our backup. – Stu Thompson Nov 26 '09 at 13:42
As someone who spends a lot of time swearing at backup systems of all kinds, moving to disk backups from tape backups is in no way a downgrade. I don't think that rsync is going to do much in this situation, though. – womble Nov 26 '09 at 13:48
Maybe if it were faster, so that the backup was completed in <8 hours. Hmmm... – Stu Thompson Nov 26 '09 at 14:41
I agree. The fact that rsync won't be transferring as much data in no way lessens the fact that while it is running it will pound the disk being backed up. In fact it will get pounded harder, since disk-to-disk backups can run at network wire speeds, not tape drive speeds. – David Mackintosh Nov 26 '09 at 15:00
I think the days of mechanical tape backups with robotic arms are very much numbered. I can't believe we still use tapes! This is 70's & 80's technology upgraded for the noughties. But still slow as anything and highly inefficient. Hot backups to hard drives are the way to go now that hard drives have become so much cheaper. – hookenz Nov 27 '09 at 03:50
In the end, we moved to a new data center where I have full control. *(Insert evil laugh.)* And I went for a very similar approach: off-site backup over the net with a custom OSX-Time-Machine-like rsync-based backup scheme to pairs of RAID0 sets. – Stu Thompson Jul 24 '10 at 09:04
RAID0 and backups, in the same sentence? – gekkz Jul 24 '10 at 11:21

score 3 · Answer 3 · answered Nov 26 '09 at 13:24

If your backups take 18 hours to run normally, deprioritising them probably isn't going to solve the problem (unless you want to run your backups for a couple of days at a time). I'd be inclined to set up a disk replication mechanism to another machine (I like DRBD, myself) and then use LVM to take a point-in-time snapshot, back that up, and move on. Because it's running on a separate machine, (a) it can hammer as hard as it likes without affecting the live app, and (b) it won't be contending with the live app for disk IO, meaning it'll probably run a whole lot faster as well.

One thing I can say for sure: anything you do on the same machine is going to completely bone your disk cache -- as the backup process reads all the data off the disk to be backed up (even if it just checks mtimes rather than reading and checksumming all the files), that's still a lot of metadata blocks running into your cache, and those will be kicking out useful data from the cache and causing more disk IO than is otherwise warranted.

answered Nov 26 '09 at 13:24

womble


I think a full backup takes >18 hours. The differentials take only a few hours. But your points still stand. Especially the one about the cache. It would be nice if there was some way to influence that...like a magic `-DoNotModifyCache` option for the backup process. Hmmm... – Stu Thompson Nov 26 '09 at 13:38
You'd need to modify the way the kernel works for that to happen. Putting both the live service and the backup process into separate VMs is the only way I can think of to do that, and that's a sledgehammer/nut situation if ever I saw one. – womble Nov 26 '09 at 13:46
Not quite - you can use posix_fadvise to tell the kernel not to cache data for a certain file handle, but that relies on having the source to the app and NBU is closed source. There's patches floating around to add this functionality to rsync though. – James Nov 26 '09 at 23:41
Awesome, a new syscall I didn't know about. I can't see which advice would tell the kernel not to cache data, though; NOREUSE is a NOOP, DONTNEED only says it'll flush the data out of the cache (which means that you reduce your cache usage, but you'll still be kicking out some useful data), and none of the others seem to suit. – womble Nov 27 '09 at 00:09
Yep - DONTNEED is about as close as you can get... – James Nov 27 '09 at 11:25
That's not nearly as awesome as I was hoping. – womble Nov 27 '09 at 18:40
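
For anyone who does control the code that reads the files, the `posix_fadvise` idea from the comments above can be sketched in Python, which exposes the call as `os.posix_fadvise` on Linux. This is only a sketch under assumptions: the function name `read_and_drop_cache` is made up, it does nothing with the data it reads, and it cannot be bolted onto a closed-source client such as NetBackup. As womble points out, `POSIX_FADV_DONTNEED` only releases pages after they have been read, so useful data may already have been pushed out of the cache by then.

```python
import os

CHUNK = 1 << 20  # read in 1 MiB pieces


def read_and_drop_cache(path):
    """Read a file sequentially, then ask the kernel to drop its cached pages.

    Returns the number of bytes read. Hypothetical helper for illustration.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, CHUNK)
            if not buf:
                break
            total += len(buf)  # a real backup tool would ship buf somewhere
        if hasattr(os, "posix_fadvise"):
            # Offset 0 with length 0 means "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Calling the advice once per chunk instead of once per file would keep the backup's footprint in the cache smaller, at the cost of more system calls.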

score 3 · Answer 4 · answered Nov 26 '09 at 17:22

bpbkar is Veritas NetBackup's backup client. It supports throttling, so the combination of normal I/O and backup I/O doesn't saturate your disks. Look here:

http://seer.entsupport.symantec.com/docs/265707.htm

Is there anything stopping you doing full backups at the weekend, as you say the system is mostly busy weekdays, and incremental backups during the week? That'd help you get the backup done during the quiet slot between 2300 and 0900

answered Nov 26 '09 at 17:22

xenny


There was resistance to backing up on the weekend. I suspect there are other backup 'customers' in those time slots. I will raise the throttling idea with my administrators...this looks promising. – Stu Thompson Nov 26 '09 at 18:39
Unfortunately I suspect that part of the problem isn't the I/O rate itself but the fact that doing the full backups is causing the filesystem cache to get filled with useless stuff and the disk arms are swinging all around when they should be over the hotspots. – chris Nov 27 '09 at 03:38

score 1 · Answer 5 · answered Nov 26 '09 at 15:16

Another vote for rsync. I use it to back up 9TB of a very heavily used fileserver daily. Never had an issue.

If you're concerned about 'point in time', create an LVM snapshot, mount, rsync, umount, destroy. Somewhat higher load on the server, but still far (far!) less time than a full copy.
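
The snapshot, mount, rsync, umount, destroy sequence could be scripted roughly as below. This is a sketch, not a tested backup tool: the volume group `vg0`, logical volume `data`, snapshot size, mount point and rsync destination are all placeholder assumptions, and the commands need root. It defaults to a dry run that only prints what it would do.

```python
import subprocess


def snapshot_backup_plan(vg, lv, dest, snap="backup_snap",
                         size="10G", mnt="/mnt/backup_snap"):
    """Build the command lists for an LVM-snapshot-based rsync run.

    All names (vg, lv, snap, mnt, dest) are placeholders for illustration.
    """
    snap_dev = f"/dev/{vg}/{snap}"
    setup = [
        # Point-in-time copy-on-write snapshot of the live volume.
        ["lvcreate", "--snapshot", "--size", size, "--name", snap,
         f"/dev/{vg}/{lv}"],
        ["mount", "-o", "ro", snap_dev, mnt],
    ]
    copy = [["rsync", "-a", "--delete", f"{mnt}/", dest]]
    teardown = [
        ["umount", mnt],
        ["lvremove", "-f", snap_dev],
    ]
    return setup, copy, teardown


def run_plan(setup, copy, teardown, dry_run=True):
    """Print the commands (dry run) or execute them; returns the command count."""
    commands = setup + copy + teardown
    if dry_run:
        for cmd in commands:
            print(" ".join(cmd))
        return len(commands)
    try:
        for cmd in setup + copy:
            subprocess.run(cmd, check=True)
    finally:
        # Always attempt cleanup so the snapshot does not linger and fill up.
        for cmd in teardown:
            subprocess.run(cmd, check=False)
    return len(commands)
```

For example, `run_plan(*snapshot_backup_plan("vg0", "data", "backuphost:/srv/backup/data/"))` prints the five commands without running them. The snapshot must be sized to hold all writes made to the live volume while the copy runs.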

If the administrator says that it must positively, absolutely be bpbkar, first do an rsync to a less used system, and then run bpbkar from it. No need to hog your production system.

An anecdote from testing: when we approached the 8TB limit of ext3, we ran some 'pull the plug' tests to determine how easy it is to corrupt a file through hardware failure while copying. We pulled the plug on the server, the storage boxes, and the SAN wiring, and copied tens of millions of files.

Conclusions:

ext3 had on average one missing file every 10 failures.
XFS averaged less than 5 errors per failure on the storage (almost zero for failures on the server). (Surprised me! I thought XFS always failed fast and hard on hardware failure.)
JFS mangled hundreds of files each and every time.

In short, rsync works really, really well. Any error could better be attributed to your hardware and/or filesystem. bpbkar wouldn't perform any better facing the same failures.

score 1 · Answer 6 · answered Nov 26 '09 at 17:26

Judging by the command you posted, and looking at the -class and -sched options, it looks like you're running a full backup on Thursday - probably not the best plan considering your usage schedule (0900-2300 weekdays).

With huge datasets like that, you should look at the timing of your full backup, plus the type of incremental backup you take during the week. There are 2 types of incremental backups in NetBackup:

Cumulative Incremental - backs up every file changed since the last full backup
Differential Incremental - backs up every file changed since the last backup (full or incremental)

I would consider shifting your backup strategy for that system to a Full backup on Saturday or Sunday, and Differential Incremental backups for the rest of the week. That would run a full backup when there's plenty of time to do so (no/few users) and short incrementals in the few hours of low-usage that you have. The issue with this method is that restores might be a bit more convoluted - you would need more tapes - the tape for the full plus all the incrementals from that full to the point you need the data restored to.
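
The trade-off between the two incremental types can be shown with a small Python model. It is a simplification and not how NetBackup is implemented: times are plain numbers (think "day of the week"), change detection is reduced to comparing modification times, the history is assumed to start with a full backup, and the function names are invented for illustration.

```python
def files_to_back_up(mtimes, history, kind):
    """Pick files for the next incremental.

    mtimes:  {path: modification time}
    history: list of (time, kind) for past backups, oldest first, where kind
             is "FULL", "CUMULATIVE" or "DIFFERENTIAL"
    kind:    type of the incremental about to run
    """
    if kind == "CUMULATIVE":
        # Everything changed since the last full backup.
        since = max(t for t, k in history if k == "FULL")
    else:
        # Everything changed since the last backup of any type.
        since = max(t for t, _k in history)
    return sorted(p for p, m in mtimes.items() if m > since)


def restore_chain(history):
    """Return the backups needed to restore to the most recent state."""
    chain = []
    for when, kind in history:
        if kind == "FULL":
            chain = [(when, kind)]
        elif kind == "CUMULATIVE":
            # A cumulative replaces every incremental since the full.
            chain = chain[:1] + [(when, kind)]
        else:
            chain.append((when, kind))
    return chain
```

With a full on day 0 and differentials on days 2 and 3, `restore_chain` returns all three backups, while the same schedule with cumulative incrementals needs only the full plus the latest cumulative. That is the "more tapes" cost of differentials, paid for by shorter nightly runs.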

From your question, it sounds like you aren't terribly familiar with the backup system. I understand separating the sysadmins from the backup operators, but some discussion needs to happen between them. If the backup operators have no idea how the system is being used, they can't form a proper policy and schedule for the system.

Yea, good catch on the *"full backup on Thursday"* part. I'd raised the potential for problems months ago, but was ignored. It looks like it will be moved to Friday night, but there is no guarantee that weekends will stay quiet in the long term. And correct again--I am not exactly in control of the backups. The administrators are actually our hosting vendor. I am actually a mere coder who likes to get into administration. It is something I need to do to scale my application efficiently. It was a full backup that killed us this week. — Stu Thompson, Nov 26 '09 at 18:42

score 1 · Answer 7 · answered Nov 26 '09 at 23:43

Get your NetBackup admins to schedule the backups better - do full backups on alternating weeks for each RAID array.

You might also want to look into synthetic full backups so you don't need to do as many full backups.

answered Nov 26 '09 at 23:43

James


What is a *"synthetic full backup"*? Please expand your answer on that point. I've never heard of this. (As for rescheduling, others have suggested and I asked for this before even posting. Problem is there is competition for time on the backup system.) – Stu Thompson Nov 27 '09 at 07:05
A synthetic full backup is essentially NetBackup creating a full backup out of incremental backups - so it doesn't require any client I/O. Also, it sounds like there may be issues with your NetBackup environment if it is taking 18+ hours to backup >1TB of data - it's really dependent on the environment, but we see throughput of ~400GB/hr to LTO-4 tape. Tuning NetBackup is almost essential - the defaults are absolutely terrible for decent throughput on modern systems :( – James Nov 27 '09 at 10:56
edit: just realized you're on a hosted system, so this advice is probably not much help ... sorry :( – James Nov 27 '09 at 11:26
Also, with that many small files, NetBackup/tape is probably not your friend, as others suggested, dumping to disk and then to tape will help. – James Nov 27 '09 at 11:28
The system is not virtualized. We own the server hardware. But our hosting company owns the network and backup system, and they administrate the server. I don't have root. Anyway, while it is 'only' 1TB of data, it is some 15 million files. I'm trying to trim that down by 33% today. (Thanks again for your input.) – Stu Thompson Nov 27 '09 at 13:11
Can you please go into tuning NetBackup some more? This is something I'd like to raise with my admins. – Stu Thompson Nov 27 '09 at 13:15
Here's the key tunables for NetBackup: http://seer.entsupport.symantec.com/docs/183702.htm. For LTO-4 we typically set SIZE_DATA_BUFFERS = 1MB and NUMBER_DATA_BUFFERS = 64 on our media servers. NetBackup does really suck with millions of small files in my experience - maybe you can ask the hosting guys to set your NetBackup policy so it goes to disk first, then spools to tape. – James Nov 27 '09 at 17:20
Symantec/Veritas's "solution" to the problem of backing up millions of small files is to use Flash Backup - which doesn't work on Linux unless you use Veritas's filesystem and volume manager... – James Nov 27 '09 at 17:21
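
To put the numbers from this thread side by side, here is some back-of-the-envelope arithmetic in Python. The inputs are the rough figures quoted in the question and comments (about 1TB, about 10 million files, 18 hours, and James's ~400GB/hr to LTO-4), with 1TB taken as 1000GB, so the results are only indicative.

```python
def hours_needed(data_gb, rate_gb_per_hr):
    """Hours to move data_gb at a sustained rate."""
    return data_gb / rate_gb_per_hr


def rate_mb_per_s(data_gb, hours):
    """Average throughput in MB/s for data_gb moved in the given hours."""
    return data_gb * 1000 / (hours * 3600)


def files_per_second(file_count, hours):
    """Average number of files processed per second."""
    return file_count / (hours * 3600)


if __name__ == "__main__":
    print(hours_needed(1000, 400))           # time at the quoted LTO-4 rate
    print(rate_mb_per_s(1000, 18))           # what 1TB in 18 hours works out to
    print(files_per_second(10_000_000, 18))  # per-file rate over those 18 hours
```

At ~400GB/hr the same terabyte would take about 2.5 hours, whereas 18 hours corresponds to roughly 15MB/s and around 150 files per second, which is consistent with the suggestion that per-file overhead, not raw bandwidth, dominates here.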

chris · Answer 8 · 2009-11-27T03:50:03.170

1

A couple suggestions:

Do full backups less often. If your data is pretty static, you can probably get away with full backups once a month or every 2 months and cumulative incremental backups the rest of the time. You'd need 2 tapes instead of one but that shouldn't be a big deal.
Schedule the backups better. With netbackup, it is possible to ask the server to try to do backups at a certain frequency and in certain windows but let it schedule when the actual backups start and end. This typically uses the backup infrastructure more efficiently than if you try to manually schedule things yourself.
Have netbackup dump the backups to disk first, then duplicate those images to tape later after the backup has completed.

The other rsync suggestions are also good -- there is no reason why the rsynced copy of the data wouldn't be as good as the image on the primary server unless this is a database application. If it is a database sort of application, you should be copying the transaction logs and backup images to another system as they're created, and backing those up.

I would backup the data on the rsync target to netbackup, but I'd also backup the OS and everything but the program data (the stuff that's taking the space) on the primary and rsync targets. Backing up the OS and program data should be easy and fast, and it should probably be in a different backup policy anyhow.

edited Nov 27 '09 at 03:50

answered Nov 27 '09 at 03:39

chris


Thanks. **Less often:** This would not negate the potential for problems, just the probability. Not good enough. **Schedule *'better'*:** I'm going to ask that the backup is moved to Friday night, but there seems to be competition for time slots on the backup system. **Dump to disk first:** As others have suggested, yea, might be an option. Although an expensive option. (My storage is leased.) **`rsync`:** With our current backups, we can restore to various states in the past. Not with rsync. *No OS backup*--have a hot standby server and can easily rebuild on catastrophic failure. Thanks! – Stu Thompson Nov 27 '09 at 06:42
By doing the fulls once every 2 months, you'd have more of a chance of getting the provider to give you the coveted "friday just after work ends" timeslot. And, you're less likely to run into issues doing your cumulative incremental backups on Thursdays, again keeping both the backup provider (they do it in a less valuable time slot) and users (unlikely to cause problems by leaking into production) happy. Just monitor the backups and when the incrementals start taking too long, schedule another full. – chris Nov 27 '09 at 16:01
Oh, and the rsync to a system running ZFS would allow you to go to various points in the past because ZFS supports snapshots. Have ZFS do snapshots every hour and keep them for a month, and keep the "midnight" snapshot for 6 months. And run the rsync every half hour so it doesn't get too far behind production... – chris Nov 27 '09 at 16:02
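
The retention rule in that last comment (hourly snapshots kept for a month, the midnight one kept for 6 months) is easy to express as a pruning function. The Python below is a sketch under assumptions: a month is taken as 30 days and 6 months as 180 days, snapshots are identified only by the time they were taken, and the function names are invented. Actually creating and destroying ZFS snapshots is left out.

```python
from datetime import datetime, timedelta


def keep_snapshot(taken, now, hourly_days=30, midnight_days=180):
    """Decide whether a snapshot taken at `taken` should still exist at `now`."""
    age = now - taken
    if age <= timedelta(days=hourly_days):
        return True  # every hourly snapshot is kept for about a month
    # Older than that: only the snapshot from the midnight hour survives.
    return taken.hour == 0 and age <= timedelta(days=midnight_days)


def snapshots_to_delete(snapshots, now):
    """Return the snapshot times that fall outside the retention policy."""
    return [t for t in snapshots if not keep_snapshot(t, now)]
```

A scheduled job could call `snapshots_to_delete` with the list of existing snapshot times and destroy whatever it returns.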

chris · Answer 9 · 2009-11-27T04:18:21.727

There are two issues at play -- one is of your architecture and the other is of your implementation.

You can easily optimize your implementation by doing things like changing backup windows or doing backups less often or buying faster disks or networks or tape drives or by duplicating the data to another system. These changes are valid, appropriate, and with Moore's law on your side, they may keep your service running properly forever.

You may also be getting into a situation where you're going to run into scaling issues more and more often. If you're even a little worried about that, you're going to need to think about how to redesign your system to make it scale better. Such things aren't easy, but because they're not easy you need to plan for them well in advance of when you've got a gun to your head.

An example of adjusting your architecture may involve moving all your data to a NAS type system such as a NetApp filer or a box running Solaris and ZFS. With a setup like this, you back up the server, which will be mostly your program and configuration, and you use the data management features of the NAS to back up the NAS. These would be things like snapshots and transaction logs against the snapshot.

You may also do something similar to what archive.org does where you store the data on lots of different systems, usually any given piece of data exists on several systems, and then you have a farm of front-end systems that routes the requests to whichever system actually hosts the data.

Lastly -- are you sure your backups even work? Running a backup for 18 hours on a live system results in a backup that reflects that system over those whole 18 hours. Ideally a backup reflects a system at a single atomic point in time, not some crazy rolling backup where some stuff is from one point in time and other stuff is from almost a whole day later. If any of your data is dependent on or points to other parts of the data elsewhere, these dependencies will get royally messed up if the backups get stuff mid-change, and with a dataset this large, you're 100% likely to have this scenario if it is possible, on every backup you've got.

Wow, long answer! Thanks. *But...* Scaling is generally not an issue, we have that under control. The backup during a peak was a problem, and that is understandable. Our current storage + backup is fairly inexpensive, a desirable trait. Moving to an entirely different storage system is going to cost lots of money. We are ok with a less than perfect snapshot. Yes, I am sure the backups work. We've had to go into them on occasion to retrieve deleted files in the past. *Again, thanks for the thoughts.* — Stu Thompson, Nov 27 '09 at 06:51
You do backups for several different reasons. Restoring individual files is only one of those reasons. You also need to guard against catastrophic failure and hedge against silent corruption. If you've got a web site where people upload pictures of cute puppies, you don't need to worry about this because each picture is pretty-much self contained. If you've got some databaseish application, it's much more of a concern that your backups are taking 18 hours, and the only way to make sure they work is to do a full restore to verify that the result is valid. — chris, Nov 27 '09 at 15:56
There is no reason why splitting your system into different components (server, NAS server, load balancer, etc) would be substantially more expensive if you use off-the-shelf hardware and software. ZFS on solaris + gigabit switch + 1u server would really only add marginally to the existing deployment's complexity. And, you'd be able to restore individual files without going to the hosted backup solution because they'd be in the snapshots of the ZFS server. — chris, Nov 27 '09 at 15:58
