0

The backup performance of a BackupExec installation suddenly dropped by 50-70% for no apparent reason. There was no user intervention, no reconfiguration, and no updates, and all tapes were affected at once. The system runs on Windows 2003 SBS (32-bit) with no remote agents involved except the local one, so no networking is involved.

I cannot find any clues about the cause. The backup is now automatically cancelled after 6 hours, where a complete run used to take 4 hours, and by then it has only walked about 50% of the files and 20% of the data volume of a usual complete run. The tape capacity is also no longer used up (90% before, now only a fraction of it).
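As a rough sanity check on those numbers, here is a minimal sketch that turns them into a relative throughput. It assumes the throughput was roughly constant across each run, which may not hold; the function name is made up for illustration.

```python
def relative_throughput(fraction_done, hours_taken, baseline_hours):
    """Return current throughput as a fraction of the baseline throughput.

    The baseline is one complete run (fraction 1.0) finishing in baseline_hours.
    Assumes a roughly constant rate within each run.
    """
    baseline_rate = 1.0 / baseline_hours
    current_rate = fraction_done / hours_taken
    return current_rate / baseline_rate


# Figures from the description above: 6 h timeout now, 4 h for a full run before.
by_files = relative_throughput(0.50, 6, 4)   # about 50% of the files walked
by_data = relative_throughput(0.20, 6, 4)    # about 20% of the data volume

print(f"by file count:  {by_files:.0%} of baseline")
print(f"by data volume: {by_data:.0%} of baseline")
```

Measured by file count that is roughly a third of the old rate, and measured by data volume it is even lower.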

I tried turning off single instance backups and also tried turning off the snapshot providers, to no avail.

There is no error message, because the backup job times out before it can finish (so in effect the error is "backup job did not complete within time" or similar).

Update: The problem persists with or without AOFO. We also ran the cleaning tape. Four tapes have been in use for about 2 years, and one tape is fairly new. Both generations of tapes show the same issue, so it does not seem to be related to the tapes. However, we are going to try again with a brand-new one.

Any ideas on how to debug this?

hurikhan77

4 Answers

1

You can debug BEX using the SGMon utility, which is in the program directory. Be aware that its output is quite extensive.

You can also create smaller jobs and run them sequentially, or back them up to a "folder" first and then run a "duplicate" backup job to tape. If it fails on the folder job, it's a network/source issue; if it fails on the tape, it's a drive[r]/tape issue.
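In the same spirit, you could check the source side on its own before involving BackupExec at all. The following is only a sketch of that idea, not anything BackupExec provides: it reads every file under a directory, throws the data away, and reports the read rate plus any files that could not be opened. The function name and the demo directory are made up for illustration, and the OS file cache can make repeated runs look faster than the disks really are.

```python
import os
import tempfile
import time


def measure_read_throughput(root, chunk_size=1024 * 1024):
    """Read every file under root and discard the data.

    Returns (bytes_read, elapsed_seconds, unreadable_paths).
    """
    total = 0
    unreadable = []
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                # Locked or otherwise inaccessible files end up here.
                unreadable.append(path)
    elapsed = time.perf_counter() - start
    return total, elapsed, unreadable


# Demo on a throwaway directory; point root at the real backup selection instead.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "sample.bin"), "wb") as fh:
        fh.write(b"\0" * (4 * 1024 * 1024))
    total, elapsed, unreadable = measure_read_throughput(demo_root)
    rate = total / elapsed / (1024 * 1024) if elapsed > 0 else float("inf")
    print(f"read {total} bytes in {elapsed:.3f} s ({rate:.1f} MiB/s), "
          f"{len(unreadable)} unreadable")
```

If the plain read rate is already far below what the tape job used to achieve, the source is the likelier suspect; if it is fine, that points back at the drive or tape.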

One of our servers started doing something like this. We got the drive itself replaced ASAP, and the problem was solved.

Grizly
  • How long is a tape drive going to work? It's about 2-3 years old, doing daily backups (5-day weeks, about 30 GB per day)... – hurikhan77 Feb 25 '10 at 12:00
  • Hmm.. our LTO2s do almost 300 GB a day each, are cleaned once a week, and have lasted over 5 years now.. the LTO4 shat itself in under 12 months.. (still in warranty!) You say you only use 4 tapes? Is that right? You might want to look into backup strategies and invest in some more. Our "daily" tapes only get used once a month. Our monthly tapes get used once a year, and our annual tapes only get used once. Tape itself won't last very long, except maybe in hermetically sealed anti-static, anti-sun, anti-moisture, anti-heat storage.. even then only a few decades.. excessive use will degrade them FAST. – Grizly Feb 25 '10 at 22:53
  • We are using 5 for a one-week backlog... 4 + 1 = 5 ;-) But this is a good point, so +1. – hurikhan77 Mar 02 '10 at 03:33
  • +1 as it seems like the drive is really at fault, and also because backup-to-disk first, then duplicate, is a good idea anyway. We tried with a brand-new tape and that also failed in the same way. Running the drive-diagnostics showed an old firmware which I then updated. Now the drive reacts to eject jobs again but the backup jobs still fail. – hurikhan77 Mar 16 '10 at 15:41
  • Accepted: We replaced the drive and everything went back to normal. The drive has totally given up meanwhile, cannot even eject the tape. – hurikhan77 Apr 28 '10 at 09:15
0

Check your job logs. This kind of thing is generally caused by BE having a screaming fit over a single file somewhere (possibly an Access database, PST or similar on a file share which a user has left a file lock on), and it should be immediately possible to identify precisely the point during the job at which things slow down.
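Finding that point can be done mechanically once you have timestamps for the items in the job. Here is a minimal sketch, assuming you have already pulled (timestamp, label) pairs out of the log by whatever means; the log format is not assumed here, and the sample entries below are invented purely to show the call.

```python
from datetime import datetime


def largest_gap(entries):
    """Find the longest pause between consecutive log entries.

    entries: list of (datetime, label) tuples sorted by time.
    Returns (gap, label_before, label_after).
    """
    if len(entries) < 2:
        raise ValueError("need at least two entries to measure a gap")
    best = None
    for (t0, before), (t1, after) in zip(entries, entries[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Invented sample data, not taken from a real job log.
sample = [
    (datetime(2010, 2, 24, 22, 0), "job start"),
    (datetime(2010, 2, 24, 22, 5), "folder A"),
    (datetime(2010, 2, 24, 22, 12), "folder B"),
    (datetime(2010, 2, 25, 1, 40), "folder C"),
]

gap, before, after = largest_gap(sample)
print(f"longest pause: {gap} between '{before}' and '{after}'")
```

Whatever was being processed right after the entry that precedes the longest pause is the first place to look.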

Maximus Minimus
  • This should be resolved by restarting the server, which we did. Anyway, I'll try to find traces of which file may cause this. However, it is completely unpredictable how much data is backed up: some days it stored only 4 MB, another day it stored 2 GB before the timeout was reached. – hurikhan77 Feb 24 '10 at 20:26
  • BTW: The backup job is set to skip files if a lock cannot be acquired within 30 secs... – hurikhan77 Feb 24 '10 at 20:28
0

The capacity of the tape is also not used (90% before, now only a fraction of it).

I've seen behaviour like this. The problem was that the tape set was X uncompressed and roughly X*2 compressed. Once I got more than X, the backup slowed WAY down (because of the compression overhead) and suddenly I had all this extra space.

Satanicpuppy
  • If I understand you right, this means BE adaptively enables compression when the backup set will not fit into the uncompressed capacity? Then it should help to split the backup set and skip compression? – hurikhan77 Feb 24 '10 at 20:24
0

There has to be at least one remote agent involved: the one on the server you're backing up, even if it's the backup server itself. Have you checked the tape drive for any errors or alerts? Does the tape drive need cleaning? Are you using AOFO?

joeqwerty