What is an acceptable level of hard write errors on tape? Specifically, what is acceptable on HP LTO-2 media? Is it a hard number of errors, a ratio of hours in use to errors, or something else entirely?

Further background

We are using an MSL6000 library with one LTO-2 drive and Backup Exec 11d (for now). Backup Exec always shows some soft errors for most of the tapes, but some are starting to show hard errors. Backups are done with immediate verification, and the verify has yet to fail, so I don't have reason to be alarmed right now.

While I can find the duty cycle for the drive (250,000 hrs), I can't seem to find any hard numbers as to when a particular tape should just be retired.

If there's a best practice for rotating media out, I'd love to hear that, too. We're also soon migrating to LTO-4 media, so thoughts on errors there would also be helpful.

Edited to add:

I don't have hard errors on every tape. To give an idea of what I'm looking at:

Tape    Hours in Use    Hard Errors
A       142             11
B       255             0
C       159             2

The vast majority of my tapes are like B and C. A is the outlier.
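One way to compare tapes with different amounts of use is to normalize hard errors by hours in use. The sketch below does that for the three tapes in the table; the "per 100 hours" unit and the function name are arbitrary choices for illustration, not a vendor-defined metric or threshold.

```python
# Hours in use and hard error counts, copied from the table above.
tapes = {"A": (142, 11), "B": (255, 0), "C": (159, 2)}


def errors_per_100_hours(hours, hard_errors):
    """Hard errors normalized to 100 hours of use (illustrative unit)."""
    return 100.0 * hard_errors / hours


rates = {name: errors_per_100_hours(h, e) for name, (h, e) in tapes.items()}

# List the tapes from worst to best so outliers stand out.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Tape {name}: {rate:.2f} hard errors per 100 hours")
```

On these numbers, tape A comes out several times higher than C, which matches the impression that A is the outlier.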

I'm looking for some sort of best practice here. The tapes are verifying OK. I don't want to have a tape fail just when I want to restore, but I also don't want to toss a tape with a handful of errors if it's unnecessary.

CC.

1 Answer

Those error rates are still really low in my experience. From the LTO ECC specification:

The ECC (error correction code) used by LTO-Ultrium is powerful enough to ensure reliable recovery of data even with the loss of one of eight tracks on a read operation and up to 1% of the bytes on the remaining tracks being corrupted

At the shops I've worked at, we set a guideline for the number of times a tape would cycle through the library before we purged it from rotation (usually at least 20 or 30 times). We also restored a sample of tapes once a quarter and verified md5sums on the data to make sure the entire backup system was functioning properly.
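The quarterly restore check can be scripted. Here is a minimal sketch, assuming you recorded an MD5 checksum for each file at backup time in some manifest; the manifest format (relative path mapped to hex digest) and the function names are hypothetical and not part of Backup Exec or any other backup product.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_root):
    """Return the relative paths that are missing or whose checksum differs.

    manifest: dict mapping relative path -> expected MD5 hex digest
              (hypothetical format, recorded at backup time).
    restore_root: directory the sample was restored into.
    """
    failures = []
    for relative_path, expected in manifest.items():
        restored = Path(restore_root) / relative_path
        if not restored.is_file() or md5_of_file(restored) != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty list back from `verify_restore` means every sampled file was restored and matched its recorded checksum; anything else is worth investigating before the tape goes back into rotation.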

In addition to the error rates you are seeing, there are a number of other variables, some of which are more critical to tape longevity:

  • The environment the tapes are stored in for extended periods (arguably the most important)
  • The number of changes in an environment (perhaps as they are rotated out of the library and into storage)
  • The number of times the tape has been used (both reads and writes)
  • The age of the tape
  • The criticality of the data (it may make sense to do multiple backups of really critical things)

This is usually called Media Lifecycle Management and there are a number of companies that actually make enterprise software suites to deal with it. It may be worth investigating some of them to see if there are ideas that you would find useful in your shop. One example:

http://www.spectralogic.com/index.cfm?fuseaction=products.displayContent&CatID=1852
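Short of a commercial suite, the variables listed above can be tracked in a simple per-tape record. The sketch below is only an illustration: the field names are made up, and the default of 30 cycles is taken from the rough 20 to 30 guideline mentioned earlier, not from any vendor specification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TapeRecord:
    """Per-tape bookkeeping; all field names are hypothetical."""
    label: str
    first_used: date
    library_cycles: int = 0        # passes through the library (reads and writes)
    environment_changes: int = 0   # moves between the library and storage
    hard_errors: int = 0

    def age_days(self, today: date) -> int:
        """Days since the tape was first put into use."""
        return (today - self.first_used).days


def due_for_retirement(tape: TapeRecord, max_cycles: int = 30) -> bool:
    """Flag a tape once it reaches the site's chosen cycle limit.

    max_cycles is an assumed site policy value, not a manufacturer figure.
    """
    return tape.library_cycles >= max_cycles
```

A site could extend the check with its own limits on age or hard errors once it has enough history to pick sensible values.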

polynomial
  • This is helpful. I've not been able to find any real-world guidelines on how many cycles before a tape should be purged. I'll search on Media Lifecycle Management so I have more to put in my budget wish list. :) – CC. Oct 05 '11 at 14:10