8

We have an LTO-3 Tape drive in a Dell media library that we use for our tape backups. The article about LTO on Wikipedia states that:

LTO uses an automatic verify-after-write technology to immediately check the data as it is being written, but some backup systems explicitly perform a completely separate tape reading operation to verify the tape was written correctly. This separate verify operation doubles the number of end-to-end passes for each scheduled backup, and reduces the tape life by half.

What I would like to know is, do I need my backup software (Backup Exec in this case) to perform a verify on these tapes or is the verify-after-write technology inherent in LTO drives sufficient?

I would also be curious whether Backup Exec understands the verify-after-write technology well enough to alert me if the drive could not verify the data. If it simply ignores such a failure, the feature is useless to me: even if the drive detects a problem, I would never know about it.

Chris Magnuson

2 Answers

10

Great question!

Yes, you should test them. But while testing the tapes and drives themselves is important, what is much more vital is testing the end-to-end restoration process.

I can't recommend regular full system restorations and service testing enough. It's the only way to know for sure that the entire system is doing what you bought it for. You don't have to look far on this site to see people who struggle to restore their service even though they thought they'd covered all the steps individually.
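One way to make a test restore more than a visual check is to compare checksums of the restored files against the originals. The following is only a minimal sketch: the function names (`file_digest`, `tree_digests`, `compare_restore`) are made up for illustration, it has nothing to do with Backup Exec itself, and it assumes the source directory has not changed since the backup was taken (for example a snapshot or a static test data set).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(source: Path, restored: Path) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or differ after a restore."""
    src = tree_digests(source)
    dst = tree_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "unexpected": sorted(set(dst) - set(src)),
        "corrupt": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Calling `compare_restore(Path("/data"), Path("/restore-test/data"))` (both paths are placeholders) returns three lists; a clean restore leaves all of them empty. A file-level comparison like this still does not prove that the restored service actually starts and works, which is why the full service test matters.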

Hope this helps.

Chopper3
  • 1
    +1 for restore. It's at this point I chime in with "virtual machines!" Restoring to a virtual machine can be a good, non-disruptive indicator that your backup is useful. – Sirex Feb 18 '11 at 15:34
  • 1
    What you're saying is definitely a good idea, and we should start doing that. But I am still not sure whether the verify-after-write technology in LTO-3 drives will cause my backup to fail if the data cannot be verified, just as it would if my backup software ran its own verify. I like the idea of doing something more, but I still need to know whether I am currently doing something redundant. Thanks – Chris Magnuson Feb 18 '11 at 17:10
  • 1
    Also make sure that you do a restore using a different tape drive than the one the backup was made with, as some tapes can only be read on the drive that wrote them (or at least this was possible in the past). – James Nov 08 '11 at 22:02
  • @ChrisMagnuson Did you ever find out what happens if the drive detects an error using its own verify-after-write scheme? – alx9r Aug 26 '14 at 22:28
  • 1
    @alx9r I am afraid not. It looks like no one knows for sure. To see how the backup software responds to a detected error, you would have to corrupt the data right after the tape head writes it but before the verify-after-write mechanism reads it back, and I am not sure how you could do that intentionally. – Chris Magnuson Aug 27 '14 at 16:23
1

First of all, this automatic verification is no substitute for end-to-end verification. I have seen drives shipped with a firmware bug that caused restore reading to be less reliable than verification reading.

The outcome was that you could write the tapes without any errors being reported, but when you tried to restore, reads would return errors or drop in speed by several orders of magnitude.

Most customers never noticed this firmware bug. According to the vendor, that was because the customers didn't actually perform test restores. This particular bug got fixed, but I'm sure we haven't seen the last firmware bug, and some firmware bugs will only be discovered if you actually test real reads.

When the verification fails, the firmware automatically writes a second copy of the data. During a restore, the firmware returns only one of the two copies, transparently to the host. This means that the available capacity varies depending on drive health and media quality.

If too many write attempts fail the verification read, an error is reported back at the SCSI level. One would think an error reported this way is hard to miss at the software layer, but bugs in code paths that are only triggered by flaky hardware are notoriously difficult to test for.
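The rewrite-then-give-up behaviour described above can be illustrated with a toy model. This is not how any real firmware is implemented: the per-attempt failure probability, the `max_attempts` limit, and the names `write_blocks` and `HardWriteError` are all assumptions chosen for the sketch. It only shows why failed verifications consume capacity and when the host finally sees an error.

```python
import random


class HardWriteError(Exception):
    """Stands in for the error a drive would report back to the host."""


def write_blocks(n_blocks, fail_prob, max_attempts=4, seed=0):
    """Simulate writing n_blocks logical blocks with read-back verification.

    Each write attempt fails verification with probability fail_prob
    (an assumed, simplified error model). Returns the number of physical
    block copies consumed on tape.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    physical = 0
    for block in range(n_blocks):
        for _attempt in range(max_attempts):
            physical += 1  # every attempt uses up tape
            if rng.random() >= fail_prob:
                break  # read-back matched, move on to the next block
        else:
            # All attempts failed: only now does the host learn about it.
            raise HardWriteError(
                f"block {block} failed verification {max_attempts} times"
            )
    return physical
```

With `fail_prob=0.0`, `write_blocks(1000, 0.0)` consumes exactly 1000 physical blocks. As `fail_prob` rises, more physical blocks are used for the same data, so less of the tape's nominal capacity is available, and the host is told nothing until a block exhausts its attempts. Whether the backup software handles that final error correctly is a separate question that only a real test can answer.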

kasperd