2

A backup job that had run normally for a long while suddenly began failing intermittently around Thanksgiving, and it kept getting worse as time went on. Now it's failing daily. I think my tape drive may be damaged, but I am not completely sure.

The backup software (ArcServe) always throws media errors. My tapes are not brand new, but they have seen fifteen or fewer uses/erasures and were stored appropriately.

The tape drive (Quantum LTO2 Half Height) is on its latest firmware revision, and the SCSI HBA and tape drive are both on the latest Windows drivers. Those driver versions haven't changed since the issues started, because driver updates were one of the first places I looked.

I use a tool called xTalk to run diagnostic tests against the tape drive, and the "drive health" test says that it is okay. I ran all the individual diagnostic tests and all of those complete too, with the exception of the Full Tape Backup. The full tape backup fails somewhere around 70-80% into the test. It throws a media error (the same media error ArcServe throws). My daily ArcServe backup also fails in network share 6 out of 6, which given the backup set is about 70-80% through the job. The failure at 70-80% into the full tape backup test and the failure at 70-80% into writing an ArcServe job correlate all too well, but I do not know if I can put any weight on what I'm seeing there. I'm inexperienced with these things.

It is important to note that over two months of testing here and there, the tape drive required cleaning before it would run my tests. Those dates were 12/16/11, 12/20/11, 1/9/12, and 1/23/12. On 1/9/12 I reflashed the tape drive with the same firmware it was already on and ran three back-to-back cleanings as recommended by Quantum. In my experience a functional drive never asks for that many cleanings. My cleaning tape is old. It was at my company before I was. But it has not seen anywhere close to the 50 cleanings it says it can do. I assume the tape still cleans okay because when the drive tells me I must clean it, the tape goes in and the cleaning message disappears for a few days.

The thing that has me puzzled is that a short tape write and a medium length tape write are both perfectly fine. Something about the full tape, and near the end of the tape, is what freaks my drive out.

So, I ordered fresh tapes and am currently attempting a full tape backup on a fresh tape.

In my backup logs I see some network errors too. So I am not sure if I am fighting an issue where a backup share is lost over the network, or if I am battling bad hardware, or maybe both.

I was hoping someone who has been there before may be able to put together some pieces that I'm not seeing here. If my drive is damaged then I do not mind sending it in and paying to get it fixed. But when it only fails on one type of test and the device health test says the unit is fine, it is difficult to justify throwing the tape drive in the mail.

Hopefully someone out there can assist. Thanks for reading my post and have a good day.

edit #1: I can restore from a tape that was incomplete due to a media error. I can merge a very old tape back into the database and restore files from it too. And even my ArcServe log shows me e6918 "your tape drive needs cleaning". So that's a couple more "normal" tape drive things that my tape drive can still do. That makes me think it is less likely that the tape drive is damaged, but I feel the persistent cleanings say otherwise.

edit #2: 1/27/12 - I took a brand new LTO2 tape out of the shrinkwrap on 1/25/12, ran a full tape backup test on it, and had no problems. I took another fresh LTO2 tape on 1/25/12, made a full ArcServe backup, and had no problems. This morning I received my new LTO2 cleaning tape and I ran it through three times. After the first cleaning I examined the window on the tape cleaner to discover it was not completely covered in grime so I ran it the other two times as a good measure. I then opened up my xTalk testing software and had no "clean tape drive now" warnings, so I decided to take a previously erroneous tape, erase it, and try a full tape backup. I've written the whole tape's contents and am in the process of verifying the write operations. If that comes back good then I'll be able to do something that last week would have failed during the write operation. So that may be a good sign that my issue is on the way to being resolved. Can't believe a little bit of dirt can affect so much.

TWood
  • new tapes and a new cleaning tape fixed me up. I haven't had any issues out of the drive yet. – TWood Mar 04 '12 at 02:04

2 Answers

6

First try replacing your cleaning tape (order a couple).
The "50 cleanings" figure for a cleaning tape is kind of like "every 10000 miles" for oil changes in a car: Under ideal conditions that's fine, but it's not a guarantee of performance. If you're using the cleaning tape on grungy old heads in a beat-up drive it won't last anywhere near that long: I've seen cleaning tapes come out brown after ONE cleaning cycle, and I certainly wouldn't reuse them for 49 more.
Your tape may look clean, but hold it up next to a shiny new tape and you may notice a surprising difference.

Also note that the "CLEAN ME" message going away just indicates that a cleaning cycle was done -- The fact that the cleaning message (and presumably the media errors) come back after "a few days" instead of months makes me suspect that your cleaning tape isn't doing the job (the drive is picking up errors, which triggers it to demand another cleaning).
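To put a number on "a few days instead of months", here is a rough sketch that works out the gaps between the cleaning requests listed in the question. It assumes the two January dates fall in 2012 (the thread dates from January 2012), and the `gaps_in_days` helper is just an illustrative name, not part of any backup tool.

```python
from datetime import date

# Dates on which the drive demanded a cleaning, as listed in the question.
# Assumption: the two January dates are in 2012.
cleaning_requests = [
    date(2011, 12, 16),
    date(2011, 12, 20),
    date(2012, 1, 9),
    date(2012, 1, 23),
]


def gaps_in_days(dates):
    """Return the number of days between each consecutive pair of dates."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


gaps = gaps_in_days(cleaning_requests)
print("Days between cleaning requests:", gaps)
print("Longest gap in days:", max(gaps))
```

Under that assumption the longest stretch without a cleaning request is under three weeks, which is the pattern that makes me suspect the cleaning tape rather than treat each request as a one-off.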

If running a fresh cleaning tape and using new tapes for the backup doesn't make the problem go away you may need to have the drive serviced/replaced.


I don't think your network errors are related - a media error is typically thrown by the tape drive, and would be independent of any network problems. You may want to open a separate question about those errors (please be as specific as possible if you do: "network error" is one of those meaningless phrases that can be anything from "I couldn't open a connection to the backup client" to "I'm using iSCSI to talk to the tape drive and it ain't working!").

voretaq7
  • thanks for your comments. I'll go find a new LTO2 cleaning tape. One thing I forgot to add was the network errors I saw happened on a machine that had already done two successful sessions across the network right before it threw the error in the last session for the tape job. I wondered how it got 2 out of its 3 sessions done before it had an issue. On top of that it was a repeatable error always failing in the same spot. – TWood Jan 23 '12 at 22:53
  • @TWood without knowing what the "network errors" are all I can tell you is that [`it's dead and it was a cat`](http://absolutely100percent.blogspot.com/2007/01/solving-mystery-of-dead-cat.html) (and that it's *probably* not related to your media errors) -- You need to open a new question about that with more details (error message, and the fact that it's repeatable is significant too :) – voretaq7 Jan 23 '12 at 23:06
  • In my experience replacing the old tapes with new tapes doesn't help on this kind of problem. Rather it makes the problem worse because new tapes come with a bit of debris left over from the manufacturing process. This will wear down the tape heads and cause them to need cleaning even more frequently. – kasperd Nov 10 '14 at 11:42
  • @kasperd Well certainly you shouldn't be running a new cleaning tape through the system every cycle, but the other extreme (continuing to use a cleaning tape that's black with magnetic residue it's scrubbed off the head) is certainly ineffective - scrubbing off one layer of magnetic dust to deposit another isn't "cleaning". If you really want to treat your drives well you'll manually clean the head with a foam swab and acetone or ether, but I don't know anyone who's willing to do that regularly. – voretaq7 Nov 10 '14 at 16:22
0

This tape drive ultimately failed again on me starting in May. It was determined to be a damaged unit.

We swapped over to external USB drive backups instead of tapes. The overall equipment costs are much lower.

Unfortunately for me my question regarding reliable backup software was deemed closable and I had to write a super batch to handle backups on the new drives. It's not ideal but it works.

TWood