What kind of periodic maintenance should I carry out on HDD backup?

15

3

I have a bunch of directories backed up on an external HDD - not an SSD, a magnetic disk.

The backup is just onto a single disk (yes, I know, an extra HDD with a copy would be a good idea; but that's not the case right now). Also, there are no dual copies of the files on the disk.

The HDD has either (option 1) much more free space than my files take up, or (option 2) less free space than my files take up (but still a decent amount).

I keep the disk in its original packaging: Plastic bag, within an "egg-carton" like wrapping, within the plastic box. It's kept in a room in my house with the box never exposed to the sun, nor to rain etc.

My question is: Is there something I should do with the disk, periodically, to maximize data longevity? e.g. read everything to someplace else, or read-and-write-back, or reshuffle the physical positions of data on the disk somehow, or even shake the disk, change its physical position, power it on without doing anything, etc. I'd like an answer for both of the scenarios I've described.

Notes:

  • I'd rather not make this question specific to a single brand of HDDs, but if you must know - it's a Toshiba STOR.E basics 750 GB drive. Not my choice, I just need to work with this.
  • The HDD's manual doesn't say anything about this issue.
  • The backup represents the state of these folders sometime in the past. Assume it's important to keep this state as-is, and that there's no "master copy" of the same data.
  • While it is probably irrelevant to the question, it is not catastrophically bad if these files are lost, I'd just want to increase the expected longevity.
  • Even if I had two copies, on two HDDs, the question would be just as relevant: What kind of maintenance operations should I do on each of them separately?

einpoklum

Posted 2017-01-22T22:31:49.690

Reputation: 5 032

Periodically reading the files in order to "improve" the magnetic field that represents the data would be a good idea. Using a filesystem that adds some automation/additional protection to this would make it a bit easier. Obviously having multiple media devices with the same data would be preferable. The manual doesn't say anything about it as it's an uncommon scenario to use HDDs as offline storage (I guess). – Seth – 2017-01-24T09:02:29.090

The answers seem to be based on opinion and common practice. I'm not seeing solid, research-based answers or authoritative citations. – fixer1234 – 2017-01-25T17:20:14.493

Asked and answered, here on SU although it's a lot more involved than your Q. Read the accepted answer to this Q. All the data/sources/citations you could hope for! You can extract what works for you from his data.

– user686699 – 2017-01-29T09:00:10.117

Answers

5

From a professional point of view, your options are:

  1. Pray.
  2. Make multiple copies, on multiple devices.

In your "option 1" (much more space) you could very marginally increase your odds by making multiple copies on the same hardware, but the fact is that hardware fails, not infrequently rendering the whole disk unreadable. A single copy is not a viable backup strategy.

I'm unclear if this is an actual backup (of files on a primary device) or an archive (of files removed from the primary device.) The extra copy is somewhat more important if you care at all about the archive case - in the backup case there is in theory a primary copy so you have to have at least two failures before you are totally out of luck.

Ecnerwal

Posted 2017-01-22T22:31:49.690

Reputation: 5 046

While your recommendations are valid and appreciated, that's not what I asked. You do seem to be implying that all of the actions I suggested are meaningless/useless in terms of longevity. Is that what you're saying? – einpoklum – 2017-01-22T22:54:15.300

Sorry, but prayer is not really from a "professional" point of view. – oldmud0 – 2017-01-25T01:20:53.140

If you are a good engineer, you'll pray to Murphy, and your offerings will consist of more homes for this data, because anything that can go wrong, will go wrong. Other deities and offerings may have less satisfactory results... – Ecnerwal – 2017-01-25T03:54:11.887

Tbh, with a single copy, seeking divine favor isn't the worst idea. – Journeyman Geek – 2017-01-25T07:50:51.060

5

If you have more free space than the backup data uses - your option 1 in the question - or if you have multiple copies of the data, I've got an idea that would "do something": if you think SpinRite really helps with hard drive "maintenance", and/or you want to completely overwrite and then re-write every bit of your data, this would do it.

Whether you should do something or not, I'm not too sure... bit-rot or Data Degradation seems to really exist, and questions like this one here on superuser and this one on serverfault seem to advise backups or maybe an error-correcting or fault-tolerant RAID (but for only a single hard drive I'd pick multiple backups & hash/CRC checks & not worry about what to do if a RAID fails).

I'm leaning towards the simpler and lazier "do-nothing" approach, but the following is at least a good "make sure I can still read my data once a year, and might as well re-write it too" idea.

Linux DIY Emulation of some SpinRite maintenance features

Lots of people seem convinced that SpinRite really works, but it's not free and I run Linux, so I've listened to Steve Gibson's HOW does SpinRite work? video and he says that one of the things SpinRite does now is:

  • Reads the entire drive
  • Flips the bits & writes them
  • Reads them again
  • Flips the bits back & writes them
  • Reads them again

If the drive finds any (minor) problems, this should "induce the drive itself to swap the bad sectors with good ones."

How often should you do this? Steve says "no one really knows how often that is, but every few months should be often enough". I'm just guessing every 6 months or every year or so.

badblocks

The reading/flipping/reading/flipping process sounds nearly identical to what badblocks does with its write-mode testing (-w option), except it doesn't really "bit-flip" your existing data; it destructively writes, reads & flips all the bits on the partition:

With this option, badblocks scans for bad blocks by writing some patterns (0xaa, 0x55, 0xff, 0x00) on every block of the device, reading every block and comparing the contents.

Not coincidentally, those patterns are, in binary: 10101010, 01010101, 11111111, 00000000.

So badblocks writes, reads & flips bits pretty thoroughly, and it's free too. If you have mke2fs run badblocks (by giving -c twice, i.e. mke2fs -c -c), it'll save the list of bad blocks so ext2/3/4 will avoid them, if any were found.

The downside is badblocks' write testing is destructive, so you'll need at least two partitions for this to work (to save & write back your data).

  • Keep two copies of your data on the hard drive, each on DIFFERENT PARTITIONS!
    This lets you overwrite every bit of a single partition with 10, 01, 11, 00, and it doubles your recovery chances if bad areas develop. Also keep a list of checksums/hashes for your data files, like CRC32 or MD5 (though MD5/SHAs are much slower than CRC, and CRC shouldn't miss random errors).
  • Every few months (example commands follow below):
    1. Read your backup copies & verify they still match the checksums/hashes.
    2. "Pseudo"-bit-flip one partition with badblocks -w or mke2fs -c -c (only ONE partition - do not overwrite all your data, just one copy!)
    3. Copy your data back onto the freshly flipped partition
    4. "Pseudo"-bit-flip the other partition (the one that hasn't been flipped yet)
    5. Copy your data back onto that freshly flipped partition

This is similar to just reformatting & copying your data back, but a quick/standard format won't usually write to every sector, so you may end up not changing/flipping many of the bits.
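
For example, here's a rough sketch of one maintenance round on Linux. The device names, mount points and checksum file are assumptions for illustration; adjust them to your own layout, and remember badblocks -w destroys whatever is on the partition it's run against.

    # Two copies assumed on /dev/sdb1 and /dev/sdb2 (hypothetical), mounted at /mnt/copy1 and /mnt/copy2
    (cd /mnt/copy1 && md5sum -c checksums.md5)   # 1. verify both copies against their checksums
    (cd /mnt/copy2 && md5sum -c checksums.md5)
    umount /mnt/copy1
    badblocks -wsv /dev/sdb1                     # 2. destructive write/read of every bit on ONE partition
    mke2fs /dev/sdb1                             # 3. recreate the filesystem (or use mke2fs -c -c instead of badblocks)
    mount /dev/sdb1 /mnt/copy1
    cp -a /mnt/copy2/. /mnt/copy1/               #    copy the surviving copy back
    # 4. repeat the badblocks/mke2fs/copy steps for /dev/sdb2, copying from the freshly restored copy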


The best solution is always multiple copies on multiple devices.
I've read that optical media could be readable for 10, 20, maybe even 50+ years, and two identical discs/ISOs would fit well with gddrescue (below).
Cloud storage is often free for a few GB; storing files there (optionally encrypted) may be a good idea, especially if the free amounts keep going up.

Also, saving your files in an error-correcting archive may help if any errors do turn up, but losing one file out of a million may not be as bad as losing a whole archive of a million files. If any separate error-correcting software existed, like an ECC-CRC, that could help, but I don't know of any, and an extra copy of the data would be even better.


Tangentially related, SpinRite also "tries very hard" to read data from a bad sector of a hard drive, reading from different directions & velocities, which sounds very similar to gddrescue, in case (or when) you do run into trouble reading your data. gddrescue can also read from two copies of the data with errors and hopefully piece together one full good copy. I'm tempted to suggest making two (or more) identical copies of your data partition with dd, but then if badblocks did find any bad sectors you couldn't avoid them, since that would change the identical copies.
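
As a sketch of that last idea, gddrescue can fill in the unreadable parts of one copy from a second, bit-identical copy (e.g. two partitions cloned with dd), because its map file records which areas are still missing. The device and file names below are made up:

    ddrescue /dev/sdb1 recovered.img rescue.map   # first pass: copies what's readable, maps the bad areas
    ddrescue /dev/sdc1 recovered.img rescue.map   # second pass: fills only the still-missing areas from copy 2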

Xen2050

Posted 2017-01-22T22:31:49.690

Reputation: 12 097

Can you link to an explanation of exactly how you bit-flip a partition with badblocks or mke2fs? – einpoklum – 2017-01-25T20:44:31.737

Bit-flipping will not fix the sector-address which lies outside the sector. I know that SpinRite cleverly uses some properties of the disk-controller in a rather surprising way, not easily derived from the specs, which he is still keeping secret. The guys behind HDD Regenerator may have worked it out, but it's not public knowledge. – harrymc – 2017-01-25T22:21:30.373

Is there any proof bit flipping does any good? Sounds like trading disk wear for solving a problem that I've never seen any reference of ever, anywhere, in any proper, trustworthy source. A citation would be very educational. – Journeyman Geek – 2017-01-26T00:24:12.980

@einpoklum I've updated the answer some. I don't have a link other than the man page for badblocks to overwrite every bit, then write your data back.

– Xen2050 – 2017-01-30T12:46:03.840

@JourneymanGeek I was only going by what Steve Gibson says in the linked video on his site, essentially "from the horse's mouth." But unfortunately I don't have & couldn't find any other references, at least from a quick search. Actually I get the impression from the other SE questions that bit-rot may not be much to worry about, and just re-writing alone, even the same bits in the same place, may cause the "magnetic domains in the physical disk surface [to] be renewed with their original strength" – Xen2050 – 2017-01-30T13:07:56.257

5

Since it seems to have been missed by most posters here, this is my recommended answer to the specifics of your question, using this excellent post, What medium should be used for long term, high volume, data storage (archival)? as the guide. I'll not re-cite the references and research from there, as he did an excellent job, and reading the whole post is better than the summary for this case.

Limiting yourself to one HDD in cold storage (offline), with either of the two options given, you should connect the drive every couple of years or so and spin it up. The biggest reason for doing so is to keep the spindle grease from hardening and seizing. The spindle grease will harden over time, and spinning the disk once in a while can significantly delay that eventuality. If you want some insight into the importance of the grease to an HDD, look at the amount of effort Minebea, an HDD motor manufacturer, puts into their research about it in this report.

While the disk is connected, you may as well run some SMART diagnostics to look for signs of impending failure of the electronics, the mechanics or the platters. According to the research presented at FAST'07 by Google and Carnegie Mellon University (winning 'Best Paper' that year), a SMART test can be indicative of failure, but a 'passing' test is not necessarily indicative of good health. Nevertheless, checking won't hurt. Yes, it is old research, but nobody seems to have replaced it with anything newer.
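
On Linux, one common way to do this is smartctl from the smartmontools package; the device name is an assumption:

    smartctl -H /dev/sdX            # overall health verdict
    smartctl -A /dev/sdX            # attribute table (reallocated/pending sectors, etc.)
    smartctl -t long /dev/sdX       # start an extended self-test
    smartctl -l selftest /dev/sdX   # read the self-test results when it's done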

Having the drive running for a while and accessing the data will also renew the strength of the magnetic fields holding the data. Some can argue that it is not necessary based on hordes of anecdotal evidence, but what research there is seems to indicate that weakening of the magnetic fields is possible. I present three papers from the University of Wisconsin-Madison: Parity Pollution, Data Corruption, and Disk-Pointer Corruption. After reading these you can decide how much their conclusions threaten your data, and how much effort it is worth to protect against it.

Suggested curation routine

I don't know what OS you use, what tools you have or prefer, nor what file system you choose. Therefore my suggestions will be generic only, allowing you to choose the tools that best fit your configuration and preferences.

First is the setup for storage. Before saving the files to the HDD, create archives of them. This doesn't imply compression, nor does it avoid it. Choose an archive format that will give you error recovery or 'self-healing' abilities. Don't create one massive archive; rather, archive things that belong together, creating a library of archives. If you choose compression, then be sure that it doesn't interfere with the error recovery ability. For most music, video, movie, and picture formats there is no point in doing compression. Such file formats are already compressed, and trying to compress them rarely gains space, sometimes creates larger files, and wastes your time and CPU power in the bargain. Still, archive them for the error recovery above.

Then create a check-sum for each archive file, using the digest algorithm of your choice. Security isn't the issue here, merely a sanity check for the file, so MD5 should suffice, but anything will work. Save a copy of the check-sums with the archive files, and in a second place on the same HDD - perhaps a dedicated directory for the total collection of check-sums. All this is saved to the disk.

Next, and quite important, is to also save on that HDD the tools you used to create the check-sums and to restore the archives (and to uncompress them as well, if you used compression). Depending on your system this could be the programs themselves, or it might need to be the installers for them. Now you can store the HDD how you choose.
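
As a rough illustration of that setup on Linux (the mount point and file names are made up, and par2 is only one possible choice for the error-recovery data - the point is just to have some such ability plus checksums):

    cd /mnt/backup
    tar -cf archives/photos-2016.tar ~/photos/2016/                            # one archive per logical group, no compression
    par2 create -r10 archives/photos-2016.tar.par2 archives/photos-2016.tar    # ~10% recovery data alongside it
    md5sum archives/photos-2016.tar > archives/photos-2016.tar.md5             # checksum kept next to the archive
    cat archives/*.md5 > all-checksums.md5                                     # second copy of all checksums elsewhere on the disk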

Second is the storage. Current HDDs are reasonably protected from physical shock (shaking and bouncing), but there's no point in pushing it either. Store it pretty much the way you have mentioned in your question. I would add trying to avoid areas where it is likely to be subject to electro-magnetic forces: not in the same closet as your circuit breaker panel or above your HAM radio, for example. Lightning miles away is something you can't avoid, but the vacuum cleaner and power saw are avoidable. If you want to get extreme, get a Faraday shield or Faraday bag for it. Of your suggestions, two are either pointless or bad: changing its physical position while it's stored will not affect anything that matters, and shaking it could cause damage; it shouldn't, as most drives have good G-shock protection, but it is possible.

Last is the periodic measures. On a schedule you choose, annually or bi-annually, for example, remove it from storage and reconnect it to the computer. Run the SMART test, and actually read the results. Be prepared to replace the disk when SMART results show you should, not "next time," but "this time." While it's connected, check all the archive files against their check-sums. If any fail the check, try to use the archive format's error recovery abilities to restore that file, then recreate the archive and its check-sum and resave it.

Since you also gave option 2 as having a "nice amount" of free space, copy the archives to new directories and then delete the originals. Simply "moving" them may not move them at all: on many newer file systems, moving a file will change which directory it is listed in, but the file contents will stay where they are. By copying the file you force it to be written somewhere else, then you can free up the space by deleting the original. If you have many archive files, none are likely to be so large as to fill the free space on the HDD. After you have verified or restored all the files, and moved any you choose to, restore your packaging and put it back in storage until next time.
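
Continuing the illustration above (same made-up layout), the periodic pass might look like this:

    cd /mnt/backup
    md5sum -c all-checksums.md5                          # verify every archive against its stored checksum
    # par2 repair archives/photos-2016.tar.par2          # only if a checksum failed and par2 data exists
    cp -a archives archives-rewritten                    # copying forces the data onto freshly written sectors...
    rm -rf archives && mv archives-rewritten archives    # ...then the stale copy is removed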

Extra things to pay attention to: when you upgrade your system or, worse, switch to a different OS, make sure you still have the ability to read that HDD in the new configuration. If you have anything that is not plain text, make sure that you don't lose the ability to read the file as saved. For example: MS-Word documents can have equations created in one format that newer versions cannot read. See this for that very problem. Word isn't the only possible source of trouble, however, and not even Open Source formats guarantee that your data is future-proof. For a major blunder in this realm, read about the failed Digital Domesday Book project. As new technologies appear, consider updating your collection as well. If you have movies saved as AVI files and you like MKV better, convert them. If you have word processing documents and upgrade your program, resave the archived ones in the new format.

user686699

Posted 2017-01-22T22:31:49.690

Reputation: 1 703

4

Magnetic media may fade over time and the result is a bad bit or sector. One solution may be to renew the magnetic part once every few years.

The simplest way is to copy and rewrite the entire hard disk, although this may not renew the sector-address, which is the "header" of the sector that allows the firmware to position the head to it. Renewing the sector-address may require the re-format of the disk (deep format - not quick).
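
As a crude illustration of the copy-and-rewrite approach (the comments below mention dd for this), assuming the backup drive is /dev/sdX and there is temporary space on another disk; note this rewrites the data sectors but not the sector addressing, and a typo in the device name can destroy the data:

    dd if=/dev/sdX of=/mnt/temp/disk.img bs=1M status=progress   # read every sector off the drive
    dd if=/mnt/temp/disk.img of=/dev/sdX bs=1M status=progress   # write every sector back
    sync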

An alternative solution is to use disk regenerate products. These products scan the disk at the physical level, reading every sector and its address and rewriting both to renew the magnetic data.

The additional bonus is that in case of a read error, these products will try multiple read methods in order to save the data, will mark the sector as bad and will remap it to a spare sector (most hard disks have spare sectors) so the data is saved.

Here are a few such products :

  • DiskFresh (free for private and non-commercial use or $25) - Part of the Puran Utilities which get good reviews. It only informs you if there are any damaged/bad sectors and does not do advanced recovery.

  • SpinRite ($89 with money back guarantee) - This was not updated for quite a few years, although it still saved my disk a few years ago. I would not trust the money back guarantee as the product is quite old.

  • HDD Regenerator ($89.99 with money back guarantee) - A newer product with good reviews.

For completeness' sake, for readers looking for safe long-term storage, I would remark that "write-once read-forever" DVD and Blu-Ray products exist, commercially branded as M-DISC or Archival Disc.

harrymc

Posted 2017-01-22T22:31:49.690

Reputation: 306 093

I know SpinRite uses its own bootable medium; what about the other ones you mentioned? Are they Windows-based? Linux-based? Own-bootable-based? – einpoklum – 2017-01-25T11:31:43.713

DiskFresh runs in Windows and HDD Regenerator does both Windows and bootable flash disk. – harrymc – 2017-01-25T11:35:00.457

So perhaps I should ask if doing the equivalent of that on Linux requires a separate bootable, or whether you can just make do with /dev/sdX device files and dd or something similar. – einpoklum – 2017-01-25T11:43:28.197

@einpoklum: Any product(s) under any operating system that do deep formatting and disk imaging and rewrite will do the job, including dd for the rewrite part. The point is to entirely renew all sectors of the disk, sectors used for both file data and metadata. While copying the disk an alternative temporary storage is required, but today that is cheap. – harrymc – 2017-01-25T11:48:02.683

@einpoklum: One can use DiskFresh for the maintenance and only use the more advanced utilities to recover from errors (success not guaranteed). DiskFresh should be enough under correct storage conditions. – harrymc – 2017-01-25T13:21:35.280

Have you used any of these tools, and actually seen an effect? I've generally been told these are snake oil, and essentially you're probably increasing mechanical wear, which is likely more of an issue. Standard disk checks handle sector remapping and recovery. – Journeyman Geek – 2017-01-26T00:21:58.757

@JourneymanGeek: If you had read this answer you would have known that I have used SpinRite a few times and it has always saved and resuscitated my disk. A hard disk that cannot stand having each sector written once every few years was in really bad shape to start with and certainly not suitable for archiving. – harrymc – 2017-01-26T06:31:58.907

I've read what it's supposed to do, and I'm pretty certain it's snake oil. Not to mention, it doesn't support modern hard disks, and the 'physics' behind it makes no sense. – Journeyman Geek – 2017-01-26T06:35:18.480

@JourneymanGeek: I have the experience of a laptop that became unbootable after 3 years of daily use. It took several hours, but SpinRite made it as good as new and it lasted for 3 more years of daily use until its owner decided to invest in a new one, but not because of any problem with the disk. I had other cases, but this was the most extreme. SpinRite magic is not for all disk controllers, but this is not a problem for archiving purposes. HDD Regenerator, which I have never used, does not list that requirement. There is no such limitation for DiskFresh which works thru Windows. – harrymc – 2017-01-26T08:20:58.563

3

I've always felt the trick is to assume your drive will fail. There are some modes of failure that are random. For non-random failures there are two aspects here - the drive and the filesystem.

While it's a bit of an unusual source - this reddit thread suggests that one given bit may flip in 10 years or so, though I suspect a single flipped bit would be silently handled by ECC - either in the filesystem or on the drive itself.

You can typically find age-related 'large scale' issues with periodic SMART tests - looking at things like pending and reallocated sectors. With the relatively short duty cycles, you shouldn't really see much, but we're being a little paranoid here. Once again, until things get really bad your drive will likely silently handle this with ECC.

Finally there's a risk of sudden drive or controller death. In theory, you can baby the drive by running it at controlled, cool temperatures, which are known to maximize drive life, but I've never really babied my drives.

Drives are rated for a certain number of spin-ups and spin-downs (a non-issue here), and I suspect properly ejecting the drive would allow data to be flushed to it. There are also tools to power down drives; I believe hdparm would do that, but I need a little more testing.
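
For what it's worth, a minimal sketch with hdparm, assuming the external drive shows up as /dev/sdX:

    sync                     # flush any pending writes first
    hdparm -Y /dev/sdX       # put the drive to sleep (lowest power mode) immediately
    hdparm -S 120 /dev/sdX   # or set an automatic standby timeout (120 * 5 s = 10 minutes)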

Finally, I pick drives that are known to last. I also rotate external drives every few years, moving older drives down the hierarchy.

In theory, file systems like ReFS and ZFS are designed to reduce the risk of data loss through integral data checksums. At the very least, you won't have files getting corrupted silently. Picking them over more common file systems would likely reduce the chance of data loss, but there's no 'easy' way to deploy them yet on a desktop OS. ZFS has somewhat decent support on Linux and none on Windows, and ReFS hasn't made its way down to the Windows desktop yet. These are also designed around having multiple copies on one or more drives for actual recovery, so they wouldn't exactly work here.

Journeyman Geek

Posted 2017-01-22T22:31:49.690

Reputation: 119 122

Technically, checksums on ZFS (and maybe ReFS) don't do anything to reduce data loss on their own, only verify data integrity. You'll still need a parity or mirror drive (i.e. some kind of redundancy) to recover from any errors/corruption. I don't believe there are any (popular) filesystems that can recover on their own with a single drive (and if any exist then they'll have to sacrifice drive space to do so). – Bob – 2017-01-24T23:58:40.560

Updated to reflect that. Didn't bother going much further cause these filesystems wouldn't really work in his usecase/ – Journeyman Geek – 2017-01-25T00:05:07.043

Yea, the only benefit to the checksums in this kind of scenario is that you at least know which files/drives to not trust. – Bob – 2017-01-25T01:33:00.537

Why would sectors be reallocated when the HDD is just sitting there? I mean, you could suggest that I check the SMART stats after copying the entire disk contents to someplace temporary (or to /dev/null maybe?) , and this would trigger errors and reallocations. – einpoklum – 2017-01-25T09:02:01.677

That's a good question - that shouldn't happen at all under normal circumstances, especially with fairly minimal duty cycles. However short of sudden and unexpected, and very terminal death of your hard drive, its unlikely anything will actually happen to your drive. Most of the failure modes I can think of tend to be unexpected. – Journeyman Geek – 2017-01-25T10:57:08.973

Eh, While there's anecdotal evidence it works - everyone who does it does it as a last ditch thing. You're either shrinking the bearings or the platters, depending on who you ask, and while it works once or twice, its something you do in very specific circumstances, when all else fails. I don't consider it something I'd do as preventive maintenance, and there's no magic way to stop bearing failure. – Journeyman Geek – 2017-01-26T00:18:29.090

3

No maintenance should be performed whatsoever. Reconnecting the drive and powering it up represents a higher risk than having it operating continuously and way higher than letting it sleep in a box. So checking it very often would actually increase the damage probability.

How you store it is excellent, but don't forget about temperature. Don't let it be extreme. What exactly do you use as a backup drive? Some are way more durable than others.

A thing you can do, since you do have enough space as you stated, is to make two copies of the same data on the HDD. In case of bad sectors, you will be fine. From what I've noticed, most drives today take sector damage at the beginning of the drive (first few GB), but that's due mostly to the operating system (not your case). Generally, bad sectors will initially appear clustered together in most cases, so having two copies of the data on the same drive does help.

If you have just a few critical files it would be a good practice to save them somewhere else too, just to be safe. Make an encrypted archive and put it on a stick or give it to someone you trust.

Overmind

Posted 2017-01-22T22:31:49.690

Reputation: 8 562

The 'notes' part lists what specific HDD is used (Toshiba STOR.E basics 750 GB). Also, can you link to some kind of reference for the claim that powering up and connecting the drives decreases the estimated time to failure? Not that there's no reason to the claim, it's just that other people are suggesting the opposite essentially. – einpoklum – 2017-01-25T11:46:12.940

Is there anything you can cite as the basis for the assertion that no maintenance should be performed and that powering it up is a higher risk? – fixer1234 – 2017-01-25T16:31:20.337

I don't agree with the argument for no maintenance, since even a disk that is left unpowered may still go bad, and being unpowered you will never detect it. – harrymc – 2017-01-25T22:28:34.957

Statistically, it's way more probable to get damaged when you power it on for a check compared to not accessing it at all. – Overmind – 2017-01-26T08:13:03.883

I haven't seen any such statistics, and even if such exist, they certainly do not apply when powered on once for several hours once every few years. It would take thousands of power-ons to cause measurable damage to a disk. – harrymc – 2017-01-26T08:31:29.373

Any electronic device has the highest probability to break at power-on. It's a well known fact since the CRT TV era and still valid today. – Overmind – 2017-01-26T08:51:37.273

@Overmind: Maybe, but only after several years of use. – harrymc – 2017-01-27T15:37:21.337

"Reconnecting the drive and powering it up represents a higher risk than having it operating continuously" - while it sounds reasonably, my experience doesn't confirm this. Backblaze estimates that after 6 years of service half of HDDs running 24/7 would die. I've dealt with quite a few 6+ years old HDDs that were used in regular PCs and mortality rates among them were much lower than that. I'd expect offline storage for ~10 years to be fine, bit rot is more of an issue. Ofc my experience isn't a representative statistical sample, so take it with a grain of salt. – gronostaj – 2017-01-30T19:00:54.013

Well, I have working WD 4.3GB drives still online. If you want a test to be relevant, get the same number of drives from the same lot and test them half with daily cycle and half on continuously while respecting the environment operating T. I have performed such tests with identical drives (800 in total, known bad seagate lot used in RAID 1s) from which 640 were power-cycled nearly daily and 160 on continuously. After 3 y, out of the cont. 160, 147 were alive. Out of the 640 PwC'ed, only 467 were still alive and some had bad sectors or reallocated (81 with BS or RA) and 173 died. Ratio: 3+:1. – Overmind – 2017-01-31T06:21:31.320

In the case of WD 1TB Blacks, 48 were used in RAID configs. After 4 years, none died. In the case of the power-cycled ones, after 4 years, out of 102, 3 died, which is less relevant as a statistic due to the high quality and low death rate of these drives. In this case, there was a pretty good chance the failures were random. The 48 RAID ones are still in use after 6 years. No 6-year statistic on the others. – Overmind – 2017-01-31T06:26:40.760

2

As we see from the recommendations of others, a single backup resource is not a reliable solution IF the backup is of any value. Experience with electronic devices has taught many of us (the hard way) that it isn't a matter of IF but of WHEN a backup device will fail.

Hard drives, by design, are for relatively short term data storage. Two excellent articles, https://serverfault.com/questions/51851/does-an-unplugged-hard-drive-used-for-data-archival-deteriorate and How much time until an unused hard drive loses its data? discuss the lifespan of data stored on a hard disk drive. As always, your mileage may vary.

The backup solution you describe is better than no backup at all but you still have a single point of failure. With your backup on a single device, you risk losing the ONLY copy of your data to fire, flood, theft, explosion, device failure, etc. So the question is: Are your efforts to preserve your backup a worthy expense of your time?

To accomplish your goal, i.e., a backup you can rely on, more than one backup is required. If you're going to store your data on a hard disk, your backup requires occasional "refreshing" to counteract the long term storage data degradation inherent with hard disk drives. If I were wearing your shoes, I would purchase a second backup drive similar to the original and once a year copy the data from the primary drive to the secondary drive. At the end of each year reverse the process and copy the data from the secondary drive back to the primary drive. Rinse and repeat each year. One of the drives should remain offsite, remote from your location to avoid losing your only data copy to a natural disaster.
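
As a sketch of that yearly copy on Linux (the mount points are made up), rsync with --checksum also forces a full read of the source, which doubles as a read check:

    rsync -a --checksum /mnt/backup-primary/ /mnt/backup-secondary/
    # the following year, reverse the direction:
    rsync -a --checksum /mnt/backup-secondary/ /mnt/backup-primary/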

John Littleton

Posted 2017-01-22T22:31:49.690

Reputation: 51

While everything you say is true, only the last paragraph answers the question. – einpoklum – 2017-01-25T09:03:05.860

Is there anything you can cite as the basis for the benefit of refreshing, and the one-year time frame? – fixer1234 – 2017-01-25T16:14:54.783

1

I couldn't find any credible, scientifically backed data on this. Generally speaking, there are two aspects of this issue:

  1. Bit rot: various physical effects can flip bits stored in the magnetic domains on the HDD's platters, thus damaging data on the HDD. (disk is still fully functional)
  2. Mechanical issues: powering the drive on/off, keeping platters spinning or stationary, storage conditions and natural aging can make the drive unusable after some time. (data may still be intact and recoverable)

Bit rot is discussed in this thread from 2008. User arnaudk wrote:

From what I can ascertain, it looks like it would take about 22 years (details below) for you to loose your data due to thermally-driven demagnetization if the hard drive were just sitting motionless at room temperature in a dark corner. In reality, this time will be a bit shorter because of mechanical vibrations and external magnetic fields arising due to everything from the motor of the hard drive itself to lightening storms 50km away.

Acceptable levels of signal decay vary depending on system design but typically range between 10-20% [ref4], so it would take (-1/326000)*ln(0.8) = about 22 years for an entire bit domain to get 20% weaker causing possible loss of data due solely to thermal demagnetization effects.

(direct link to post)

That's the only estimate I could find. If that's correct, then you could safely rewrite the entire drive every 5 years to "refresh" the data.

Mechanical issues are even more of a mystery. Backblaze is a company that uses thousands of consumer-grade hard disks in their datacenter and regularly posts updates on their well-being. According to their estimates, after 4 years of spinning 24/7 20% of hard drives died, and if the trend continues, after 6 years half of them will be gone. This is more or less in line with figures from this Google whitepaper. However, that's not a standard use case for a hard disk and we can hardly compare it to a drive sitting offline in a box. I'm not aware of any studies that tackled this case.

All in all, if you really care about that data, you should keep two copies of it and move it to a new, stress-tested HDD every 5 years or so. That should keep magnetic domains and hardware reasonably fresh, but YMMV.

gronostaj

Posted 2017-01-22T22:31:49.690

Reputation: 33 047

0

Increasing the life of a hard disk drive is one of the matters in which you get the best result from doing the least. Unwrap it, place it on a solid platform away from intense heat, humidity, dust or radiation, where there is enough air circulation and the least likelihood of a kid smashing it by accident. You can expect a long life from your hard disk until such time comes that you upgrade it.

Perhaps it is hard to accept that as a consumer, there is very little (even nothing) you can do to increase the hard disk longevity. But there certainly are ways to improve the survival chance of your data: ReFS, RAID and backup.

Believe me, the industry is working on improving the longevity of data itself as opposed to hard disks.

user477799

Posted 2017-01-22T22:31:49.690

Reputation:

Protecting it from damage is good advice. Is there anything you can cite to support the assertion that there is little that will increase longevity? – fixer1234 – 2017-01-25T16:08:30.247

-1

In my experience, frequent start/stop switching (idle/run) is bad for HDDs; it's better to keep them always spinning if you're OK with them drawing more electricity. (I tested this on multiple systems with the same HDDs from the same store, where some HDDs were forced to spin continuously and others were not.)

On all our servers we run a "short" SMART test once per day and a "long" test over the weekend; that at least may give an idea of when an HDD is going to fail. If you use ZFS, then regular "scrubbing" once per month is enough for enterprise-grade HDDs, and once every 2 weeks for consumer-grade HDDs.
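
For example, a ZFS scrub is a one-liner (the pool name "backup" is made up); even on a single disk it will at least tell you which files have gone bad:

    zpool scrub backup    # reads and verifies every checksummed block
    zpool status backup   # shows scrub progress and any checksum errors found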

A good, decent power supply is also one of the factors for a healthy HDD, plus a UPS to keep random electrical surprises from reaching the HDD. (External HDDs get power from the computer, so this applies to them too.)

Vibration/shaking while an HDD is running isn't good for it either. (Especially important for portable HDDs: don't move them while they're working.)

Also, choosing the right model for the particular HDD's job (surveillance, NAS, desktop...) is a way to extend its life.

Alex

Posted 2017-01-22T22:31:49.690

Reputation: 5 606

1I don't think he runs it in the box... – Journeyman Geek – 2017-01-25T07:51:28.940

I don't actually run it... it's just a backup. Of course I wouldn't run it from within the box :-) For that reason the advice about a UPS or vibrations etc. is irrelevant to my case (it would be relevant to a disk that's in constant/frequent use). – einpoklum – 2017-01-25T08:57:25.613

I wouldn't say that a UPS is irrelevant. Even if it's an external backup drive, it still needs to be powered, and in case something happens on the power lines, something should guard against such situations. – Alex – 2017-01-25T16:55:23.010

Those who are downvoting, please leave a reason in the comments. It's an interesting topic, so I would like to hear what's wrong. Your opinion will benefit everybody. – Alex – 2017-01-31T22:43:23.137

-3

Generally speaking, if it is a Linux system then no maintenance is ever needed. Windows systems seem to lose clusters much more often than Linux. For that reason, a chkdsk every 3-6 months is wise on a Windows system.

All hard drive parts with bushings and bearings eventually have some misalignment from wear after 5 or more years of constant use. The best way I have found to not wake up some day with a corrupted partition is to re-format at least every 5 years.

Generally I have something that requires a major overhaul of my system every couple of years and so re-format at that time (be sure to use a full re-format with error checking). My memory is generally good enough to note a decline in hard drive space after formatting; this is an indication the drive is failing. If a person is not familiar with their system, they could keep records of the exact byte count after formatting.

At some point the "extra" sectors will be used up (they exist specifically for this purpose) and the system will start marking "normal" areas on the drive as unusable - the byte count will decline. At this point the drive should be scrapped - there will probably already be data loss. This is normal for a hard drive that is kept on 24/7 for 5-10 years.

The only way to extend longevity of the drive is to set the system to power it down after a few minutes of inactivity. I have a 2tb drive I use as a master backup and have it set to power down after 10min of non-use. I may go 30 days without accessing it and so it remains turned off. It takes it 20sec to power up and become readable if it is needed.


So if the discussion is limited to shelf life, never being powered up periodically, then there are the well-covered environmental concerns in the link above, "How much time until an unused hard drive loses its data?" The only issue I did not see mentioned in that discussion about un-powered electronics is capacitor shelf life. Capacitors last longer with periodic use; otherwise they dry out; this is due to the electro-chemical structure of a capacitor (and batteries).

The rule of thumb for capacitor life is 20 years. This is called the 20/20 rule: capacitor failure is highest in the first 20 minutes of use, and the statistical failure rate rises again after 20 years of use. But they fail much sooner than 20 years if not used.

The most common (generally speaking) failure in electronic components is the capacitors. Capacitors (electro-chemical), then inductors & transformers (electro-mechanical), wear out whether being used or not.


A company called Backblaze has collected data on hard drive failures. It has released that data in company blogs, highlighting which manufacturer's drives failed more often than others.

In a recent blog it published data indicating exactly which 5 SMART attributes indicate imminent drive failure:

From experience, the following 5 SMART metrics indicate impending disk drive failure:

    SMART 5 – Reallocated_Sector_Count.
    SMART 187 – Reported_Uncorrectable_Errors.
    SMART 188 – Command_Timeout.
    SMART 197 – Current_Pending_Sector_Count.
    SMART 198 – Offline_Uncorrectable.

You can choose a subset such as these suggested 5 stats because they are consistent across manufacturers and they are good predictors of failure.

The article goes on to suggest:

  • SMART 5 (Reallocated_Sector_Count): 1-4, keep an eye on it; more than 4, replace
  • SMART 187 (Reported_Uncorrect): 1 or more, replace
  • SMART 188 (Command_Timeout): 1-13, keep an eye on it; more than 13, replace
  • SMART 197 (Current_Pending_Sector_Count): 1 or more, replace
  • SMART 198 (Offline_Uncorrectable): 1 or more, replace
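
If you want to check just these attributes on Linux, one way (assuming smartmontools is installed and the drive is /dev/sdX) is:

    smartctl -A /dev/sdX | grep -E 'Reallocated_Sector|Reported_Uncorrect|Command_Timeout|Current_Pending_Sector|Offline_Uncorrectable'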

jwzumwalt

Posted 2017-01-22T22:31:49.690

Reputation: 268

> Windows systems seem to lose clusters much more often than Linux. For that reason, a chkdsk every 3-6mo is wise on Windows system. [citation needed] -- I have not heard such advice before. Not since 2007, anyway. And it's rather orthogonal to the question, which mostly asks about the hardware in powered-off storage - hardware that really doesn't care what kind of filesystem you're using. – Bob – 2017-01-24T23:02:59.907

Also, using any kind of filesystem-level 'bad sector' count to check drive health is ... weird. That's what S.M.A.R.T. exists for. Which also incidentally reports both reallocated sector count and pending sector [reallocation] count (and if either of those are anything but 0, it's time to replace the drive). – Bob – 2017-01-24T23:14:04.087

Your answer is based on anecdotal evidence and doesn't answer the question. By the way, good luck restoring from that backup when your PSU fails and damages all connected hard disks, or when your house burns down. – gronostaj – 2017-01-24T23:23:54.727

This answer, while it might contain helpful information (if any of it is more than conjecture) does not answer the clear requirements set forth in the question, which was specifically about the proper care and maintenance for maximum likely longevity of power-off hardware. – music2myear – 2017-01-25T00:12:31.673

My "anecdotal evidence" is based on 35 years as a IS manager of IBM RISC, Linux, and Win occupational experience. Unfortunately, Your reply is apparently lacking experience and knowledge of individual OS.

see http://www.howtogeek.com/134735/how-to-see-if-your-hard-drive-is-dying/

"Windows doesn’t have an easy-to-use built-in tool that shows your hard disk’s S.M.A.R.T. data."

– jwzumwalt – 2017-01-25T01:56:47.867

@jwzumwalt I do not see the (non)existence of a built-in tool as particularly important -- the data is there, the hardware and firmware supports it, and it's accessible by software, e.g. CrystalDiskInfo. (And if you're trying to compare OSes, there's no built-in/preinstalled tool on many Linux distros either, incl. Debian.) S.M.A.R.T. is very much the industry-standard way of detecting impending disk failure (though, to be fair, in a bigger enterprise environment they're just as likely to let it fail and replace after the fact, yay redundancy). – Bob – 2017-01-25T07:10:08.460

I also asked for a citation for the "chkdsk every 6 months" thing, because the last time I heard it, it was largely directed at Win9x with FAT (scandisk back then). It's much less useful with NTFS, and even less so in the post-XP era, since the OS basically does it automatically in the background since Vista.

– Bob – 2017-01-25T07:19:23.520

Anyway, some of the info in this answer is good/correct. Some of it is questionable, hence the comments asking for references. But the biggest issue, and probably the reason for the downvote (not from me), is that it doesn't address this particular question... which, again, is asking about data retention on a powered-off drive in storage. – Bob – 2017-01-25T07:23:11.690

1What "Linux system" or "Windows system"? What wear? I think you're talking about disks which are in use on a running system, rather than answering my question. – einpoklum – 2017-01-25T09:00:02.237

"[Also, using any kind of filesystem-level 'bad sector' count to check drive health is ... weird.]". It isn't weird; that is the industry standard for data-banks/farms. They use failed sectors accompanied by statistical data based on manufacture/model number to determine replacement of entire drive banks. – jwzumwalt – 2017-01-25T17:56:57.560

Actually, you're somewhat right there - I just remembered the exception of combined filesystem+volume manager (e.g. ZFS), where that does make some sense. But that also relies more on the FS's own reporting tools & maintenance routines (scrubbing) rather than formatting and counting bytes (?!). – Bob – 2017-01-26T03:10:34.153