What medium should be used for long-term, high-volume data storage (archival)?

58

46

This question was inspired by https://superuser.com/questions/374386/how-to-store-and-preserve-lots-of-data. There have been other similar questions, but none with the same criteria.

This is two questions in one.

  1. How do you store financial/critical records that should survive anything but a fire and should be available for decades?
  2. Let's say I want to store family photos/videos and want people to be able to find them in storage 100 years from now and still be able to use them. How would this be done?

Criteria

  1. Long term means 30+ years guaranteed. 100+ years average. [If this is not practical, use the closest solution]
  2. High volume means a couple terabytes.
  3. Answers can be 'no-compromise/industrial' solutions or practical solutions for the home office/small business user.
  4. Media will not be active during the timespan. (i.e., if you suggest hard drives, they will not be spinning).
  5. Further, there is no expectation of needing to read these archives. They are there for emergency or "for future generations" purposes.
  6. Should not require maintenance (if at all possible).

My thoughts:

  1. CD-R/DVD-Rs have proven to me, even in the short term, to be a terrible medium for backups. They seem to be very fragile and seem to lose their data in a very short time, even when kept in pristine condition.
  2. I can't help but think that storing data on a couple of 1 TB HDDs and then expecting them to spin up correctly a decade or two later is a terrible idea. Am I wrong?
  3. Industrial tape drives seem like a viable option?

user606723

Posted 2012-01-04T17:36:43.033

Reputation: 1 217

4

If you want no compromise, there is existing technology that is designed to last at least 40,000 years with no intervention: http://voyager.jpl.nasa.gov/spacecraft/goldenrec.html

– fixer1234 – 2015-02-05T23:20:17.790

The future is in crystals: they can potentially store 360 TB and last a million years. See: 5D 'Superman memory crystal' heralds unlimited lifetime data storage

– kenorb – 2015-05-21T11:03:09.670

Data Saved in Quartz Glass Might Last 300 Million Years – Martin Thoma – 2016-03-17T10:53:53.297

The Medium-Term Prospects for Long-Term Storage Systems – a CVn – 2016-12-28T18:51:32.293

I'm no expert, but I'd say tape. This question might be better on Server Fault, but I honestly don't think it fits perfectly on either, so I'll decline to vote. It is a good question and should live somewhere. – Shinrai – 2012-01-04T17:57:09.663

I agree @Shinrai. I'm open to moving this somewhere else if someone can suggest where it should live. – user606723 – 2012-01-04T18:01:58.047

Answers

20

Paper

Other than archival ink on archival paper in sealed storage, no current medium is proven to last an average of 100 years without any sort of maintenance.

Archival Paper

Older papers were made from materials such as linen and hemp, and so are naturally alkaline or acid-free, therefore lasting hundreds of years. Twentieth-century and most modern paper is usually made from wood pulp, which is often acidic and does not keep for long periods.

Archival Inks

These permanent, non-fading inks are resistant to light, heat and water, and contain no impurities that can affect the permanence of paper or photographic materials. Black Actinic Inks are chemically stable and feature an inorganic pigment that has no tendency to absorb impurities like other ink pigments can.

Redundant storage

Torvalds once said

Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it

This suggests you should not rely on a single copy on a single medium.

Not magnetic media?

http://www.zdnet.com/blog/perlow/the-bell-tolls-for-your-magnetic-media/9364?tag=content;siu-container

  • Typical example of irretrievable degradation of magnetic media.
  • Issues of hardware and software (and data formats)

Not specialized systems

In 2002, there were great fears that the discs would become unreadable as computers capable of reading the format had become rare and drives capable of accessing the discs even rarer. Aside from the difficulty of emulating the original code, a major issue was that the still images had been stored on the laserdisc as single-frame analogue video,

http://en.wikipedia.org/wiki/BBC_Domesday_Project#Preservation

Long Term Personal storage

http://www.zdnet.com/blog/storage/long-term-personal-data-storage/376

  • both the media AND the format can become unreadable.
  • print on acid-free paper with pigment inks and store in a cool, dry and dark place.
  • The first problem is picking data formats for maximum longevity.
  • Avoid using proprietary formats
  • USCSF is transferring all their original tapes - many in now-obsolete formats like BetaSP and VHS - to the 75 Mbit/s Motion JPEG 2000 format

RedGrittyBrick

Posted 2012-01-04T17:36:43.033

Reputation: 70 632

    1) Can you provide details about this? Will normal hard copies not last that long? (Photos from 100 years ago seem to be fine, AFAIK.) 2) If no current data medium will last this long, I suggest that we use the closest solution possible. It's depressing that decades from now we won't be able to look through old boxes and expect to be able to look at any of our old, forgotten photos, etc. – user606723 – 2012-01-04T18:05:32.503

    @user606723: see updated answer – RedGrittyBrick – 2012-01-04T19:11:18.257

    I've figured that laser printing on acid-free paper would be a good way to store data (a few megabytes per page) that has a high probability of being readable in 100-200 years. The software to read it would be relatively simple, and one presumes that scanners will always be available, so the format (so long as not too convoluted) would never really "go away" beyond the ability of a competent amateur to recover. – Daniel R Hicks – 2012-01-29T16:10:13.900

    64

    Short answer

    It's impossible to guarantee a long timeframe, because of entropy (also called death!). Digital data decays and dies, just like everything else in the universe. But the decay can be slowed down.

    There's currently no fail-proof and scientifically proven way to guarantee 30+ years of cold data archival. Some projects aim to do that, like the Rosetta Disk project of the Long Now Foundation, although they are still very costly and have a low data density (about 50 MB).

    In the meantime, you can use optical media whose resilience for cold storage has been scientifically tested, such as HTL-type Blu-ray discs (like Panasonic's) or archival-grade DVDs like the Verbatim Gold Archival, and keep them in air-tight boxes in a cool spot (avoid high temperatures) and out of the light.

    Also, be REDUNDANT: make multiple copies of your data (at least 4), compute hashes so you can regularly check that everything is all right, and every few years rewrite your data onto new discs. Also, use lots of error-correcting codes; they will allow you to repair your corrupted data!
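
    As an illustration of the hashing part, here is a minimal sketch in plain Python (standard library only; the manifest name and paths are just examples, this is not pyFileFixity) that builds a manifest of SHA-256 hashes and later re-verifies every file against it:

        import hashlib, os, sys

        def sha256(path, chunk=1 << 20):
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(chunk), b""):
                    h.update(block)
            return h.hexdigest()

        def make_manifest(root, manifest="manifest.sha256"):
            # One "hash  relative/path" line per file, similar to sha256sum output.
            with open(manifest, "w") as out:
                for dirpath, _, files in os.walk(root):
                    for name in sorted(files):
                        path = os.path.join(dirpath, name)
                        out.write(f"{sha256(path)}  {os.path.relpath(path, root)}\n")

        def verify_manifest(root, manifest="manifest.sha256"):
            # Re-hash every listed file and report anything missing or changed.
            problems = []
            with open(manifest) as f:
                for line in f:
                    digest, rel = line.rstrip("\n").split("  ", 1)
                    path = os.path.join(root, rel)
                    if not os.path.exists(path):
                        problems.append((rel, "missing"))
                    elif sha256(path) != digest:
                        problems.append((rel, "corrupted"))
            return problems

        if __name__ == "__main__":
            # Usage: python manifest.py make /path/to/archive   or   python manifest.py verify /path/to/archive
            cmd, root = sys.argv[1], sys.argv[2]
            if cmd == "make":
                make_manifest(root)
            else:
                for rel, why in verify_manifest(root):
                    print(f"{why}: {rel}")

    Any file reported as corrupted is the signal to repair it from another copy (or from error-correcting codes, see below) and to write out a fresh copy on new media.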

    Long answer

    Why do data get corrupted over time? The answer lies in one word: entropy. This is one of the primary and unavoidable forces of the universe, which makes systems become less and less ordered over time. Data corruption is exactly that: disorder in the order of your bits. So, in other words, the Universe hates your data.

    Fighting entropy is exactly like fighting death: you're not likely to succeed, ever. But, you can find ways to slow death, just like you can slow entropy. You can also trick entropy by repairing the corruptions (in other words: you cannot stop the corruptions, but you can repair after they happen if you took measures beforehand!). Just like anything about life and death, there's no magic bullet, nor one solution for all, and the best solutions require you to directly engage in the digital curation of your data. And even if you do everything correctly, you're not guaranteed to keep your data safe, you only maximize your chances.

    Now for the good news: there are now quite efficient ways to keep your data, if you combine good quality storage mediums, and good archival/curation strategies: you should design for failure.

    What are good curation strategies? Let's get one thing straight: most of the info you will find is about backups, not about archival. The issue is that most folks transfer their knowledge of backup strategies to archival, and thus a lot of myths are now commonly repeated. Indeed, storing data for a few years (backup) and storing data for the longest time possible, spanning decades at least (archival), are totally different goals, and thus require different tools and strategies.

    Luckily, there is quite a lot of research with scientific results, so I advise referring to those scientific papers rather than to forums or magazines. Here, I will summarize some of my reading.

    Also, be wary of claims and non-independent scientific studies asserting that this or that storage medium is perfect. Remember the famous BBC Domesday Project: «Digital Domesday Book lasts 15 years not 1000». Always double-check the studies against truly independent papers, and if there are none, always assume the storage medium is not good for archival.

    Let's clarify what you are looking for (from your question):

    • Long-term archival: you want to keep copies of your sensitive, irreproducible "personal" data. Archiving is fundamentally different from backup, as is well explained here: backups are for dynamic technical data that regularly gets updated and thus needs to be refreshed into backups (e.g., the OS, your work folder layout, etc.), whereas archives are static data that you will likely write only once and just read from time to time. Archives are for timeless data, usually personal.

    • Cold storage: you want to avoid maintenance of your archived data as much as possible. This is a BIG constraint, as it means that the medium must use components and a writing methodology that stay stable for a very long time, without any manipulation on your part, and without requiring any connection to a computer or electrical supply.

    To ease our analysis, let's first study cold storage solutions, and then long-term archival strategies.

    Cold storage mediums

    We defined above what a good cold storage medium should be: it should retain data for a long time without any manipulation required (that's why it's called "cold": you can just store it in a closet and you do not need to plug it into a computer to maintain data).

    Paper may seem like the most resilient storage medium on earth, because we often find very old manuscripts from ancient times. However, paper suffers from major drawbacks: first, the data density is very low (you cannot store more than ~100 KB on a sheet of paper, even with tiny characters and computer tools), and it degrades over time without any way to monitor it: paper, just like hard drives, suffers from silent corruption. But whereas you can monitor silent corruption of digital data, you cannot on paper. For example, you cannot guarantee that a picture will retain the same colors over even a decade: the colors will degrade, and you have no way to find out what the original colors were. Of course, you can curate your pictures if you are a pro at image restoration, but this is highly time-consuming, whereas with digital data you can automate this curation and restoration process.

    Hard Drives (HDDs) are known to have an average lifespan of 3 to 8 years: they do not just degrade over time, they are guaranteed to eventually die (i.e., become inaccessible). The following curves show this tendency for all HDDs to die at a staggering rate:

    Bathtub curve showing the evolution of the HDD failure rate by error type (also applicable to any engineered device): [chart]

    Curve showing the HDD failure rate with all error types merged: [chart]

    Source: Backblaze

    You can see that there are three types of HDDs with respect to failure: the rapidly dying ones (e.g., manufacturing errors, bad-quality HDDs, head failures, etc.), the constant-failure-rate ones (well manufactured, they die for various "normal" reasons; this is the case for most HDDs), and finally the robust ones that live a bit longer than most and eventually die soon after the "normal" ones (e.g., lucky HDDs, not used too much, ideal environmental conditions, etc.). Thus, you are guaranteed that your HDD will eventually die.

    Why do HDDs die so often? After all, the data is written on a magnetic platter, and the magnetic field can last decades before fading away. The reason they die is that the storage medium (the magnetic platters) and the reading hardware (electronics board + spinning head) are coupled: they cannot be dissociated. You can't just extract the platters and read them with another head, because, first, the electronics board (which converts the physical signal into digital data) is different for almost every HDD (even of the same brand and reference; it depends on the originating factory), and the internal mechanism with the spinning head is so intricate that nowadays it's impossible for a human to perfectly reposition a head over the platters without killing them.

    In addition, HDDs are known to demagnetize over time if not used (and SSDs similarly lose their stored charge). Thus, you cannot just store data on a hard disk, put it in a closet and expect it to retain the data without any electrical connection: you need to plug your HDD into an electrical source at least once every year or two. Thus, HDDs are clearly not a good fit for cold storage.

    Magnetic tapes: they are often described as the go-to medium for backup needs, and by extension for archival. The problem with magnetic tapes is that they are VERY sensitive: the magnetic oxide particles can easily be deteriorated by sun, water, air or scratches; demagnetized by time or by any electromagnetic device; suffer print-through; or simply fall off the tape over time. That's why they are usually used only in datacenters by professionals. Also, it has never been proven that they can retain data for more than a decade. So why are they so often advised for backups? Because they used to be cheap: back in the day, magnetic tapes cost 10x to 100x less than HDDs, and HDDs tended to be a lot less reliable than they are now. So magnetic tapes are primarily advised for backups because of cost-effectiveness, not because of resiliency, which is what interests us most when it comes to archiving data.

    CompactFlash and Secure Digital (SD) cards are known to be quite sturdy and robust, able to survive catastrophic conditions.

    The memory cards in most cameras are virtually indestructible, found Digital Camera Shopper magazine. Five memory card formats survived being boiled, trampled, washed and dunked in coffee or cola.

    However, like any other flash-based medium, they rely on stored electrical charge to retain the data, and thus if the card runs out of juice, the data may be totally lost. They are therefore not a perfect fit for cold storage (you need to occasionally rewrite the whole card to refresh the charge), but they can be a good medium for backups and short- or medium-term archival.

    Optical media: optical media are a class of storage media that rely on a laser to read the data, like CDs, DVDs or Blu-ray discs (BD). They can be seen as an evolution of paper, but the data is written at such a tiny scale that a more precise and resilient material than paper was needed, and optical discs are just that. The two biggest advantages of optical media are that the storage medium is decoupled from the reading hardware (i.e., if your DVD reader fails, you can always buy another one to read your disc) and that they are based on lasers, which makes them universal and future-proof (i.e., as long as you know how to make a laser, you can always tweak it to read the bits of an optical disc by emulation, just like CAMILEON did for the BBC Domesday Project).

    Like any technology, new iterations not only offer higher density (more storage room), but also better error correction and better resilience against environmental decay (not always, but generally true). The first debate about DVD reliability was between DVD-R and DVD+R, and even if DVD-R is still common nowadays, DVD+R is recognized as more reliable and precise. There are now archival-grade DVDs, made specifically for cold storage, claiming that they can withstand a minimum of ~20 years without any maintenance:

    Verbatim Gold Archival DVD-R [...] has been rated as the most reliable DVD-R in a thorough long-term stress test by the well regarded German c't magazine (c't 16/2008, pages 116-123) [...] achieving a minimum durability of 18 years and an average durability of 32 to 127 years (at 25C, 50% humidity). No other disc came anywhere close to these values, the second best DVD-R had a minimum durability of only 5 years.

    From LinuxTech.net.

    Furthermore, some companies specialize in very-long-term DVD archival and market it extensively, like the M-Disc from Millenniata or the DataTresorDisc, claiming that their discs can retain data for over 1,000 years; these claims are backed by a few (non-independent) studies (from 2009) among other, less scientific ones.

    This all seems very promising! Unfortunately, there are not enough independent scientific studies to confirm these claims, and the few that are available are not so enthusiastic:

    Humidity (80% RH) and temperature (80°C) accelerated ageing of several DVD brands over 2,000 hours (about 83 days) of testing, with regular checks of data readability: [chart: humidity and temperature accelerated ageing of several DVD brands]

    Translated from the French institution for digital data archival (Archives de France), study from 2012.

    The first graph shows DVDs with a slow degradation evolution, the second one DVDs with rapid degradation curves, and the third one covers special "very long-term" DVDs like the M-Disc and DataTresorDisc. As we can see, their performance does not quite live up to the claims, being lower than or on par with standard, non-archival-grade DVDs!

    However, inorganic optical discs such as the M-Disc and DataTresorDisc do have one advantage: they are quite insensitive to light degradation:

    Accelerated ageing using light (750 W/m²) over 240 hours: [chart: light accelerated ageing of several DVD brands]

    These are great results, but an archival-grade DVD such as the Verbatim Gold Archival achieves the same performance, and furthermore, light is the most controllable parameter for an object: it's quite easy to put DVDs in a closed box or closet, thus removing any possible impact of light whatsoever. It would be much more useful to have a DVD that is very resilient to temperature and humidity than to light.

    This same research team also studied the Blu-ray market to see if there would be any brand with a good medium for long term cold storage. Here's their finding:

    Humidity and temperature accelerated ageing of several Blu-ray brands, under the same parameters as for the DVDs: [chart]

    Light accelerated ageing of several Blu-ray brands, same parameters: [chart]

    Translated from this study of Archives de France, 2012.

    Two summaries of all the findings (in French) are available here and here.

    In the end, the best Blu-ray disc (from Panasonic) performed similarly to the best archival-grade DVD in the humidity+temperature test, while being virtually insensitive to light! And this Blu-ray disc isn't even archival grade. Furthermore, Blu-ray discs use a stronger error-correcting code than DVDs (which themselves use a stronger one than CDs), which further minimizes the risk of losing data. Thus, it seems that some Blu-ray discs may be a very good choice for cold storage.

    And indeed, some companies like Panasonic and Sony are starting to work on archival-grade, high-density Blu-ray discs, announcing that they will be able to offer 300 GB to 1 TB of storage with an average lifespan of 50 years. Also, big companies are turning towards optical media for cold storage (because it consumes far fewer resources, since the discs can be stored without any electrical supply), such as Facebook, which developed a robotic system to use Blu-ray discs as "cold storage" for data its systems rarely access.

    Long Now archival initiative: there are other interesting leads such as the Rosetta Disk project of the Long Now Foundation, which microscopically etches pages of the Book of Genesis in every language on earth it has been translated into. This is a great project, the first to offer a medium that can store about 50 MB for really very long-term cold storage (since it is etched into nickel), with future-proof access, since you only need a magnifier to read the data (no weird format specifications nor technological hassle such as the violet laser of Blu-ray; just a magnifier!). However, these discs are still made by hand and thus estimated to cost about $20K, which is a bit too much for a personal archival scheme, I guess.

    Internet-based solutions: yet another way to cold store your data is over the net. However, cloud backup solutions are not a good fit, the primary concern being that the cloud hosting companies may not live as long as you would like to keep your data. Other reasons include the fact that it is horribly slow to back up (since everything transfers over the internet) and that most providers require the files to also exist on your system in order to keep them online. For example, both CrashPlan and Backblaze will permanently delete files that have not been seen on your computer in the last 30 days, so if you want to upload backup data that you store only on external hard drives, you will have to plug in your USB HDD at least once per month and sync with the cloud to reset the countdown. However, some cloud services, such as SpiderOak, offer to keep your files indefinitely (as long as you pay, of course) without a countdown. So be very careful about the conditions and terms of use of the cloud-based backup solution you choose.

    An alternative to cloud backup providers is to rent your own private server online, and, if possible, choose one with automatic mirroring/backup of your data in case of hardware failure on their side (a few even guarantee against data loss in their contracts, but of course that's more expensive). This is a great solution, first because you still own your data, and second because you won't have to manage hardware failures; that is your host's responsibility. And if one day your host goes out of business, you can still get your data back (choose a serious host so that they don't shut down overnight but notify you beforehand; maybe you can even ask to put that in the contract) and rehost elsewhere.

    If you don't want the hassle of setting up your own private online server, and if you can afford it, Amazon offers a data-archiving service called Glacier. Its purpose is exactly to cold store your data for the long term: storing data in Glacier has a cost, but getting the data back costs even more, as this service is made to store data out of reach, not to keep data that you want to access often. This means that the service charges not only for writing data but also for reading it back. Retrieval can be expensive, but it may be a good deal for some of your most sensitive data (e.g., if you have a few text files or images that are VERY important: since this kind of data is usually small, it won't cost you very much to store it in Glacier).
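
    For illustration, uploading an archive to Glacier can be scripted with Amazon's boto3 SDK; this is only a minimal sketch (the vault and file names are invented examples, and it assumes AWS credentials are already configured). Keep in mind that retrieving an archive later is an asynchronous job that can take hours, in line with Glacier being out-of-reach storage:

        import boto3  # AWS SDK for Python

        glacier = boto3.client("glacier")

        # One-time: create a vault to hold the archives (the name is just an example).
        glacier.create_vault(vaultName="family-archive")

        # Upload one archive file. Keep the returned archiveId somewhere safe:
        # Glacier has no browsable file listing, so without the ID the archive is hard to find again.
        with open("photos-2023.tar", "rb") as f:
            response = glacier.upload_archive(
                vaultName="family-archive",
                archiveDescription="Family photos 2023, tar, sha256=...",
                body=f,
            )
        print("Store this ID with your other records:", response["archiveId"])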

    Shortcomings of cold storage: however, there is a big flaw in any cold storage medium: there is no integrity checking. Cold storage media CANNOT automatically check the integrity of the data (they can merely implement error-correcting schemes to "heal" a bit of the damage after corruption has happened, but corruption can neither be prevented nor automatically managed!) because, unlike a computer, there is no processing unit to compute, journal, check and correct the filesystem. With a computer and multiple storage units, by contrast, you could automatically check the integrity of your archives and automatically mirror onto another unit if some corruption happened in a data archive (as long as you have multiple copies of the same archive).

    Long-Term Archival

    Even with the best currently available technologies, digital data can only be cold stored for a few decades (about 20 years). Thus, in the long run, you cannot just rely on cold storage: you need to set up a methodology for your data archiving process to ensure that your data can be retrieved in the future (even with technological changes), and that you minimize the risk of losing it. In other words, you need to become the digital curator of your data, repairing corruption when it happens and creating new copies when needed.

    There are no foolproof rules, but here are a few established curation strategies, and in particular a magical tool that will make your job easier:

    • Redundancy/replication principle: Redundancy is the only tool that can revert the effects of entropy; this is a principle based on information theory. To keep data, you need to duplicate it. Error-correcting codes are exactly an automated application of the redundancy principle. However, you also need to ensure that your data itself is redundant: multiple copies of the same data on different discs, multiple copies on different media (so that if one medium fails because of an intrinsic problem, there is little chance that the others on different media will also fail at the same time), etc. In particular, you should always have at least 3 copies of your data (called triple modular redundancy in engineering), so that if one copy becomes corrupted, you can cast a simple majority vote to repair your files from the 3 copies. Always remember the sailor's compass advice:

    It is useless to bring two compasses, because if one goes wrong, you can never know which one is correct, or if both are wrong. Always take one compass, or more than three.

    • Error correcting codes: this is the magical tool that will make your life easier and your data safer. Error-correcting codes (ECCs) are a mathematical construct that generates data that can be used to repair your data. This is more efficient, because ECCs can repair a lot more of your data using a lot less storage space than simple replication (i.e., making multiple copies of your files), and they can even be used to check whether your file has any corruption, and even locate where those corruptions are. In fact, this is exactly an application of the redundancy principle, but in a cleverer way than replication. This technique is extensively used in long-range communications nowadays, such as 4G, WiMAX, and even NASA's space communications. Unfortunately, although ECCs are omnipresent in telecommunications, they are not in file repair, maybe because it's a bit complex. However, some software is available, such as the well-known (but now old) PAR2, DVD Disaster (which adds error-correction codes to optical discs) and pyFileFixity (which I develop in part to overcome PAR2's limitations and issues). There are also file systems that optionally implement Reed-Solomon, such as ZFS for Linux or ReFS for Windows, which are technically a generalization of RAID 5. (A short Reed-Solomon sketch is shown just after this list.)

    • Check the integrity of your files regularly: hash your files and check them from time to time (e.g., once per year, but it depends on the storage medium and environmental conditions). When you see that your files have suffered corruption, it's time to repair them using the ECCs you generated (if you did), and/or to make a fresh new copy of your data on a new storage medium. Checking data, repairing corruption and making fresh new copies is a very good curation cycle which will ensure that your data stays safe. Checking in particular is very important because your file copies can get silently corrupted, and if you then copy the corrupted copies, you will end up with totally corrupted files. This is even more important with cold storage media, such as optical discs, which CANNOT automatically check the integrity of the data (they already implement ECCs to heal a little, but they cannot check nor create fresh new copies automatically; that's your job!). To monitor file changes, you can use the rfigc.py script of pyFileFixity or other UNIX tools such as md5deep. You can also check the health status of some storage media like hard drives using tools such as Hard Disk Sentinel or the open-source smartmontools.

    • Store your archive media in different locations (with at least one copy outside your house!) to guard against real-life catastrophic events like flood or fire. For example, one optical disc at your workplace, or a cloud-based backup, can be a good way to meet this requirement (even if cloud providers can shut down at any moment, as long as you have other copies you will be safe; the cloud provider will only serve as an offsite archive in case of emergency).

    • Store in specific containers with controlled environmental parameters: for optical media, store away from light and in a water-tight box to avoid humidity. For hard drives and SD cards, store in anti-static sleeves so that residual electricity cannot damage the drive. You can also put them in an air-tight, water-tight bag or box and store them in a freezer: low temperatures slow entropy, and you can considerably extend the lifespan of any storage medium that way (just make sure water cannot get inside, or your medium will die quickly).

    • Use good-quality hardware and check it beforehand (e.g., when you buy an SD card, test the whole card with software such as HDDScan to check that everything is all right before writing your data). This is particularly important for optical drives, because their quality can drastically change the quality of your burnt discs, as demonstrated by the Archives de France study (a bad DVD burner will produce DVDs that last a lot less time).

    • Choose your file formats carefully: not all file formats are resilient against corruption; some are even clearly weak. For example, .jpg images can be made totally broken and unreadable by tampering with only one or two bytes. The same goes for 7-Zip archives. This is ridiculous, so be careful about the format of the files you archive. As a rule of thumb, simple clear text is best, but if you need to compress, use non-solid zip, and for images use JPEG 2000 (not open source yet...). More info and reviews from professional digital curators here, here, and here.

    • Store alongside your data archives all the software and specifications needed to read the data. Remember that specifications change rapidly, and thus in the future your data may no longer be readable, even if you can still access the file. You should therefore prefer open-source formats and software, and store the program's source code along with your data so that you can always adapt the program to run on a new OS or computer.

    • Lots of other methods and approaches are available here, here and in various parts of the Internet.
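
    To make the error-correcting-code idea concrete, here is a minimal sketch using the third-party reedsolo Python library (my choice for illustration, not the PAR2/DVD Disaster/pyFileFixity tools mentioned above; the exact return value of decode() differs between library versions):

        # pip install reedsolo -- a pure-Python Reed-Solomon codec
        from reedsolo import RSCodec

        rsc = RSCodec(32)   # 32 parity bytes per 255-byte chunk -> up to 16 corrupted bytes repairable per chunk

        data = b"Baptism certificate, scanned 2012 ..."   # in practice: the raw bytes of the file you archive
        protected = rsc.encode(data)                      # original bytes with parity bytes appended

        # Simulate corruption of the stored copy, then repair it from the parity bytes.
        damaged = bytearray(protected)
        damaged[3] ^= 0xFF
        damaged[17] ^= 0xFF
        repaired = rsc.decode(damaged)    # newer reedsolo versions return a (message, message+ecc, errata) tuple
        if isinstance(repaired, tuple):
            repaired = repaired[0]
        assert bytes(repaired) == data    # the two flipped bytes have been corrected

    PAR2, DVD Disaster and pyFileFixity apply the same principle at the level of whole files and discs, with a redundancy level you choose.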

    Conclusion

    I advise using whatever you have available, but always respect the redundancy principle (make 4 copies!), always check integrity regularly (so you need to pre-generate a database of MD5/SHA-1 hashes beforehand), and create fresh new copies in case of corruption. If you do that, you can technically keep your data for as long as you want, whatever your storage medium. The time between checks depends on the reliability of your storage media: for a floppy disk, check every 2 months; for an HTL Blu-ray, check every 2-3 years.

    Optimally, for cold storage I advise using HTL Blu-ray discs or archival-grade DVDs, kept in water-tight opaque boxes and stored in a cool place. In addition, you can use SD cards and cloud-based providers such as SpiderOak to store the redundant copies of your data, or even hard drives if they are more accessible to you.

    Use lots of error-correcting codes; they will save your day. You can also make multiple copies of the ECC files (but multiple copies of your data are more important than multiple copies of the ECCs, because ECC files can repair themselves!).

    These strategies can all be implemented using the set of tools I am developing (open source): pyFileFixity. This tool was in fact started because of this discussion, after finding that there was no free tool to completely manage file fixity. Also, please refer to the project's readme and wiki for more info on file fixity and digital curation.

    On a final note, I really do hope that more R&D will be put into this problem. This is a major issue for our society, which digitizes more and more data without any guarantee that this mass of information will survive more than a few years. That's quite depressing, and I really do think this issue should be brought much more to the forefront, so that it becomes a selling point for manufacturers and companies to make storage devices that can last for future generations.

    /EDIT: read below for a practical curation routine.

    gaborous

    Posted 2012-01-04T17:36:43.033

    Reputation: 1 412

    Outstanding answer! This needs far more upvotes. – bwDraco – 2015-02-04T01:24:52.067

    You plan to add MORE information? Consider publishing it as a textbook. :-) – fixer1234 – 2015-03-21T18:56:48.300

    @fixer1234 yes I plan to add more information and, more importantly, more pertinent and reliable information. There are a lot of misconceptions and falsely perceived secure solutions in the field of file fixity, so there's quite a lot to say. I have found so much info after publishing this post that an update is clearly needed, and I have already compiled everything in my notes along with references. I'm not sure SuperUser is the best place to publish all this data, but I have no blog of my own :-/ I will try to be as concise as possible. – gaborous – 2015-04-03T20:27:07.263

    Great answer, but part of your answer (at the very beginning) talks about storing data for a short term and you have identified this as "backup" and then the very long term storage as "archive". I was under the impression that backup means to store copies of data in the event of catastrophic data loss so as to be able to restore the original data. Storing data - even for a few weeks - and not having a duplicate could be considered "archiving". The lack of a duplicate means you haven't created a backup... – Kinnectus – 2015-10-20T09:35:10.053

    @BigChris You're right, except that in both cases, you need to use redundancy, and good curation (aka maintenance) strategies. Backing up and archiving are indeed very similar, and often use the same tools, which is why they are often used interchangeably, but what differs is the goal: backup should contain dynamic non personal data, while archives should contain static personal data. Here I specifically address the issue of long-term archival, not just archival, but indeed archival is just storing static data. But in both cases, you need redundancy to ensure your data will be recoverable. – gaborous – 2015-10-20T14:49:21.513

    I cannot say everything nor link to all the references I have because of the size limit on answers, but I'd like to add 3 things: be careful with HDD health indicators (SMART, temperature, activity level), they cannot be relied on; also, with a good curation scheme (check regularly, use redundancy), you can even reliably use floppy disks, which are notoriously known for being highly unreliable. And finally a good blog.

    – gaborous – 2015-10-20T16:25:16.317

    I started a chat about this answer if you would like to check in occasionally. I'm reading and correcting typos.

    – user193661 – 2015-11-06T02:28:29.997

    DVD+Rs are quite reliable if you don't get fakes. CD-Rs were affected by any light from infrared to violet (and infrared is everywhere, sometimes a lot of it); DVD+Rs are affected only by red or shorter wavelengths, already more difficult. DVDs also have the sensitive layer in between two layers of plastic; CDs had the layer just below the pencil-writable surface!! BD-R discs are the best: you need violet or ultraviolet light in order to ruin them, and their surface is the strongest one. I would say go with BD-R for practical archival with a high probability of success after 30 years. But you need a player. – FarO – 2015-11-19T17:28:51.463

    1

    @OlafM yes that's true, each new generation of optical discs brings more reliable technologies with it, not only in the materials, but also in the technological setup (e.g., the way pits/grooves are written and managed, the error-correcting code, etc.); but you should also pay attention to the material the layers are made of: not all optical discs are equal, and usually (but not always) archival-grade discs are made with more resilient materials.

    – gaborous – 2015-12-21T12:12:49.223

    Humidity may be controlled cheaply by putting one of those moisture-absorber packets in with the discs. So now we need data for low-humidity, high-temperature conditions. Also, temperature cycling over each 24 h may be a factor, because of the repeated expansion/contraction that accompanies it. So now we need accelerated ageing data with temperature cycling... – Evgeni Sergeev – 2016-05-05T00:39:02.720

    New optical storage medium: the "Superman memory crystal" disc by the team of Peter Kazansky (Optoelectronics Research Centre at the University of Southampton) is said to be highly resilient to temperature changes (up to 1,000 degrees Celsius) and to retain data for, theoretically, billions of years. We will have to wait for other labs to reproduce the results, but this could be very promising.

    – gaborous – 2016-07-03T18:21:57.810

    It looks like you've hit the 30,000-character answer length limit. If you need to add more information, you can split off some of the content into another answer. Include a link to the other answer in each post to make navigation easier for readers. – bwDraco – 2016-08-06T20:00:25.147

    @bwDraco Yes, good idea, I could gather some more refs and put them in an extended answer. Also, I could describe the new scheme I use: 3 hard drive copies regularly checked + SpiderOak with the infinite storage plan + Blu-ray discs for really, really sensitive data that isn't too big (I limit what I store on these discs to 50 GB) + pyFileFixity and DVDisaster for folders I really want to make sure I keep in the long run. The most important thing for me was to prioritize the data: I sorted it into four folders (garbage, personal, important, critical) and each gets an additional degree of backup. – gaborous – 2016-08-06T21:58:35.200

    12

    Quick follow-up to my previous answer above; this one is more concise and extends it with additional (but not primary) information and references that I could not fit into the first answer because of the 30K character limit.

    Since long-term archival is a curation process, here are some other things you might want to pay attention to in order to make your process more efficient and less time- (and resource-) consuming:

    • Deduplication: since the only way to ensure long-term archival is through deliberately designed redundancy, you want to avoid useless redundant data (e.g., files you copied from your USB key to your archival hard drive when you already have a copy coming from your main computer!). Unwanted redundant data, usually called duplicates, are bad for storage cost (they take more storage space, yet you will have a hard time finding them when needed), for your process (what if you have different versions of the same file? How can you know which copy is the correct one?) and for your time (they add to the transfer times when you synchronize the backup to all your archives). That's why professional archival services usually offer automated deduplication: files that are exactly identical get the same inode, and so take no additional space. That's what SpiderOak does, for example. There are automated tools you can use, and the ZFS (Linux) or ReFS (Windows) filesystems can do it automatically for you. (A minimal duplicate-finding sketch is shown just after this list.)

    • Prioritization/categorization: as you can see, long-term archival is a time-consuming process that needs to be conducted regularly (to sanity check, synchronize archives across media, make new archives on new media to replace dying ones, repair files using error-correcting codes, etc.). To minimize the time it costs you, try to define different protection schemes depending on the priority of your data, based on categories. The idea is that when you move your computer data to one of the external hard drives you use for long-term archival, you place it directly in a folder defining the backup priority: "unimportant", "personal", "important", "critical". Then you can define a different backup strategy for each folder: reserve the full protection (e.g., backup on 3 hard drives + cloud + error-correcting codes + Blu-rays) only for the most critical data you want to keep your whole life (the critical folder), then a medium protection for "important" data (e.g., backup on 3 hard drives + cloud), then "personal" is just copied to at least two external hard drives, and "unimportant" gets no copy (or maybe one hard drive if the synchronization isn't too long...). Usually, you will see that "unimportant" contains the most data, "personal" less, "important" much less, and "critical" will be quite tiny (less than 50 GB for me). For example, in "critical" you will put your house contract and your marriage and childbirth pictures. In "important" will be documents you don't want to lose, like legal documents, some important photos and videos of memorable events, etc. In "personal" you'll put all your personal photos, videos from your holidays and work documents; these are documents and media that you'd like to keep, but you won't die of regret if you lose them (and that's good, because usually this folder is HUGE, so you will probably lose some files in the long run...). "Unimportant" is all the stuff you download from the internet and various files and media you don't really care about (like software, games and movies). The bottom line is: the more files you want to archive long term, the harder (and more time-consuming) it will be, so try to keep the files that get this special treatment to a minimum.

    • Meta-data is a critical spot: even with good curation strategies, there is usually one thing that isn't protected: the meta-data. Meta-data includes the information about your files, for example: the directory tree (yep, this is only a few bytes, but if you lose it, you get your files in total disorder!), the filename and extension, the timestamp (this may be important to you), etc. This might not seem like a big deal, but imagine the following: what if tomorrow all your files (including files shipped with software and the like) were put inside one flat folder, without their filenames or extensions? Would you be able to recover the files you need from the billions of files on your computer by manual inspection? Don't think this is an unusual scenario; it can happen as easily as a power outage or a crash in the middle of a copy: the partition being written can become totally destroyed (the infamous RAW partition type). To overcome this issue, you should prepare your data for data recovery: to ensure that you keep the meta-data, you can agglomerate the files with their meta-data using non-solid archives such as ZIP DEFLATE or DAR (but not tar). Some tools and filesystems offer automated meta-data redundancy, such as DVDisaster (for optical discs) and ZFS/ReFS (for hard drives). Then, in case of a meta-data crash, you can try to recover your partitions using TestDisk or GetDataBack (which allow partial directory tree recovery) or ISOBuster (for optical discs), to recover the directory tree and other meta-data. In case this all fails, you can fall back to file scraping using PhotoRec: this will extract all the files it recognizes, but in total disorder and without the filenames or timestamps; only the data itself is recovered. If you zipped important files, you will be able to recover the meta-data inside the zip (even if the zip itself no longer has any meta-data, the files inside will still possess theirs). However, you will have to check all the scraped files one by one manually, which is time-consuming. To safeguard against this possibility, you can generate an integrity checksum file beforehand using pyFileFixity or PAR2, and then use this checksum file after file scraping to automatically recognize and rename the files depending on their content (this is the only way to automate meta-data recovery after file scraping, because file scraping can technically only recover content, not meta-data). (A sketch of this rename-by-hash idea is shown just after this list.)

    • Test your file formats and curation strategies for yourself: instead of trusting articles about which format is better than another, you can try it yourself with pyFileFixity's filetamper.py, or simply by replacing a few hexadecimal characters in some files: you will see that most file formats can break down with as few as 3 changed bytes. So you really ought to choose your file formats carefully: prefer simple text files for notes, use resilient file formats for media (these are still being worked on, such as MPEG-4 with a variable error-correcting code, which ffmpeg implements; ref to be added), or generate your own error-correcting codes.

    • Read statistical studies, don't believe claims: as I said in the previous answer, extravagant claims are made all the time about the longevity of storage media without any scientific backing, and you should be particularly wary of that. Indeed, there is nothing in the law that prevents a manufacturer from boasting with fake, unverifiable claims about longevity. Prefer to rely on statistical studies, such as Backblaze's annual report on hard drive failure rates.

    • Choose storage media with a long warranty. A warranty cannot bring your data back, but it tells you how the manufacturer evaluates the failure rate of its product (because the warranty would cost it too much if the failure rate during the warranty period were too high).
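
    As mentioned in the deduplication item above, here is a minimal duplicate-finding sketch in plain Python (it hashes every file, which is slow; a real deduplicator would first group files by size, and the root path is just an example):

        import hashlib, os
        from collections import defaultdict

        def sha256(path, chunk=1 << 20):
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(chunk), b""):
                    h.update(block)
            return h.hexdigest()

        def find_duplicates(root):
            # Group files under `root` by content hash; any group with more than one path is a set of duplicates.
            groups = defaultdict(list)
            for dirpath, _, files in os.walk(root):
                for name in files:
                    path = os.path.join(dirpath, name)
                    groups[sha256(path)].append(path)
            return {h: paths for h, paths in groups.items() if len(paths) > 1}

        for digest, paths in find_duplicates("/mnt/archive-hdd").items():
            print(digest[:12], *paths, sep="\n  ")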

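    And for the meta-data item, a sketch of the rename-by-hash idea after file scraping: it assumes a "hash  relative/path" manifest (like the one sketched earlier in this thread) was generated while the archive was still intact, and it can only rename files that were recovered bit-exact (the directory names here are invented examples):

        import hashlib, os, shutil

        def sha256(path, chunk=1 << 20):
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(chunk), b""):
                    h.update(block)
            return h.hexdigest()

        # names.sha256 holds one "hash  relative/path" line per file of the intact archive.
        names = {}
        with open("names.sha256") as f:
            for line in f:
                digest, rel = line.rstrip("\n").split("  ", 1)
                names[digest] = rel

        # Walk the flat directory produced by file scraping (e.g. PhotoRec) and copy each
        # recovered file back under its original name, matched purely by content hash.
        recovered_dir, restored_root = "recup_dir", "restored"
        for name in os.listdir(recovered_dir):
            src = os.path.join(recovered_dir, name)
            if not os.path.isfile(src):
                continue
            rel = names.get(sha256(src))
            if rel is not None:
                dst = os.path.join(restored_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
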

    An update on the scheme I use: I apply the prioritization strategy described above, and I added the cloud backup service SpiderOak to my scheme, because it has a plan with infinite storage and it's totally encrypted, so I retain sole ownership of my data. I do NOT use it as the sole backup medium for any of my data; it's only an additional layer.

    So here's my current scheme:

    • 3 hard drive copies, regularly checked and synchronized: two stored in different places and one that is always on me (I use it to store garbage and to do quick backups).
    • SpiderOak with infinite storage plan
    • Blu-ray discs for really, really sensitive data that isn't too big (I limit what I store on these discs to 50 GB)
    • pyFileFixity and DVDisaster for folders I really want to make sure I keep in the long run.

    My daily routine is like this: I always have one 2.5" portable USB HDD that I can use to stash unimportant stuff (moving files off my computer onto the HDD) or to back up important stuff (copying files to the HDD but keeping a copy on my computer). For really critical stuff, I additionally activate the online backup to SpiderOak (I have a folder on my computer for critical stuff, so I just need to move critical files there and SpiderOak synchronizes it automatically). For REALLY critical files, I also compute an error-correction file using pyFileFixity.

    To summarize, I store critical stuff on the portable HDD, the SpiderOak cloud and my computer, so I have 3 copies at any time with just two quick actions (copy to the portable HDD and move to the SpiderOak folder). If one copy gets corrupted, I can do a majority vote to fix it using pyFileFixity. It's a very low-cost scheme (both in money and time) but very efficient, and it implements all the core tenets of digital curation (triple redundancy, copies in different locations and on different media, integrity checking and ECC via SpiderOak).
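
    For illustration, the majority vote across 3 copies can be done with a few lines of Python (only a sketch of the principle, not pyFileFixity itself; it assumes the three copies have the same length, i.e. bit rot rather than truncation, and the paths are invented examples):

        def majority_vote(copy_a: bytes, copy_b: bytes, copy_c: bytes) -> bytes:
            # For each byte position, keep the value that at least two copies agree on.
            assert len(copy_a) == len(copy_b) == len(copy_c)
            out = bytearray()
            for a, b, c in zip(copy_a, copy_b, copy_c):
                if a == b or a == c:
                    out.append(a)
                elif b == c:
                    out.append(b)
                else:
                    raise ValueError("all three copies disagree at this position; vote impossible")
            return bytes(out)

        # Read the same file from the three archive locations and write the repaired version.
        copies = [open(p, "rb").read() for p in ("/mnt/hdd1/doc.pdf", "/mnt/hdd2/doc.pdf", "/mnt/spideroak/doc.pdf")]
        open("doc.repaired.pdf", "wb").write(majority_vote(*copies))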

    Then, every 3 to 6 months, I synchronize my portable HDD to my second HDD at home, and every 6 to 12 months I synchronize my portable HDD to my third HDD, which is at another house. This provides the additional benefit of rotation (if in 6 months I realize something went wrong in my last backup and I deleted critical files, I can get them back from one of the two home HDDs).

    Finally, I wrote some very critical files to Blu-ray discs using DVDisaster (and additional ECC files with pyFileFixity, but I'm not sure that was necessary). I store them in an air-tight box in a closet. I only check them every few years.

    So you see, my scheme is not really a big burden: on a daily basis, it takes a few minutes to copy files to the portable HDD and to my SpiderOak folder, and then I just synchronize every 6 months to one or the other home HDD. This can take up to a day depending on how much data needs to be synchronized, but it's automated by software, so you just let a computer run it while you do something else (I use a $100 netbook I bought just for that, so I can work on my main computer at the same time without worrying about a crash in the middle of a copy, which can be dreadful and destroy the hard drive being written). The error-correction codes and the Blu-ray scheme are only used rarely, for really critical data, so they are a bit more time-consuming, but that is rare.

    This scheme can be enhanced (as always), for example by using ZFS/ReFS on the hard drives: this would provide automated Reed-Solomon error-correction protection and integrity checking (and ditto blocks!) without any manual interaction on my part (contrary to pyFileFixity). Although ZFS cannot run under Windows (for the moment), ReFS allows similar error-correction control at the filesystem level. Also, it could be a good idea to use these filesystems on external HDDs! A portable HDD running ZFS/ReFS with automated RS error correction and deduplication would be awesome (and ZFS seems to be quite fast, so copies should be quick!).

    One last note: be careful with claims about the ECC capabilities of filesystems, such as those in this list, because for most of them it is limited to the metadata only (such as APFS) or to RAID 1 mirroring (Btrfs). To my knowledge, only ZFS and ReFS provide real error-correcting codes (and not simple mirroring) for both metadata and data, with ZFS currently being the most advanced (although still somewhat experimental as of 2018), in particular because ReFS drives cannot be bootable.

    /UPDATE 2020: new solutions are emerging; they are still in an early experimental phase, use a decentralized approach often based on immutable blockchains, and are very interesting to explore, although most of them are probably not usable right now (I would not rely on them to back up critical data, but they could be used as a secondary backup if you feel adventurous):

    • Perkeep (comparison with other software). A similar project is Upspin. Both are actively developed as of early 2020.
    • Sia
    • Syncthing can facilitate mirroring backups between multiple devices; it's free and open source
    • libchop for developers
    • bitdust (rebuilding not yet ready so be careful!)

    gaborous

    Posted 2012-01-04T17:36:43.033

    Reputation: 1 412

    11

    There is no easy solution. Archive maintenance is a process, not a one-time job. All three currently available archival media types have their own pluses and minuses; however, these arguments apply to all media types:

    1. Nobody has stored DVDs or hard disks for 30 or 100 years, for obvious reasons. So there is no track record and nobody knows how the media will age. Artificial ageing tests do not prove much, and you have to rely on the vendor's testing (not impartial).

    2. You must store the media in the controlled environment for best results (constant temperature/humidity, low light, etc.). Otherwise media life is shortened significantly.

    3. You must maintain the hardware and software that reads the media (e.g. SATA interfaces might not be readily available in 30 years from now).

    So, in my opinion, the only viable solution for home users or small businesses is this:

    1. Maintain multiple copies of all data on diverse media types (both hard disks and DVDs)
    2. Maintain multiple copies of all data in multiple locations (at home and in your bank's safe deposit box).
    3. Copy all data to new media every so often (e.g., copy to a new hard disk and new DVDs every 2 years). As data density grows, you will probably need fewer discs, too.
    4. Maintain paper copies of all critical data, if possible (e.g., print those yearly general ledgers for your business, print your most precious family photos, etc.)

    haimg

    Posted 2012-01-04T17:36:43.033

    Reputation: 19 503

    1

    @user606723 What you call "RAID for DVDs" actually exists and is already implemented in the form of "error correcting codes", in particular using CIRC (Cross-Interleaved Reed–Solomon Coding). That's why tiny scratches or dust won't prevent you from reading the data, because it's already automatically corrected. However, you cannot specify the level of redundancy you want, so if you want a more resilient DVD, you must use a third-party software such as DVDisaster, PAR2 or pyFileFixity.

    – gaborous – 2015-10-25T14:17:41.600

    I wonder if there is RAID for DVDs.... i.e., if you store DVDs for two years, you might be reasonably sure that 80% of them would be error free, so you might have two parity discs. Hmmmm. Usenet uses parity files I think. Might be worth using something like that for DVD/CD/BD archival. – user606723 – 2012-01-04T19:00:21.897

    @user606723: This is a very good idea! I suggest using something like a multi-volume RAR archive (if the original files are really big) with PAR2 parity files... – haimg – 2012-01-04T19:29:26.890

    Interface compatibility would be a major concern; it's been about 30 years since the IBM XT was introduced, yet how many computers today can in any way interface with a pre-ATA hard disk? How many computers built today can even interface with a PATA hard disk without additional hardware (controller card or USB adapter)? – a CVn – 2012-01-30T09:48:35.913

    5

    I'd go with microfilm. I don't know if it is still manufactured, but I'd be surprised if it wasn't. Silver-based negatives last hundreds of years if stored correctly. Of course, that is a huge investment, and it will take up a whole room for photography and viewing, not counting storage. So that's only if you really MEAN 100+ years with no maintenance.

    If not - and chances are you don't, unless you want to make a time capsule - just use HDD backups and copy everything over to new media every 10-15 years. Really, there is no better insurance against the ageing of the medium than copying the whole thing over every 10 years or so. Better than microfilm, better than clay tablets, better than stone obelisks buried in the desert sand.

    Sigmoid

    Posted 2012-01-04T17:36:43.033

    Reputation: 51

    4

    You can securely store up to 5 TB (or more?) for up to 30 years on magnetic tape (i.e., a tape drive); this timespan is proven. Blu-ray recordables should also safely store your stuff for up to 30 years, but their capacity is around 100 GB.

    If you have more money, you could store it on black-and-white 35 mm film. It's assumed that data can be restored (depending on density) for the next 700 years. (German link to Wikipedia)

    tuergeist

    Posted 2012-01-04T17:36:43.033

    Reputation: 300

    For the record, writing to 20-50 blu-ray disks is not out of the question. – user606723 – 2012-01-04T18:14:54.453

    I've never heard of data archival on 35mm, although the principle is obvious I suppose. What's the density like? – Shinrai – 2012-01-04T18:41:07.253

    @Shinrai: I dunno the density of film, sorry – tuergeist – 2012-01-05T12:18:05.407

    You can probably figure a density somewhere between 1 and 10 megabits per frame. – Daniel R Hicks – 2012-01-29T16:05:17.120

    Nikon's LS-9000 ED scans film at 4000 dpi, giving you 21.4 Mp/frame at 35 mm (24 x 36 mm). If you can use 1/10th of that for actual data storage (allowing for film imperfections, focusing and resolution limitations in the optics at both ends, etc.) that's 2 Mb/frame, or something like 10 MB for a 36-exposure roll of film in pure black/white. If the scanner's 4000 dpi is the limiting factor, that's 100 MB for a 36-exp roll. Of course, you'd still have to in some other way preserve information on how to read the data, because to the naked eye the frames would likely appear fairly uniformly gray. – a CVn – 2012-01-30T09:57:59.383

    @MichaelKjörling - could QR codes work at such high resolutions? Or multiple codes per frame? Seeing how widely used they are now, and thinking about how long normal barcodes have been in use, they might still be around in 50 years. – SPRBRN – 2014-05-19T13:55:32.080
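    As a purely illustrative sketch of that idea (not something from the thread): the third-party Python qrcode package can chunk a file into QR-code images that could then be photographed onto film frames. A version-40 code at the lowest error-correction level holds at most about 2,953 bytes of binary data, so the 2,000-byte chunk size below stays within a single code; the file name is a placeholder.

```python
# Sketch: split a file into QR-code images that could be photographed onto
# film frames.  Requires the third-party "qrcode" package (with Pillow).
# The 2,000-byte chunk stays under the ~2,953-byte binary capacity of a
# version-40 QR code at error-correction level L.
import qrcode
from qrcode.constants import ERROR_CORRECT_L

CHUNK = 2000  # bytes per code; arbitrary choice below the version-40 limit

with open("family_photos.tar", "rb") as f:  # hypothetical archive to encode
    data = f.read()

for i in range(0, len(data), CHUNK):
    qr = qrcode.QRCode(error_correction=ERROR_CORRECT_L)
    qr.add_data(data[i:i + CHUNK])
    qr.make(fit=True)
    qr.make_image().save(f"chunk_{i // CHUNK:05d}.png")
```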

    2

    I recommend a three inch diameter nickel disk with information microscopically etched onto its surface.

    http://rosettaproject.org/blog/02008/aug/20/very-long-term-backup/

    Dane

    Posted 2012-01-04T17:36:43.033

    Reputation: 1 682

    The only problem with this approach is that it can only store still images (scans). But it's currently the best approach for VERY long term storage (up to 2000 years - millennia, yay!). Another shortcoming highlighted by some comments on the blog is that it can only store about 50 MB of data. – gaborous – 2015-02-02T00:37:18.357

    3Does it have to be exactly three inches? I have a 75 mm diameter nickel disk handy... – a CVn – 2013-12-26T21:44:27.163

    1

    For those kinds of time spans, anything that is already on paper (or can easily be printed without losing information) is best stored in that form. Just be mindful of the paper and toner you use for the hardcopy.

    As for others, I don't know of a currently used digital medium that would last for those spans of time. If you spend time (and thus money) to refresh your collection, then a magnetic tape might be a viable option - but even then you'd need some redundancy, as you might just find out that a single tape has gone bad (or it might be that the tape drive just happens to mangle the tape on reading it).

    And even if you can get the actual media to stand the test of time, you'd still be faced with the issue of whether any program could read the media 30 years from now, let alone 100 years from now.

    Juha Laiho

    Posted 2012-01-04T17:36:43.033

    Reputation: 131

    1Magnetic tape is subject to a number of failure modes, from "print through" to demagnetization over time to the oxide simply falling off the tape. – Daniel R Hicks – 2012-01-29T16:12:20.633

    1

    It's true that common CD-Rs and DVD-Rs are not reliable enough for archiving important data. But you can get DVDs that are not so quick to decay:

    https://www.google.com/search?q=archival+dvd-r

    Isaac Rabinovitch

    Posted 2012-01-04T17:36:43.033

    Reputation: 2 645

    Thank you for pointing this option out, a good alternative to M-Discs that is accessible to just about anyone with a DVD recorder. – gaborous – 2015-02-02T00:32:22.337

    "Verbatim Gold Archival DVD-R [...] has been rated as the most reliable DVD-R in a thorough long-term stress test by the well regarded German c't magazine (c't 16/2008, pages 116-123) [...] achieving a minimum durability of 18 years and an average durability of 32 to 127 years (at 25C, 50% humidity). No other disc came anywhere close to these values, the second best DVD-R had a minimum durability of only 5 years.", http://www.linuxtech.net/tips+tricks/best_safe_long-term_data_storage.html

    – gaborous – 2015-02-02T00:35:25.683

    1

    I've read that 'M-Disc' have created a DVD which needs a special writer yet is readable on generic DVD readers. They claim an estimated life span of 1000 years, stating it cannot be accurately tested. Even after long exposure to the sun, scratches, repeated use, etc., the disc remains 100% usable. I'd be interested in any feedback from anyone who's encountered this system.

    Here's an excerpt from Dell, who may be installing the M-Disc drive in their new laptops/PCs:

    M-DISC Ready drives laser-etch data into an inorganic rock-like material to prevent data loss, ensuring your files are safe and can be stored for up to 1000 years, the company claims.

    Unlike all other recordable DVDs that use organic dyes to hold data, M Discs won’t fade or degrade over time.

    Dean

    Posted 2012-01-04T17:36:43.033

    Reputation: 11

    Instead of reposting with more info, you should have edited your original post. – Kazark – 2013-04-16T19:25:53.927

    Can you cite the quote with a link or something? Also, you can use > to format it as a block quote. – Kazark – 2013-04-16T19:28:02.653

    1

    You need to mix different technologies, locations and media in order to achieve long-life backups (a checksum-manifest sketch for verifying the copies follows the list):

    • Burn to DVD or Blu-ray at low speed. Keep the discs in low light, low temperature and low humidity, and free of scratches.
    • Keep a copy in a RAID 1, RAID 5, RAID 6 or RAID 10 unit.
    • Keep another copy on an external HDD.
    • Keep a copy in the cloud (Carbonite, CrashPlan).
    • Keep a copy on M-Disc technology (M-Discs and M-Disc burners are now available at Amazon.com at very good prices). The manufacturer states they can hold data for 1000 years.
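    Not part of the original answer, but one minimal way (among many) to keep those independent copies honest is a SHA-256 manifest stored alongside every copy and re-checked periodically; the paths and file names below are placeholders.

```python
# Sketch: build or check a SHA-256 manifest so every copy of the archive
# (DVD, RAID volume, external HDD, cloud, M-Disc) can be verified later.
# Paths are examples only; keep the manifest with each copy of the data.
import hashlib
import sys
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(root: Path, manifest: Path) -> None:
    # One "<digest>  <relative path>" line per file, like sha256sum output.
    lines = [f"{sha256(p)}  {p.relative_to(root)}"
             for p in sorted(root.rglob("*")) if p.is_file()]
    manifest.write_text("\n".join(lines) + "\n")

def check_manifest(root: Path, manifest: Path) -> bool:
    ok = True
    for line in manifest.read_text().splitlines():
        digest, name = line.split("  ", 1)
        target = root / name
        if not target.is_file() or sha256(target) != digest:
            print(f"MISMATCH: {name}")
            ok = False
    return ok

if __name__ == "__main__":
    # Usage:  python manifest.py build /mnt/archive   manifest.sha256
    #         python manifest.py check /media/archive manifest.sha256
    cmd, root, manifest = sys.argv[1], Path(sys.argv[2]), Path(sys.argv[3])
    if cmd == "build":
        build_manifest(root, manifest)
    else:
        sys.exit(0 if check_manifest(root, manifest) else 1)
```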

    Alex

    Posted 2012-01-04T17:36:43.033

    Reputation: 11

    @MichaelKjörling Store an additional computer with all the peripherals needed. Use ROM memory if needed. – QuyNguyen2013 – 2014-12-24T03:16:05.950

    I see three of your five bullet points are really variations of a single theme: magnetic hard drive storage. As for your last point, the issue isn't so much how long the media will retain the data (and at least hard disk manufacturers commonly cite numbers that are far better than reality) but for how long equipment to read the data will be available or knowledge of how to make them will be available. All of your suggested techniques are high-tech. Suppose the Vikings stored data on blu-ray disks; what are the odds we'd have the knowledge how to interpret that data now? – a CVn – 2013-12-26T21:43:09.110

    1

    As someone already mentioned, there is a new technology called M-Disc. They are very reliable: http://www.zdnet.com/torture-testing-the-1000-year-dvd-7000023203/ We started using them to secure disk images of production machines. M-Disc Blu-rays are already on the market. The only disadvantage is that they are slower to write than classic BD-Rs.

    Tomasz Szkudlarek

    Posted 2012-01-04T17:36:43.033

    Reputation: 21

    I have a similar need to the OP's, and after reading about it I think I'll give this solution a try - thanks for pointing out this technology! It only requires buying a DVD or Blu-ray writer compatible with M-Disc, and LG already produces a lot of them, so it's also quite accessible and low-cost! – gaborous – 2015-02-02T00:28:55.150

    1

    In fact, it seems M-Discs are not as reliable as claimed. The Archives de France (France's official data archival institution) ran an independent French study to find the best data archival medium, and they found that M-Discs do not really stand up to humidity and temperature (accelerated ageing). I'll post an answer here with more details.

    – gaborous – 2015-02-03T17:51:10.107

    0

    Your answer is simple:

    https://wiki.openstack.org/wiki/Cinder

    OpenStack is a nearly 'immortal' storage system, as you can upgrade or replace faulty nodes with new ones, even with future technologies unknown to us now. In this system your data lives in at least 2 and up to 5 places simultaneously, so complete storage nodes can fail and your data is still present. It scales up to 50 PB (verified), and perhaps 110 PB. Basically, it adds a software layer on top of your hardware, and this keeps your storage alive indefinitely. It overcomes the current sound barrier of RAID sets, with their long rebuild times for very large arrays. Costs are about 50% of traditional RAID storage systems. I know of a system from Fujitsu featuring this as a reference architecture: the CD10000.

    Thomas Holzknecht

    Posted 2012-01-04T17:36:43.033

    Reputation: 19

    1Now you just have to put your faith in that company :-) – einpoklum – 2016-08-31T21:59:22.493

    0

    If you want a methodical way to approach this problem, you should study the field of Digital Preservation.

    http://en.wikipedia.org/wiki/Digital_preservation

    Digital preservation is the method of keeping digital material alive so that they remain usable as technological advances render original hardware and software specification obsolete (wikipedia)

    There is also a reference model: OAIS http://en.wikipedia.org/wiki/Open_Archival_Information_System

    There are a few open-source and commercial solutions to accomplish it. Libraries and archives use these technologies to preserve digitized books for long periods of time.

    AGM

    Posted 2012-01-04T17:36:43.033

    Reputation: 11

    Keeping data for a long period of time does not equal the media itself surviving for that long, as has been pointed out in several highly voted answers already. One major part of digital preservation is data migration as media ages and becomes obsolete. – a CVn – 2015-05-21T08:54:55.840

    Thank you Michael. I am only pointing to OAIS as a method to achieve the real objective. – AGM – 2015-05-22T11:20:38.510

    This is a good answer as regards digital curation strategies, but indeed not for which storage medium should be used. The OAIS model is very good and is indeed used by most national libraries and archives in the world, but I find it a bit too complicated and theoretical, with unnecessary metadata, for individual use. The BagIt model is a bit more practical and usable, but still quite complicated, when simple tools like PAR2 or pyFileFixity might be enough.

    – gaborous – 2015-10-25T14:56:21.603
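    For readers wondering what the BagIt approach mentioned above looks like in practice, here is a minimal sketch using the third-party Python bagit package from the Library of Congress; the directory name and metadata are invented examples, and this is only one possible packaging, not something prescribed in the thread.

```python
# Sketch: turn an existing directory of files into a BagIt "bag".
# Requires the third-party "bagit" package (pip install bagit).
# make_bag() moves the files into a data/ subdirectory and writes
# per-file checksums plus a small amount of metadata next to them.
import bagit

bag = bagit.make_bag(
    "family_archive",              # hypothetical directory, packaged in place
    {"Contact-Name": "Jane Doe"},  # example bag-info metadata
    checksums=["sha256"],
)

# Later, before trusting a stored copy:
bag = bagit.Bag("family_archive")
print("bag is valid" if bag.is_valid() else "bag FAILED validation")
```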

    -1

    Practical long-term data storage using current technology (as of 2014):

    ...and this is what I am doing.

    Get two multi-terabyte drives, for example two drives of 3 terabytes each. Call one TB-1 and the other TB-2. Back up everything to TB-1. After a year of backing up to TB-1, reformat TB-2 and copy TB-1 to TB-2. Then, for the next year, back up everything to TB-2. After that year, reformat TB-1 and copy TB-2 to TB-1, thereby starting the two-year cycle again.

    The reformatting restores the magnetic strength of the sector markers. And the copying restores the magnetic strength of the data.

    The same principle can be applied to tape backup and CD backup, or almost any other backup. But CDs are very inconvenient because they can go bad in less than a year, and you need so many of them to back everything up, so burning copies of all backup CDs every 5 months is just too much work. So far, I can store my whole life on one multi-terabyte drive.
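    A minimal sketch of the yearly copy step described above (not the author's actual tooling), assuming both drives are mounted at hypothetical paths: copy the active drive onto the freshly formatted spare with Python's standard library, then walk the two trees with a shallow comparison to flag anything missing or mismatched.

```python
# Sketch: yearly copy from the active drive to the freshly formatted spare,
# followed by a recursive shallow comparison.  Mount points are examples only.
# Requires Python 3.8+ for shutil.copytree(..., dirs_exist_ok=True).
import filecmp
import shutil

SRC = "/mnt/TB-1"   # drive that has been collecting backups this year
DST = "/mnt/TB-2"   # freshly reformatted drive

shutil.copytree(SRC, DST, dirs_exist_ok=True)

def report(cmp: filecmp.dircmp) -> None:
    # Print anything that differs or is missing on either side.
    for name in cmp.left_only + cmp.right_only + cmp.diff_files:
        print("check:", cmp.left, name)
    for sub in cmp.subdirs.values():
        report(sub)

report(filecmp.dircmp(SRC, DST))
```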

    Indinfer

    Posted 2012-01-04T17:36:43.033

    Reputation: 101

    There's no need to rewrite data on an HDD; you only need to provide an electric supply to maintain (or restore) the electromagnetic field. Rewriting data for long-term storage is only necessary for SD/CompactFlash cards and SSDs. – gaborous – 2015-10-25T16:26:14.880

    2CDs go bad in less than a year? Are you saying you don't own any CD more than 1 year old? I have data and audio CDs more than a year old, I can assure you, and they work fine! – Dave – 2014-05-19T14:04:48.410

    1I have CDs from 1998 which still work fine. Regardless of us knowing this isn't true, what makes you believe this is the case? Can you source your information? Thanks. – Matthew Williams – 2014-05-19T14:29:07.100