6

Background:
We need ready access to 30 TB of audio data. Only a small fraction of it is ever requested for playback, but that playback needs to happen immediately, even for multi-year-old data. The data resides on a SAN of multiple arrays, and a nightly backup is performed on new data. Some data is also removed every night. Since both are write events, call it 20 GB a night. The overall trend is that more new data is written than old data is removed.

Weekly Patrol Reads (PR) and Consistency Checking (CC) account for most of the disk activity on the arrays, other than the disks just spinning until they fail.

Question:
I'm trying to figure out whether the disk-based SAN should be replaced with one using NVMe, what RAID level to consider, and whether it makes sense to reduce the frequency of PR or CC activity for V-NAND technology.

It is my understanding that what kills V-NAND is writes, and we would be writing far less data than the daily minimum rating of most drives, even considering the consistency checking.
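As a rough sanity check on that claim, the back-of-the-envelope arithmetic looks like this (the drive size and drive count below are illustrative assumptions, not our actual hardware):

```python
# Back-of-the-envelope check of the write load described above.
# Drive size and drive count are assumptions for illustration only.

nightly_writes_gb = 20      # nightly backup ingest + removals, per the question
drive_capacity_tb = 3.84    # assumed enterprise NVMe drive size
drive_count = 10            # assumed number of drives for ~30 TB usable

writes_per_drive_gb = nightly_writes_gb / drive_count
dwpd = writes_per_drive_gb / (drive_capacity_tb * 1000)

print(f"~{dwpd:.5f} drive writes per day (DWPD)")  # ~0.00052
# Even a read-optimized drive rated at 0.3 DWPD has roughly 600x headroom here.
```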

I have been able to find almost no testing of RAID 5/6 on NVMe, or even on SSDs in general. I'm primarily after long-term availability.

Research:
Most of the other questions on this topic predate NVMe technology and are 6-7 years old. This one is an exception but doesn't really cover this scenario either:
Understanding NVMe storage and hardware requirements

Related:
Long term storage of business critical data
Long term archival of video & Audio files
One Year Raid 0 setup

Rowan Hawkins
  • This was from a long time ago. I was looking for numbers that I could give to management with a cost versus life expectancy rate. I have since learned that it is probably too nuanced an issue to be broken down that simply. While I am still interested in the issue, I have moved on to a company less technologically hidebound. Baruch has the best existing answer. – Rowan Hawkins Mar 26 '20 at 17:14
  • The other issue that SSDs would have is that most volumes in the array were over 70% full, so there is no space to manage page swapping as cells fail. I realize that enterprise drives also have a large over-provisioning space to help address that. – Rowan Hawkins Mar 26 '20 at 17:17

3 Answers

2

By using SSDs over HDDs you will get some power benefit and likely a reliability benefit (enterprise-grade SSDs are far more reliable than enterprise-grade HDDs). There is no issue with NAND endurance, especially not at your level of activity; even at much higher levels, endurance is not a real issue. You can most likely also go for the relatively cheaper read-optimized drives (with 0.3 DWPD) and have no worries about disk endurance.
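To put rough numbers on that, here is a sketch of the endurance math; the drive capacity, warranty period, and drive count are assumptions for illustration, not a specific product:

```python
# TBW budget of a 0.3 DWPD read-optimized drive vs. the stated workload.
# Drive size, warranty period, and drive count are illustrative assumptions.

drive_tb = 3.84        # assumed drive capacity
dwpd_rating = 0.3      # read-optimized rating mentioned above
warranty_years = 5     # typical enterprise warranty period (assumed)
drive_count = 10       # assumed array size for ~30 TB usable

tbw_budget = dwpd_rating * drive_tb * 365 * warranty_years
workload_tb = (20 / 1000) * 365 * warranty_years / drive_count  # 20 GB/night

print(f"rated: {tbw_budget:,.0f} TB written; actual: {workload_tb:.1f} TB per drive")
# ~2,102 TB of rated endurance against ~3.7 TB of actual writes over 5 years.
```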

The only question in such a use case is if the cost of the drives warrants the power and reliability advantages.

As for reliability/availability, all enterprise-grade SSDs I've seen advertise an MTBF of 2 million hours, and those I've worked with have exceeded that mark. On the other side, enterprise-grade HDDs claim 1.2M hours MTBF and none got even halfway there, so you will see a big jump in reliability with the move. Again, whether it's really worth the cost is your calculation to make.
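Those MTBF figures are easier to compare once converted to an annualized failure rate (AFR). A minimal sketch of that standard conversion (the textbook approximation, not vendor data):

```python
# Converting quoted MTBF figures into an annualized failure rate (AFR):
# AFR = hours per year / MTBF.

HOURS_PER_YEAR = 8766  # average year, including leap years

def afr(mtbf_hours: float) -> float:
    """Expected fraction of a drive population failing per year."""
    return HOURS_PER_YEAR / mtbf_hours

print(f"SSD at 2.0M h MTBF: {afr(2_000_000):.2%} AFR")  # ~0.44%
print(f"HDD at 1.2M h MTBF: {afr(1_200_000):.2%} AFR")  # ~0.73%
# If HDDs reach only half their rated MTBF in practice, real-world AFR
# is closer to ~1.5%, consistent with the gap described above.
```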

My qualification here is that I worked on enterprise storage systems involving HDDs and SSDs, handled the hardware/software integration, and was deeply involved in the reliability of the combined systems. The data sets I relied on are private, so there is no open research I can point to.

Baruch Even
-1

Fading electrical charge also kills NAND. Probably very slowly on a good solid-state drive, but noticeably over time. That is quite different from magnetic spindles, which hold data for 10 years or more. If they spin up again, that is.

Look up reliability data as a function of bytes written, hours spinning, and other metrics: vendor specs as well as any public data sets. Replace drives whenever they show wear, especially near the end of their warranty, at maybe 3 years old.

Use different media for your backups than online data. If the primary storage is solid state, use tape or magnetic spindles for the protection storage.

Reevaluate archive media at least every 10 years. Transfer old backups you care about to whatever the current protection media is.

Being a good archivist is not specific to the media type or redundancy scheme; storage evolves over time. There is no one answer here, even for similar performance, availability, and cost requirements.

John Mahowald
  • This question is about the primary storage. It is almost never turned off, and then not for more than a couple of hours. I'm not sure how your answer addresses the question. – Rowan Hawkins Apr 11 '18 at 15:54
  • This is a totally off-base answer for anything that is in active use, which is what the OP is asking about. – TomTom Feb 18 '20 at 19:35
-2

Flash storage is still too new for there to be any good at-scale studies into long-term longevity to rely on. So far, the indications for SLC and MLC flash look good, and seem to give you as good or better longevity than spinning rust. TLC and especially QLC flash are way too new to make any qualified predictions about, but they could reasonably be expected to provide worse longevity than SLC and MLC flash. Personally, I wouldn't move from spinners to flash for longevity reasons, but possibly for other reasons such as performance. Instead, I'd look into the integrity features of the storage management system and make sure it can properly deal with partially lost or corrupted data. ZFS is possibly the leader in this respect.

  • SERIOUSLY? All-SSD storage has been done for years at the high end. Samsung enterprise-grade SSDs are in generation something, which is WAY higher than one. There is plenty of experience in the high-end market - which your answer shows you have zero experience in. – TomTom Feb 18 '20 at 19:36
  • Yes, there are enterprise-grade SSDs that last well over the years (3 years uptime here), just like some desktop ones that are pretty good. But still, we should make statistical determinations over at least 5 years to have relevant results on their quality and durability. Suffice it to say that so far they have proved resilient enough for an enterprise environment. – Overmind Feb 19 '20 at 07:42
  • @TomTom As the answer explicitly states, experience so far with enterprise-grade SLC and MLC flash indicates good longevity, easily good enough for typical enterprise use, but if you're aware of any statistically significant studies into the long-term (decades) stability of flash storage, I'd be very interested in those sources, as I haven't yet been able to find any. – Simon Kepp Nielsen Feb 19 '20 at 22:14