98

This may sound like an odd question, but it's generated some spirited discussion with some of my colleagues. Consider a moderately sized RAID array consisting of something like eight or twelve disks. When buying the initial batch of disks, or buying replacements to enlarge the array or refresh the hardware, there are two broad approaches one could take:

  1. Buy all the drives in one order from one vendor, and receive one large box containing all the disks.
  2. Order one disk apiece from a variety of vendors, and/or spread several one-disk orders out over a period of days or weeks.

There's some middle ground, obviously, but these are the main opposing mindsets. I've been genuinely curious which approach is more sensible in terms of reducing the risk of catastrophic failure of the array. (Let's define that as "25% of the disks fail within a time window equal to how long it takes to resilver the array once.") The logic being, if all the disks came from the same place, they might all have the same underlying defects waiting to strike. The same timebomb with the same initial countdown on the clock, if you will.
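To put rough numbers on that definition, here's a toy calculation I'll add purely for illustration. The per-disk failure probabilities and the "bad batch" chance below are invented guesses on my part, not data; the sketch only shows how a shared defect changes the arithmetic, not how likely such a defect actually is.

```python
import math

def p_catastrophe(n_disks, p_fail_in_window, threshold=0.25):
    """Probability that at least `threshold` of the disks fail within one
    resilver window, assuming each disk fails independently."""
    k_min = math.ceil(threshold * n_disks)
    return sum(
        math.comb(n_disks, k)
        * p_fail_in_window ** k
        * (1 - p_fail_in_window) ** (n_disks - k)
        for k in range(k_min, n_disks + 1)
    )

# Pure guesses for illustration: a healthy disk has a 0.1% chance of dying
# inside one resilver window; a defective batch pushes that to 20% per disk;
# 2% of batches are defective.
n, p_good, p_bad, p_batch_defect = 8, 0.001, 0.20, 0.02

independent = p_catastrophe(n, p_good)
single_batch = (1 - p_batch_defect) * p_catastrophe(n, p_good) \
             + p_batch_defect * p_catastrophe(n, p_bad)

print(f"disks from independent sources: {independent:.2e}")
print(f"disks from one shared batch:    {single_batch:.2e}")
```

With those made-up inputs the shared-batch term dominates by a couple of orders of magnitude, which is really just restating the "same timebomb" worry in arithmetic form.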

I've collected a couple of the more common pros and cons for each approach, but some of them feel like conjecture and gut instinct instead of hard evidence-based data.

Buy all at once, pros

  • Less time spent in research/ordering phase.
  • Minimizes shipping cost if the vendor charges for it.
  • Disks are pretty much guaranteed to have the same firmware version and the same "quirks" in their operational characteristics (temperature, vibration, etc.)
  • Price increases/stock shortages are unlikely to stall the project midway.
  • Each disk is on hand the moment it's needed for installation.
  • Serial numbers are all known upfront, disks can be installed in the enclosure in order of increasing serial number. Seems overly fussy, but some folks seem to value that. (I guess their management interface sorts the disks by serial number instead of hardware port order...?)

Buy all at once, cons

  • All disks (probably) came from the same factory, made at the same time, of the same materials. They were stored in the same environment, and subject to the same potential abuses during transit. Any defect or damage present in one is likely present in all.
  • If the drives are being replaced one at a time in an existing array and each new disk needs to be resilvered individually, it could potentially be weeks before the last disk from the order is installed and discovered to be faulty. The return/replacement window with the vendor may expire during this time.
  • Can't take advantage of near-future price decreases that may occur during the project.

Buy individually, pros

  • If one disk fails, it shares very little manufacturing/transit history with any of the other disks. If the failure was caused by something in manufacturing or transit, the root cause likely did not occur in any other disk.
  • If a disk is dead on arrival or fails during the first hours of use, that will be detected shortly after the shipment arrives and the return process may go more smoothly.

Buy individually, cons

  • Takes a significant amount of time to find enough vendors with agreeable prices. Order tracking, delivery failure, damaged item returns, and other issues can be time-consuming to resolve.
  • Potentially higher shipping costs.
  • A very real possibility exists that a new disk will be required but none will be on-hand, stalling the project.
  • Imagined benefit. Regardless of the vendor or date purchased, all the disks came from the same place and are really the same. Manufacturing defects would have been detected by quality control and substandard disks would not have been sold. Shipping damage would have to be so egregious (and plainly visible to the naked eye) that damaged drives would be obvious upon unpacking.

If we're going simply by bullet point count, "buy in bulk" wins pretty clearly. But some of the pros are weak, and some of the cons are strong. Many of the bullet points simply state the logical inverse of some of the others. Some of these things may be absurd superstition. But if superstition does a better job at maintaining array integrity, I guess I'd be willing to go along with it.

Which group is most sensible here?

UPDATE: I have data relevant to this discussion. The last array I personally built (about four years ago) had eight disks. I ordered from one single vendor, but split the purchase into two orders of four disks each, about one month apart. One disk of the array failed within the first hours of running. It was from the first batch, and the return window for that order had closed in the time it took to spin everything up.

Four years later, the seven original disks plus one replacement are still running error-free. (knock on wood.)

smitelli
  • 1,214
  • 1
  • 10
  • 16
  • 7
    +1 from me for the question, because I've wanted to know it for some time myself. I have *definitely* seen the phenomenon of big file servers' HDDs all coming to the end of the bathtub curve around the same time, but often the number of approved vendors for such servers is pretty small, so the "buy lotsa places" approach is pretty hard. I'm looking forward to seeing answers with **real data** in them. – MadHatter Aug 23 '17 at 16:08
  • @MadHatter: I am with you on this one, but I am not aware of any hard data regarding this issue, and until we get it, this is all speculation, unfortunately. Personally, all the cases I know of where a bunch of similar disks started dying together were cases where they had been used too long and started dying of old age. – Sven Aug 23 '17 at 16:14
  • 2
    Re. your update: This is a single data point. Repeat this for thousands of disks to get any useful metric. This is hard to do, especially with the shortish product cycles of disks, which results in a lack of this kind of data. – Sven Aug 23 '17 at 16:15
  • @Sven I'm surprised that there's not even an established "best practices" document somewhere that can tip the scales one way or the other. Maybe the entire premise is insignificant and entirely moot, but it doesn't _feel_ that way, you know? – smitelli Aug 23 '17 at 16:21
  • 1
    I seem to recall agreeing in meta some time back that *best practice* questions were on-topic, provided they didn't just generate a bunch of anecdata. I hope this question could have some great answers, and I think we should give it a chance. – MadHatter Aug 23 '17 at 16:24
  • @MadHatter: OK, let's see where this ends. I would love to get more than anecdotal data about this. – Sven Aug 23 '17 at 16:28
  • 3
    @Sven thanks, you're a gent; here's hoping. And to any potential answerers: **data, not anecdotes, please**. – MadHatter Aug 23 '17 at 16:44
  • There is not much variance in drive manufacturing. Don't make this difficult. – paparazzo Aug 23 '17 at 16:45
  • I would **love** if people would explain downvotes. – gxx Aug 23 '17 at 20:33
  • @gf_ Why does it matter? – ewwhite Aug 23 '17 at 22:45
  • @ewwhite It might lead to better quality: giving people a chance to improve the question, for example. – gxx Aug 23 '17 at 22:46
  • 2
    I manage a lot of machines with raids. **All disks fail eventually** so just have enough spares on hand that you can swap them at earliest notification, likely prefail rather than waiting for a full fail. – Criggie Aug 24 '17 at 11:13
  • 1
    Relevant: [Should I 'run in' one disk of a new RAID 1 pair to decrease the chance of a similar failure time?](https://serverfault.com/q/676121/58408) and [How should I burn in hard drives?](https://serverfault.com/q/309113/58408) – user Aug 24 '17 at 17:20
  • 1
    Re: "it could be potentially weeks before the last disk from the order is installed and discovered to be faulty." I think it's a good idea to try to burn in disks as soon as they're received. At least a write-read test of random data. I had some otherwise unused 32-bit servers set aside for this. – Mark Plotnick Aug 24 '17 at 20:27
  • 1
    Why consider using drives from multiple vendors but not drives of multiple brands from the same vendor? – ShadSterling Aug 25 '17 at 01:21
  • I was involved with getting PCs networked in a large organization 20+ years ago. We would order NICs in bulk, several dozen at a time (always from the same manufacturer); they were generally very reliable but we had one or two boxes where 50-60% of the cards failed within a month or two. I don't have much experience with purchasing drives but I'm interested in seeing where this goes. – David Aug 25 '17 at 17:42
  • Some data you might find useful: https://www.backblaze.com/b2/hard-drive-test-data.html – tonysdg Aug 25 '17 at 18:33
  • There are some - more or less hypothetical - pros and cons to this. But a major reason for rebuilds to fail is that there may be as-yet-unnoticed read defects on your drives. These are normal when the data was written a few months or even years back and the disks are in use. Essentially, YOU NEED TO MAKE SURE THAT YOU'RE PATROLLING / SCRUBBING ALL DISKS REGULARLY. Sorry for yelling but this is _the_ essential thing. If your controller doesn't support running a media patrol once a month, do use a software method. Weak sectors will be spotted and repairs done automatically. – Zac67 Aug 26 '17 at 14:48

10 Answers

56

In practice, people who buy from enterprise vendors (HPE, Dell, etc.) do not worry about this.

Drives sourced by these vendors are already spread across multiple manufacturers under the same part number.

An HP disk under a particular SKU may be HGST or Seagate or Western Digital.

Same HP part number; variation in manufacturer, lot number and firmware. (image of drive labels)

You shouldn't try to outsmart/outwit the probability of batch failure, though. You're welcome to try if it gives peace of mind, but it may not be worth the effort.

Good practices like clustering, replication and solid backups are the real protection for batch failures. Add hot and cold spares. Monitor your systems closely. Take advantage of smart filesystems like ZFS :)

And remember, hard drive failures aren't always mechanical...

ewwhite
  • 194,921
  • 91
  • 434
  • 799
  • 13
    The storage/shipping aspect is still in play, however. If somebody in an HP or FedEx stockroom drops a box full of disks, it may affect the entire received batch. – smitelli Aug 23 '17 at 16:36
  • 6
    @smitelli Okay. Backups, RAID, replication, DR, spares. The likelihood of all of your drives failing at once is small enough that this is not an issue that most should prepare to encounter. – ewwhite Aug 23 '17 at 17:00
  • 1
  • A side note / example: with HP, when you customize your server and receive it, 99.9% of the time the HDD serial numbers are sequential. If the wear/usage is the same, as in a RAID 1, I never trust the remaining healthy disk after another disk fails; I've had too much bad experience. – yagmoth555 Aug 24 '17 at 02:56
  • 3
    Something to be aware of, I bought 5 consumer-grade archive drives for a SW RAID box in a single order from amazon. The first one failed after 48 months. The second, 53 months. The third and fourth failed within a 2-week span at month 55, and the last one failed at 57 months. Fortunately I was using 3-way redundancy but still... not something I expected. I don't know if the serials were sequential but the drives themselves were essentially identical. – MooseBoys Aug 24 '17 at 03:16
  • 1
    @ewwhite while I'm not going to entirely rubbish the idea that enterprise vendors try to insure against this problem, taking a close look at your example image I'm pretty sure those drives are different because they're ordered quite a bit apart. If the batch numbers follow logical formatting I would guess they were manufactured 01/2007, 01/2008, 04/2008 (in order of right to left). These vendors bulk buy and rebrand components from whoever has stock at reasonable prices. – Kaithar Aug 24 '17 at 16:29
  • @Kaithar The point is that the same SKU can be sourced from multiple manufacturers. Same with HP/Dell RAM... Hynix, Samsung, etc. Same part number. – ewwhite Aug 24 '17 at 16:57
  • @MooseBoys I wouldn't say that consumer-grade drives failing after 4-5 years is bad. Rather, that's exactly the life span I'd expect to get out of decent consumer-grade drives. Plan to replace (at least by making sure you have money on hand for replacement drives) a year before the warranty is over; certainly do replace when the warranty expires even if the drives are still working. If the vendor wouldn't trust it any longer, then why should you? (That last is a rhetorical question, by the way.) – user Aug 24 '17 at 17:13
  • 3
    @ewwhite Yes, *but* if you order 10 of the same SKU in one go they're less likely to be from multiple suppliers than if you order them at 1 per month. That's the point I'm making. – Kaithar Aug 24 '17 at 17:15
  • 1
    @smitelli Even rotational HDDs are rated to almost ridiculous acceleration non-operating. Any mishandling during shipping that's bad enough to cause damage to the drive itself should certainly be bad enough to cause visible packaging damage, at which point you should reject the shipment anyway even if it's *probably* benign. – user Aug 24 '17 at 17:16
  • @Michael Kjörling: IMHO a consumer-grade drive failing after 4-5 years is bad. I have a 10-year-old SSD and it's still working perfectly. I also have a 15-year-old HDD and it's also still working perfectly. Actually, all consumer-grade drives I've ever had have lasted at least 8 years. I don't trust my drives though; everything is backed up to a Ceph cluster with enterprise-grade drives and I could be up and running again in 1 hour. – wb9688 Aug 25 '17 at 09:22
  • @MichaelKjörling iirc, drives are usually rated for impacts measured in double digit gravities. – Kaithar Aug 26 '17 at 00:18
  • 2
    This answer seems a bit opinionated and doesn't seem to give any argument of why it may be true... have you spoken with all people ordering by Dell? What is "smart" about outsmarting batch failure? Is it actually *good* that people do what you are assuming they do? – AnoE Aug 28 '17 at 05:59
  • This may be true for hard drives sold by the server manufacturer. I really cannot imagine WD or Seagate mixing some drives from a competitor into their shipments. In any case, a mixing procedure should be documented to be binding. I would not rely on my own forensics. – h22 Aug 28 '17 at 13:14
  • @AnoE Opinion formed by experience... The average admin shouldn't spend time worrying about this. If you're building a DIY 60-bay white box storage solution, then maybe... – ewwhite Aug 28 '17 at 20:19
  • @ewwhite: Problem is that awfully often, opinion formed by experience depends a lot not only on the technical stuff, but also on our own (human) predispositions (e.g., our level of OCD ;) ) and on your actual, individual experience. While this is great for many decisions, and I daresay most of them, the OP already seems to have done a lot of research - and has the "knowledge" bit under the belt. He could just go and follow his heart and be done with it; asking about it here invites non-opinionated, factual answers, I'd say. But don't take it to heart too much, that just MY opinion. :) – AnoE Aug 28 '17 at 20:35
  • I agree with almost all of this answer, although my take would be that mixing drive vendors and firmware versions within an array is more likely to cause problems than prevent them. That's one reason why, although large vendors do ship different drive manufacturers under the same SKU, it's unlikely they would be mixed in a single order, though replacements may differ if they can no longer source the original drive supplied. I would also say researching the currently most reliable drives is much more important, and the best data source on that is the Backblaze storage reports. – martin81 Aug 31 '17 at 23:01
43

In deference to the answer from ewwhite, some sysadmins do order in batches. I would never, myself, order drives on an individual basis, but standard practice at the last place I worked in such a capacity was to order drives in batches. For a twelve-drive machine, SOP dictated that the drives be split into three batches, giving the machine a three-tier redundancy profile.

However, other small outfits that I have consulted at have followed different protocols, some not concerned with the batch, and others splitting batches into two or four arrays. The short answer is do what feels appropriate for the level of service you need to achieve.

Side note: The last place I worked was certainly doing the right thing. The app storage machine lost an entire batch of drives, and we discovered that this particular batch all shared the same fault. Had we not followed a batch protocol, we would have suffered a catastrophic loss of data.
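As a purely illustrative sketch of that batch-splitting idea (the drive and batch names are hypothetical, and the grouping is just a round-robin; real placement depends on your controller or filesystem), one can spread each purchase batch across redundancy groups so that losing an entire batch never costs any one group more members than it can tolerate:

```python
from itertools import cycle

# Hypothetical 12-drive machine whose drives arrived in three purchase batches.
drives = [f"batch{b}-disk{i}" for b in (1, 2, 3) for i in range(4)]

def spread_across_groups(drives, n_groups):
    """Deal drives round-robin into RAID groups so each group mixes batches."""
    groups = [[] for _ in range(n_groups)]
    for group, drive in zip(cycle(groups), drives):
        group.append(drive)
    return groups

for i, group in enumerate(spread_across_groups(drives, n_groups=4), start=1):
    print(f"group {i}: {group}")
# Each 3-drive group ends up with one drive from each batch, so a whole bad
# batch costs every group at most one member.
```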

Wolfish
  • 539
  • 3
  • 4
39

Honest answer from someone that's spent a lot of time dealing with dying raid arrays and difficult drives: Don't have all your drives from the same batch if you can avoid it.

My experience only applies to spinning disks, SSDs have their own issues and benefits to consider when bulk ordering.

Exactly how best to handle things depends mostly on how big the array you're working with is. If you're working with something like 6-drive arrays with 2-drive redundancy, you can probably safely buy similar drives from 3 manufacturers and split the array that way.

If you're using an odd drive or you're working with arrays that can't be easily partitioned like that you can try other approaches like buying the same drive from different vendors, or if you're buying in bulk you can look through and try to separate the drives based on likelihood of being manufactured together.

If you're running a small enough array with the right underlying tech, it might even be worth your time to build it incrementally from heterogeneous disk supplies. Start with the minimum number of drives you can get away with and buy the next supply a month or two later, or when you fill the system. That also lets you get a feel for any issues there might be with the particular models you picked.

The reason behind this advice is a combination of two quirks of drives.

  1. MTBF is remarkably broken when you have a lot of drives with similar origins. In statistics we'd call it a sampling bias: because of the similarity in your samples, the averaging effects will tend to be less useful. If there's a fault with the batch or even with the design itself, and that happens more often than you'd think, then drives from that batch will fail sooner than the MTBF would suggest.

    If the drives are spread out, you might get [50%, 90%, 120%, 200%] of MTBF, but if all the drives come from that 50% batch you've got a mess on your hands.

  2. RAID array rebuilds kill disks. No, really. If you get a drive failure and the array rebuilds, it's going to put extra load on the other drives while it scans the data off them. If you have a drive close to failure, the rebuild may well take it out, or it may already have a failure location that you just weren't aware of because that section hadn't been read recently.

    If you've got a lot of drives from the same batch, the chances of this kind of cascade failure occurring are much higher than if they come from different batches. You can mitigate this by having regular patrol scans, scrubs, resilvering, whatever the recommended practice is for the type of array you're using, but the downside is that it will impact performance and can take hours to complete. (A rough back-of-the-envelope sketch of this rebuild exposure follows this list.)
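Here's the promised back-of-the-envelope sketch. It treats the surviving drives as failing independently at a constant rate derived from an annualized failure rate (AFR); the AFR figures are assumptions for illustration, not measurements, and it ignores the extra stress of the rebuild itself, which only makes the real numbers worse:

```python
import math

def p_second_failure(n_drives, afr, rebuild_hours):
    """P(at least one of the n-1 surviving drives fails before the rebuild
    finishes), assuming a constant per-hour hazard derived from the AFR."""
    hazard_per_hour = -math.log(1 - afr) / (365 * 24)
    return 1 - math.exp(-(n_drives - 1) * hazard_per_hour * rebuild_hours)

# Illustrative assumptions: 8-drive array, 24-hour rebuild,
# 2% AFR for a healthy mixed fleet vs. 30% AFR for a weak batch.
print(f"mixed sources: {p_second_failure(8, 0.02, 24):.3%}")
print(f"weak batch:    {p_second_failure(8, 0.30, 24):.3%}")
```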

For some context on how wildly the longevity of drives varies, Backblaze do a regular drive failure stat report... I'm not affiliated with the company in any way, but they should know what they're talking about on the subject of drive reliability. An example is https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/ ... your sample set will likely be smaller, so outlying data can skew your own experience, but it's still a good reference.

Kaithar
  • 1,025
  • 6
  • 10
  • 2
    this should be the accepted answer. RAID with similar disks (from the same firmware/batch, or bought together and mishandled at some point) has a much higher risk of catastrophic failure – Olivier Dulac Aug 24 '17 at 09:18
  • @OlivierDulac and if the disk has a catastrophic design failure as well, your life gets really painful. The 300GB/600GB/900GB 2.5" WD Raptor series drives have/had a failure rate that has to be experienced to be believed. – Kaithar Aug 24 '17 at 16:18
  • Referencing Backblaze... excellent. – O. Jones Aug 25 '17 at 11:55
9

I had to consider this issue for a customer a couple years ago. I have a combination of practical experience and research to back up the recommendation to multisource.

Setting aside your pros and cons for the moment, as well as ewwhite's excellent answer, prudence suggests that if you are buying the drives yourself, you multisource them. A quick look at the Wikipedia discussion of RAID weaknesses points to two interesting references.

The first reference is the ACM paper RAID: High-Performance, Reliable Secondary Storage (Chen, Lee, Gibson, Katz and Patterson. ACM Computing Surveys. 26:145-185). In section 3.4.4 the authors point out that hardware failures are not always statistically independent events, and give the reasons why. At the time I am writing this answer, the paper is available online; pp 19-22 discuss reliability (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.3889).

The second reference is Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? (Schroeder, Gibson. 5th USENIX Conference on File and Storage Technologies.) The authors present statistical data to back up the assertion that drive failures may be clustered in time at a rate higher than predicted for independent events. At the time I am writing this answer, this paper is also available online (https://www.usenix.org/legacy/events/fast07/tech/schroeder/schroeder_html/index.html).

Dell explicitly recommended against RAID 5 back in 2012 because of correlated disk failures in large disk environments; RAID 6 is predicted to become unreliable for similar reasons around 2019 (A ZDNet article titled "why-raid-6-stops-working-in-2019": http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/). While a key element of both of these is disk size and rebuild times, smaller drive sizes and multisourcing had been recommended as a mitigator for the RAID 5 issue.
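For what it's worth, here is a hedged sketch of the back-of-the-envelope URE calculation those articles are based on: the chance of hitting at least one unrecoverable read error while reading every surviving disk during a RAID 5 rebuild. The 10^-14 bits-per-URE figure is the usual consumer spec-sheet number, and real drives frequently beat it, so treat this as an upper-bound illustration rather than a prediction.

```python
import math

def p_ure_during_rebuild(n_surviving, disk_tb, ure_per_bit=1e-14):
    """Probability of at least one URE while reading every surviving disk in
    full, assuming independent bit errors at the given spec-sheet rate."""
    bits_read = n_surviving * disk_tb * 1e12 * 8
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Illustrative: rebuilding a 7+1 RAID 5 built from 2 TB vs 10 TB drives.
for tb in (2, 10):
    print(f"{tb:>2} TB drives: {p_ure_during_rebuild(7, tb):.1%}")
```

Regular scrubbing, as stressed in the comments above, catches many of these latent errors before a rebuild ever has to.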

So yes, multisource the drives if you can; if you are buying from an enterprise vendor as described in ewwhite's answer this may be happening for you transparently. However ... my customer bought 16 2TB drives from an enterprise vendor. They just happened to be from the same manufacturer and appeared to be manufactured at the same time. Two of the drives failed within two weeks of configuring the RAID01 arrays. So check the drives when you get them. (You already check them anyway, right?)

Hannah Vernon
  • 185
  • 4
  • 17
Eliodorus
  • 91
  • 1
  • I really don't understand their argument for RAID6 going away due to storage capacity increases. Any RAID array is dependent on good maintenance to function properly. We have very large arrays running RAID6 and have never encountered a URE during a rebuild that caused data loss. Just do scheduled volume checks, like every MFG recommends, and you will be fine. – Brian D. May 23 '18 at 19:46
4

Another potential disadvantage to ordering drives individually is packaging and handling.

Hard drives are almost never supplied in retail packaging. If you buy them one at a time they will almost certainly be repacked by the seller. I have found this repackaging to be highly variable. Sometimes you get a nice box with plenty of padding, but other times you get hardly any padding at all.

A smaller box is also more vulnerable to being tossed around by carriers without obvious outward damage.

Peter Green
  • 4,056
  • 10
  • 29
3

I always buy used/bulk. Orders I track are almost always for the same device model, and being used at least mitigates the concern about a "bad batch". There's so much fire-sale hardware floating around the web that I have a hard time justifying buying new drives (or anything else for that matter) unless it's for mission-critical hardware (and all our backup hardware is still refurb!)

+PRO: Competitive online pricing. The constant flood of hardware from shifting business environments means it takes almost no effort to get 50-80% off retail for working environment pulls.

+PRO: Price. Low prices free up budget to over-purchase and maintain a solid back stock of replacement hardware.

+PRO: Seller relations. I have a handful of online sellers from whom I get slight discounts on top of the already sizeable discount for refurb/used hardware. You're not usually going to get that with Monoprice unless you are buying in huge quantity or have an SLA with them. Also, especially with hard drives, just make sure you test them right out of the box. I've never had a problem with a seller not refunding or replacing DOA hardware (unless it was a scam I failed to catch).

-CON: Warranty, legitimacy issues. Warranty is based on the manufacture date of the device, and you're also going to need to keep a lookout for online hucksters trying to sell you re-brands, clones, etc.

-CON: Testing. You need to factor in the overhead of testing. Then again, you should be testing fresh hardware as well, so I'm not sure this applies.

-CON: Lifespan. Difficult to judge; used drives are slightly more susceptible to failure.

Note: if it's a client build and they don't explicitly request refurb/used, always buy shiny/new!

merz1v
  • 71
  • 8
  • Totally. I buy a lot of off-lease and remanufactured HP disks because: cheap. Also, HP server warranty tends to cover whatever is _inside_ the chassis, so as long as it's a valid part, it's good. – ewwhite Sep 03 '17 at 13:06
2

It is possible to get more reliability by using hard drives that come from different batches and, ideally, different manufacturers. Otherwise they may fail too close together in time. The excellent answer from @Eliodorus explains this well enough.

Of course, it does not matter who shuffles the drives. If your provider confirms it already does this for you, there is no need to worry about it. However, it does not seem reasonable to do forensics on a (possibly even different) provider and conclude that somebody does it for you when you have not been told so directly. Providers are usually not shy about advertising the various measures they take to increase the reliability of their drives.

h22
  • 234
  • 2
  • 9
2

If you are trying to mitigate the "bad batch" scenario, which means every drive in a particular purchase batch can/will fail near the same time, it is also important to consider the size of the array, and the RAID level being used.

If you consider doing multiple orders, no set standard is applicable across the board. People recommending 2-4 purchasing tiers should ask themselves: if one entire tier of drives fails, will the array still be online? For redundant RAID levels like 1/5/10/50 you would have to buy drives 1 at a time; for RAID6 you could purchase 2 at a time.
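As a tiny illustration of that arithmetic (the failure-tolerance table below uses the textbook worst case, where an entire batch could land in a single redundancy group; check your actual layout before relying on it):

```python
import math

# Drives a single redundancy group can lose and stay online, assuming the
# worst case where all drives from one batch sit in the same group.
TOLERABLE_FAILURES = {"RAID1": 1, "RAID5": 1, "RAID10": 1, "RAID50": 1,
                      "RAID6": 2, "RAID60": 2}

def purchase_plan(n_drives, raid_level):
    """Return (max drives per batch, number of batches) so that losing an
    entire purchase batch at once cannot exceed the group's redundancy."""
    per_batch = TOLERABLE_FAILURES[raid_level]
    return per_batch, math.ceil(n_drives / per_batch)

print(purchase_plan(12, "RAID5"))   # -> (1, 12): one drive per order
print(purchase_plan(12, "RAID6"))   # -> (2, 6): two drives per order
```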

I would recommend, regardless of how you purchase the drives, that you back up regularly and purchase adequate hot/cold spares for your array size and RAID type.

Brian D.
  • 469
  • 3
  • 11
1

Actually, it depends on the Redundant Array of Inexpensive Disks (RAID) level. In RAID 2, 3, 4, 5 and 6, it does help to have drives from several different batches, but it is not decisive: one already inherently forfeits reliability and performance by using these levels.

Now, for the usually sane choice, that of using RAID 1 (mirroring) or 1+0 (striping over mirrors), it is indeed useful to have different drives on different sides of each mirror (each RAID 1 array), so as not to have the mirror fail during a recovery. Also, there should be hot spares to minimize the recovery window.

For more information, check out the tongue-in-cheek but informative Battle Against Any Raid 'F' (BAARF) website, by the prestigious OakTable Network of senior DBAs. Wikipedia also sums up the issue nicely.

Leandro
  • 176
  • 1
  • 13
  • This seems to be just opinion. If you have sources, quote and link to them. – MadHatter Aug 23 '17 at 19:20
  • Well, actually I mentioned a source. And I would posit it is much more a matter of logic (the nature of mirroring versus striping and checksumming) than of opinion. – Leandro Aug 23 '17 at 19:25
  • 7
    A source which you neither linked to nor quoted; expecting others to google for your source website in order to search the whole thing for supporting data doesn't really make for a convincing answer. As for *it's a matter of logic*, in the *precis* I think we were pretty clear that handwavy *it just makes sense* answers to this particular question were not going to be well-regarded. – MadHatter Aug 23 '17 at 19:29
  • 2
    http://www.baarf.dk/BAARF/RAID5_versus_RAID10.txt – bishop Aug 23 '17 at 19:44
  • Maybe. But I really do think the issue is far more complicated than I could usually expound here, and that some reading is in order. – Leandro Aug 23 '17 at 23:32
  • 4
    @lfd the linked-to website, while using "logic" to explain its position, doesn't provide data (that I could see from a quick glance). The problem with "logic" is that it's just another name for theory in this context. And the problem with untested theories is hopefully clear. Note that untested theories backed up by experts still have the same problem as untested theories in general. – user2460798 Aug 24 '17 at 17:15
-2

As far as I know, the quality checking of disk storage at the factory is pretty good, and I personally would not be afraid of a bulk hardware failure due to manufacturing reasons.

And if I were slightly paranoid, I would just buy storage from two different manufacturers that I know don't share factories, through the same vendor.

Storage is so cheap that it does not make sense for a company NOT to buy in bulk, and within the company you will also write off the storage after a couple of years, so the investment is not that great. The time it takes to purchase from individual vendors probably costs more in time spent.

If you are still afraid of disks failing in bulk, buy more than you need: if you know you need 12 disks, then buy 5 to 7 spares. That would only be $48 times 5 to 7, per terabyte, and we can still go cheaper without making our system unstable or unsafe thanks to bulk discounts or second-hand disks (which is safe). Then there is the matter of resilvering / re-initializing the array; I of course have no way of knowing how large your storage solution is, but if you spend weeks on that task then I would probably consider reconfiguring the organizational storage, since that sounds (to me) more like a misconfiguration than anything else, in one way or another.

If we then become REALLY paranoid, get 2x of whatever storage solution you are running. Based on how sensitive your organisation is to a storage breakdown, this could turn out to be the cheaper option, and it is not only an option for Fortune 500 companies.

And we can also talk about offloading data we don't need here and now, such as (random example) years of historical financial data, to "cloud" vendors, after encrypting it first. This will remove storage needs from our own storage and free us up either financially or functionally.

Based on who you are, where you are and what you do, there would be different solutions that work best for you.

  • 1
    If you -1 something then you should have the decency to state why. Maybe you are allergic to the truth. – Cristian Matthias Ambæk May 23 '18 at 05:56
  • The issue is not having spares at all, but the probability of a second failure rendering the array unrecoverable before it finishes recovering from a first failure. – Leandro Mar 22 '22 at 00:56