75

Issue

I have read many discussions about storage and whether SSDs or classic HDDs are better, and I am quite confused. HDDs are still quite preferred, but why?

Which is better for active storage, for example for databases, where the disk is busy all the time?

About SSDs.

Pros.

  • They are quiet.
  • Not mechanical.
  • Fastest.

Cons.

  • More expensive.

Questions.

  • When the write endurance of one SSD cell is exhausted, what happens? Is the disk's capacity reduced by just that cell while it otherwise keeps working normally?
  • Which filesystem is best for writes? Is ext4 a good choice because it writes to cells consecutively?

About HDDs.

Pros.

  • Cheaper.

Cons.

  • In case of a mechanical fault, I believe there is usually no way to repair it. (Please confirm.)
  • Slowest, although I think HDD speed is usually sufficient for servers.

Is it just about price? Why are HDDs preferred? And are SSDs really useful for servers?

oldtechaa
genderbee
  • Spinning rust still has the best price per GB, especially when you need large amounts of storage. By any other metric (power consumption, performance, noise, weight, etc.) spinning disks are beaten by (correctly specced) SSDs and [NVMe](https://en.wikipedia.org/wiki/NVM_Express) storage – HBruijn Oct 04 '19 at 12:05
  • You ask about database servers, but I'd like to be clear. Is this server going to be a database server, or what will be its workload? What are your performance requirements? HDDs are cheap, SSDs are superior for most other considerations. – Rob Pearson Oct 04 '19 at 13:50
  • I would consider an SSD being "quiet" as a pro. The power supply and cooling fans of a server will generate far more noise than the drives. – Bert Oct 04 '19 at 14:56
  • From my POV, except for bulk storage, SSD has replaced HDD in any environment where performance or reliability is a factor. (I.e. I disagree that HDDs are preferred in servers; I replaced HDDs with SSDs many years ago in most of the ones I control and never looked back.) Disk speed is very important in most server applications. – davidgo Oct 04 '19 at 18:42
  • Your understanding of SSDs seems incomplete. When a cell dies, it's simply marked dead and its contents are remapped. Any filesystem will work fine. SSDs have more cells than advertised (over-provisioned) and an abstraction layer, so the OS is unaware of this process of moving cell contents. If in a server, use RAID>0, because when SSDs fail they are more likely to do so suddenly and catastrophically. (Although they are about 10 times as robust as HDDs.) – davidgo Oct 04 '19 at 18:47
  • In practice, the way you will use HDDs (Hardware RAID controller, with big battery or flash backed cache) in most any but the smallest servers actually presents itself as more akin to a hybrid drive than a classic HDD anyway.... – rackandboneman Oct 04 '19 at 20:06
  • Please clarify if you intend on using domestic-grade SSD and HDD, or if you're talking about server-grade SAS SSD and HDD, or PCIe-connected NVMe drives. – Criggie Oct 05 '19 at 08:08
  • HDD seek times are on the order of 10ms—that timing is not always negligible in large datacenters. – D. Ben Knoble Oct 05 '19 at 23:53
  • My only experience with an HDD in a server is a few sectors getting corrupted while I was on vacation, the OS write-locking it, and having to shut it down remotely and spend a week getting a new SSD put in and the data transferred, and that SSD hasn't failed me yet. – Radvylf Programs Oct 06 '19 at 14:39
  • @RedwolfPrograms SSD's generally aren't as time-limited as HDD's are, they're performance-limited (meaning they can only do "so many" operations before they begin to die). If your server doesn't do a lot of writing, it may last the rest of the server's life. :) – Der Kommissar Oct 07 '19 at 13:30
  • The question is too broad. – Overmind Oct 14 '19 at 12:55
  • an empirical datapoint: I work for a large data center company. We only use SSDs. – Harvey Oct 16 '19 at 14:23
  • @Harvey and others do you have any datapoints on the GBs at which hdd has consumed more power rotating than SSDs? (Depends on access pattern but asking a general answer) – 0fnt Oct 17 '19 at 06:35
  • @0fnt Only thing I can say on that is we replaced a full rack of HDD's at one customer for SSD's (total space dropped 70% but we planned for that), power consumption on that rack didn't change one bit. Not sure where the "break-even" is. – Der Kommissar Oct 17 '19 at 15:40
  • The type of SSD matters here too. Enterprise drives last longer than consumer drives. They handle a lot more write capacity. A QLC drive is completely useless for server use both because of speed at high writes and the supported write capacity. It's a joke. If you're using windows and the cloud, a QLC might last years on a desktop. It's going to fail badly in a server. I get 2-4 years out of consumer Intel SSDs as boot drives, ZFS read cache, or write heavy loads. That's a home server mind you and MLC/TLC era drives. – Lucas Holt Oct 17 '19 at 16:52

13 Answers

98

One aspect of my job is designing and building large-scale storage systems (often known as "SANs", or "Storage Area Networks"). Typically, we use a tiered approach with SSDs and HDDs combined.

That said, each one has specific benefits.

  1. SSDs almost always have a higher cost per byte. I can get 10k SAS 4Kn HDDs at a cost of $0.068/GB USD, which means roughly $280 buys a 4 TB drive. SSDs, on the other hand, typically run in the tens of cents per gigabyte, and can reach dollars per gigabyte. (A rough cost sketch follows this list.)

  2. When dealing with RAID, per-drive speed becomes less important, and size and reliability matter much more. I can build a 12 TB N+2 RAID system with HDDs far more cheaply than with SSDs. This is mostly due to point 1.

  3. When managed properly, HDDs are extremely cheap to replace and maintain. Because the cost per byte is lower, replacing a failed HDD is cheaper. And because HDD failures typically correlate with age rather than data written, a replacement doesn't start burning through its write endurance (TBW) just because it has to rebuild the RAID array. (Granted, the TBW used by a rebuild is tiny overall, but the point stands.)

  4. The SSD market is relatively complex. There are four (current, at the time of this writing) major types of SSD NAND, rated from highest to lowest number of total writes supported: SLC, MLC, TLC, QLC. SLC typically supports the largest number of total writes (the major limiting factor of SSD lifetime), whereas QLC typically supports the lowest.
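As a rough illustration of point 1, here is a minimal cost sketch using the ballpark prices above; the figures are illustrative assumptions, not current market data:

```python
# Rough cost-per-gigabyte comparison using the ballpark figures quoted above.
# Prices are illustrative assumptions, not current market data.

hdd_cost_per_gb = 0.068   # 10k SAS 4Kn HDD, USD/GB
ssd_cost_per_gb = 0.20    # mid-range enterprise SSD, USD/GB (can be far higher)

def drive_cost(capacity_tb, cost_per_gb):
    """Cost of a single drive of the given capacity (decimal TB)."""
    return capacity_tb * 1000 * cost_per_gb

print(f"4 TB HDD: ${drive_cost(4, hdd_cost_per_gb):.0f}")   # ~$272
print(f"4 TB SSD: ${drive_cost(4, ssd_cost_per_gb):.0f}")   # ~$800

# A 12 TB usable N+2 array built from 4 TB drives needs 5 drives
# (3 data + 2 parity).
drives = 5
print(f"12 TB N+2 on HDD: ${drives * drive_cost(4, hdd_cost_per_gb):.0f}")
print(f"12 TB N+2 on SSD: ${drives * drive_cost(4, ssd_cost_per_gb):.0f}")
```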

That said, the most successful storage systems I've seen use both drive types in tiers. Personally, the storage systems I recommend to clients generally follow these tiers:

  1. Tier 1 is typically a (or several) RAID 10 SSD-only tier. Data is always written to Tier 1.
  2. Tier 2 is typically a (or several) RAID 50 or 5 SSD-only tier. Data is aged out of Tier 1 to Tier 2.
  3. Tier 3 is typically a (or several) RAID 10 HDD-only tier. Data is aged out of Tier 2 to Tier 3.
  4. Tier 4 is typically several groups of RAID 6 HDD-only tiers. Data is aged out of Tier 3 to Tier 4. We keep the RAID 6 groups as small as possible so that the system tolerates as many drive failures as possible.

Read/write performance drops as you go down the tiers; data propagates down to the tier where most of the data shares the same access/modification frequency. (That is, the more frequently data is read or written, the higher the tier it resides on.)
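As a rough illustration of how such age-out policies behave (not how any particular SAN implements them), here is a toy sketch in which blocks are demoted when they sit idle longer than their tier allows; the tier names and thresholds are assumptions:

```python
import time

# Toy model of tiered-storage demotion: blocks that have not been accessed
# within a tier's age limit are moved down one tier. Tier names and age
# thresholds are illustrative assumptions, not vendor settings.
TIERS = [
    ("tier1-ssd-raid10", 24 * 3600),        # demote after 1 day idle
    ("tier2-ssd-raid50", 7 * 24 * 3600),    # demote after 1 week idle
    ("tier3-hdd-raid10", 30 * 24 * 3600),   # demote after 1 month idle
    ("tier4-hdd-raid6", None),              # bottom tier, never demoted
]

class Block:
    def __init__(self, name):
        self.name = name
        self.tier = 0                  # new data always lands in Tier 1
        self.last_access = time.time()

    def touch(self):
        """A read or write refreshes the access time (and could re-promote)."""
        self.last_access = time.time()

def age_out(blocks, now=None):
    """Demote blocks whose idle time exceeds their current tier's limit."""
    now = now if now is not None else time.time()
    for b in blocks:
        tier_name, max_idle = TIERS[b.tier]
        if max_idle is not None and now - b.last_access > max_idle:
            b.tier += 1
            print(f"{b.name}: {tier_name} -> {TIERS[b.tier][0]}")
```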

Sprinkle some well-designed Fibre Channel in there, and you can actually build a SAN with higher throughput than on-board drives would give you.

Now, to some specific items you mention:

Your SSD Questions

When the write endurance of one SSD cell is exhausted, what happens? Is the disk's capacity reduced by just that cell while it otherwise keeps working normally?

  • Both drive types are typically designed with a number of "spare" cells. That is, they have "extra" space on them that you cannot access directly and that the drive fails over to if a cell dies. (IIRC it's on the order of 7-10%.) This means that if a single "cell" (sector on an HDD) dies, a spare is used instead. You can check the status of this via the S.M.A.R.T. diagnostics on both drive types.
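If you want to watch that remapping from the OS side, a minimal sketch using smartmontools looks roughly like this; the device path and the attribute names (which vary by vendor) are only examples:

```python
import subprocess

# Minimal sketch: read SMART attributes with smartmontools and report the
# counters that track remapped sectors / flash wear. Attribute names differ
# between vendors, so the list below is illustrative, not exhaustive.
WATCHED = ("Reallocated_Sector_Ct", "Wear_Leveling_Count",
           "Media_Wearout_Indicator", "Percent_Lifetime_Remain")

def smart_report(device="/dev/sda"):          # hypothetical device path
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    for line in out.splitlines():
        if any(attr in line for attr in WATCHED):
            print(line.strip())

smart_report("/dev/sda")  # requires root and the smartmontools package
```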

Which filesystem is best for writes? Is ext4 a good choice because it writes to cells consecutively?

  • For SSDs this is entirely irrelevant. Cell positioning does not matter, as access time is essentially constant regardless of location.

Your HDD Questions

In case of a mechanical fault, there is usually no way to repair it. Is that right?

  • Partially incorrect. HDDs are actually easier to recover data from in most failure situations. (Note: I said easier, not easy.) Specialized equipment is required, but success rates here seem pretty high. The platters can often be read outside the HDD itself with special equipment, which allows data recovery even if the drive is dead.

Slowest, but I think speed is not so important, because HDD speed is absolutely sufficient for server use?

  • Typically, when using RAID, single-drive speed becomes less of a factor because you can use striping RAID levels that increase overall speed. (RAID 0, 5, and 6 are frequently used, often in tandem.) For a database with high IOPS, HDDs are typically not sufficient unless the system is designed very deliberately. You would want SLC, write-intensive-grade SSDs for database-grade IO.
jcaron
Der Kommissar
  • HDDs have much higher power consumption. – Michał Leon Oct 05 '19 at 13:12
  • When you say that the filesystem is entirely irrelevant for SSDs, you are assuming SSDs which do their own wear levelling (which should be most of them nowadays), right? Otherwise, filesystems which have many writes to the same blocks would quickly churn through the spare sectors of the SSD. – Jonas Schäfer Oct 05 '19 at 16:08
  • @JonasSchäfer nowadays pretty much anything has its own wear-levelling as long as it has a controller. Small embedded devices tend to use SLC NAND attached directly to the SoC, which has a built-in controller. Those often use UBI, which is basically a flash-focused LVM with built-in wear levelling (and used to do wear levelling cross-filesystem). – jaskij Oct 05 '19 at 16:43
  • @JonasSchäfer: AFAIK you can't buy a SATA or SAS SSD *without* wear leveling. (Intel Optane SSDs internally use 3D XPoint instead of NAND flash, which has much higher write endurance so might not need it.) If you're going to have a controller anyway (SAS or SATA, or even NVMe) that handles erase before write transparently, you're also going to build in wear leveling. – Peter Cordes Oct 06 '19 at 00:01
  • @DerKomissar: You didn't mention Intel's Optane DC non-flash SSDs: they're even faster than SLC, with much higher write endurance. Also, for database workloads there's Optane DC Persistent Memory: a DIMM made of 3D XPoint memory that is physically accessible on the DDR4 bus, and exposes the storage as a range of physical addresses that can be memory-mapped into a user-space process for I/O without the overhead of going through the kernel. (Commit to storage with instructions like `clflush` to flush a cache line.) This stuff is all pretty new, but CPUs that support it are available now. – Peter Cordes Oct 06 '19 at 00:09
  • Regarding SSD types - SLC, MLC, TLC, QLC are now joined by PLC ([ref](https://arstechnica.com/gadgets/2019/09/new-intel-toshiba-ssd-technologies-squeeze-more-bits-into-each-cell/)) – Jonathan Oct 06 '19 at 12:02
  • @PeterCordes Optane technology is not SAN-grade, I would not (yet) trust it to reliably hold data for any of my clients (on top of the other technical limitations of it). – Der Kommissar Oct 07 '19 at 13:28
  • @Jonathan Saw the white-paper, it looks like a solid technology, though it's not mainstream or proven enough yet for SAN-grade storage (in my professional opinion). – Der Kommissar Oct 07 '19 at 13:29
  • With regard to your tiered storage idea - is such a thing possible on the block level? Say I want to set up something like that in my home server on - for instance - FreeNAS, could it be implemented in a way that's transparent to applications other than the filesystem driver? – Adam Barnes Oct 07 '19 at 19:31
  • @AdamBarnes I don't know about FreeNAS, but yes. I do them at the block level with specialized hardware, so that none of my OS's even need to care. Mine are all SAN systems so I'm typically exposing LUN's over iSCSI / Fibre Channel. – Der Kommissar Oct 07 '19 at 19:55
  • @MichałLeon That's no longer correct. With the wear-levelling and other internal features of the SSD, the watts-per-gigabyte rating is actually often _higher_ during use, and _slightly lower_ during idle times than HDDs'. See also: https://www.tomshardware.com/reviews/ssd-hdd-battery,1955.html – Der Kommissar Oct 07 '19 at 21:23
  • *Retrieving data* from a failed hard drive can usually be done. *Repairing* the failed hard drive so the hardware is actually usable again? Not so much. – Ross Presser Oct 16 '19 at 18:39
  • A SAN would only have higher throughput compared to locally attached because you are spending orders of magnitude more. It's not the SAN architecture that's faster, but the Tier-1 configuration, and expensive low-latency controllers. If there is one machine connected to such a SAN, it would be fast, but if you connect 100k machines (for an extreme example) it would choke. Why can't each machine have local Tier-1 with less total storage, then fibre channel for the rest? – Kind Contributor Oct 17 '19 at 01:39
  • @Todd That's entirely incorrect. The purpose of a SAN is a single, central point of authority. They're used when local storage is _inappropriate_. Trying to do Tier-1 on local means you have to _copy_ that Tier 1 to all other locals. (Think Virtualization infrastructure: the SAN storage is where all my VM data is, if I store locally, I have to copy _all_ that storage when I migrate VM's.) And the 100Gbps fiber links I use for SAN design allow 10x the throughput of local SAS. We get _sustained_ writes of 10GB/s at one of my customers with 70 servers connected to 2 SAN's. – Der Kommissar Oct 17 '19 at 13:05
  • .... but ease of data recovery from a failed drive should not be an issue, because we always make regular backups of any data we wouldn't want to lose, right? :) – Jeremy Friesner Oct 17 '19 at 17:23
  • @JeremyFriesner Usually, the problem comes in if it's a backup storage drive, or if something wasn't important enough to be backed up, or someone forgot. Things happen, we're only human. I watched a fresh hard-drive get dropped in a copy-device, the old drive died during the copy, and it was only half done. We had one option: data-recovery. – Der Kommissar Oct 17 '19 at 18:54
  • Good point. If your objective is to have clustered services (including VMs), you use a SAN, to do so. As a software architect, I scale with micro-servers and application layer server selection. No need for clustering, or load balancers - no centralised architecture, no PaaS lock-in. We are in two different worlds, thanks for the reminder of the realities of enterprise VM infrastructure hosting. – Kind Contributor Oct 18 '19 at 05:33
18

HDD is still quite preferred

Is it? I'm not sure it is to be honest.

HDDs come in large sizes for a decent price right now, that's undeniable, and I think people trust them for longer data retention than SSDs too. Also, when SSDs die they tend to die completely, all in one go, whereas HDDs tend to die in a more predictable way that maybe allows more time to get data off first if needed.

But otherwise SSD is the way forward for most uses. If you want a boot pair, a couple of 500 GB SATA SSDs in RAID 1 won't cost the earth; for DB use you can't really beat SSDs (so long as your logs are on high-endurance models anyway). For backups, yes, you might use big 7.2k HDDs, and the same goes for very large datasets (in fact I bought over 4,000 10 TB HDDs early last year for just this requirement), but otherwise SSD is the way forward.

Chopper3
  • So SSDs are just trendy now? Isn't it just a fancy buzzword these days? Because some VPS providers offer only SSD, so the price is higher. And do I understand correctly that 1 cell dead = whole disk dead? – genderbee Oct 04 '19 at 12:30
  • "1 cell dead = all disk dead" - no, far from it, but when they die properly they tend to go down in one go. – Chopper3 Oct 04 '19 at 15:27
  • SSDs are around 100x faster than HDDs, or more. "Trendy" is a funny thing. You mention databases: that is the difference between "overloaded" and "no measurable load". Also, you're ignoring HDDs with SSD write-back buffers ;) – TomTom Oct 04 '19 at 18:24
  • I suspect the VPS providers were finding with HDDs that they ran out of IOPS before they ran out of space. – Peter Green Oct 04 '19 at 21:52
7

Solid state for everything hot: interactive use, databases, anything online. Spindles as cheap warm storage, only for not-quite-cold archives or infrequently accessed data. In particular, HDDs in a staging area before backups are archived to tape.

Using different media types for hot versus cold storage also adds some diversity. A data-loss flaw in one brand of SSD controller would be much worse if it took out both online and backup data. Unlikely, but spindles and tape are cheap anyway, so why take the risk?

The failure mode of any particular device is not important, as long as the arrays stay redundant and backed up. Usually the procedure is to replace a drive with any symptoms of failure. Experiment with repairing them in your test systems, where any catastrophic failure does not impact production services.

Filesystem choice is a matter of personal preference. While there are SSD-optimized filesystems, something you know and can repair may be more important.

John Mahowald
6

The big advantages of an SSD are speed and reliability; however, one of the dirty little secrets is an SSD's limited number of write cycles. If you are building a server with a lot of drive write activity, like a database or email server, you will need a more expensive SSD with higher endurance.

NAND flash commonly comes in three types:

  • TLC
  • MLC
  • SLC

TLC is mainly designed for web or archive servers that see few write cycles. MLC is for servers with a mix of reads and writes, like a low-volume database server. SLC is designed for servers with heavy read/write activity, like a high-volume database server.

The main deciding factors between SSD and HDD are application and budget. In a perfect world, SLC SSDs would make the standard HDD obsolete, but we are just not there yet.

Joe
  • There's also a NAND tech called QLC (Quad vs Triple in TLC). At that point though, you're sacrificing endurance for more/cheaper storage. – Havegooda Oct 04 '19 at 14:22
  • @Havegooda: There's also non-flash solid-state storage, notably Intel's Optane DC SSDs that use 3D XPoint (phase-change memory). *Excellent* write endurance, and faster than even SLC flash. – Peter Cordes Oct 06 '19 at 15:07
4

HDD is still quite preferred, but why?

That depends on who you talk to, their background (management, IT, sales, etc.), and what type of server the discussion is about. HDDs are generally an order of magnitude less expensive per byte, but they use more power and are almost always slower, depending on workload.

It almost always comes down to cost and how much storage can fit into a given number of servers. If you can get the performance of a 5-disk RAID array from a single SSD, the SSD is probably a lot less expensive and uses a fraction of the power, but you will also get maybe 1/10 the storage.

Which is better for active storage?

This is where it gets complicated, and why many people will skip the complication and just go with the HDDs they know.

SSDs come in different grades, with limits on how much data can be written to the cells, which is NOT the same as the amount of data written by the host. Writing small amounts of data can end up writing much larger amounts to the cells; this is called write amplification, and it can quickly kill drives with low endurance ratings.

SSD cells are named for the number of bits they can store; to store n bits, a cell needs 2^n distinguishable voltage levels. A TLC (triple-bit) cell needs 8 voltage levels to address those bits. Generally, each additional bit per cell costs a 3-10x drop in cell durability. For example, an SLC drive may write each cell 100,000 times before the cells die, enterprise eMLC 30,000 times, MLC 10,000, TLC 5,000, QLC 1,000.
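Putting rough numbers on that: the voltage-level count follows directly from the bit count, and a back-of-the-envelope lifetime estimate just multiplies per-cell endurance by capacity and divides by how much you actually write, including write amplification. The endurance figures below are the ballpark ones above, and the write-amplification factor is an assumption:

```python
# Back-of-the-envelope SSD endurance arithmetic. Per-cell program/erase
# figures are the rough ones quoted above; real drives vary widely.
BITS      = {"SLC": 1, "eMLC": 2, "MLC": 2, "TLC": 3, "QLC": 4}
ENDURANCE = {"SLC": 100_000, "eMLC": 30_000, "MLC": 10_000,
             "TLC": 5_000, "QLC": 1_000}

for cell, pe_cycles in ENDURANCE.items():
    print(f"{cell}: {2 ** BITS[cell]} voltage levels, ~{pe_cycles} P/E cycles")

def years_of_life(capacity_tb, pe_cycles, host_writes_tb_per_day,
                  write_amplification=3.0):
    """Very rough drive lifetime: total writable data / actual daily NAND writes."""
    total_writes_tb = capacity_tb * pe_cycles
    daily_nand_writes_tb = host_writes_tb_per_day * write_amplification
    return total_writes_tb / daily_nand_writes_tb / 365

# Example: 1 TB TLC drive, 0.5 TB of host writes per day, assumed WAF of 3.
print(f"{years_of_life(1, ENDURANCE['TLC'], 0.5):.1f} years")  # ~9 years
```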

There are also generational improvements in SSD cell technology: better lithography and 3D NAND improve density and performance over older 2D NAND. As analyst Jim Handy put it, "Today's MLC is better than yesterday's SLC".

SSDs do not actually write directly to individually addressed cells; they write to blocks of cells. This way a block accumulates a more consistent number of writes, and when its cells drop out of tolerance the entire block is marked bad and the data is moved to a new block. SSD endurance depends on the cell type, how many spare blocks are available, how much overhead is reserved for error correction, and how the drive uses caching and algorithms to reduce write amplification. The tolerance the manufacturer selects for marking blocks bad also comes into play: an enterprise drive will mark blocks bad earlier than a consumer drive, even though either one is still fully functional at that point.

Enterprise-grade "high-write" SSDs are based on SLC or eMLC cells, have large numbers of spare blocks, and usually have a large cache with capacitors to make sure the cache can be flushed to flash when power is lost.

There are also drives with much lower endurance for "high-read" applications, such as file servers that need fast access times. They cost less per byte at the price of reduced endurance (different cell types, less spare area, and so on); they may have only 5% of the endurance of a "high-write" drive, but they also do not need it when used correctly.

For example for databases, where the disk is busy all the time?

My database is small, intermittent reads are 95% of its accesses, and most of it is cached in RAM, so it is almost as fast on an HDD as on an SSD. If it were larger, there would not be enough RAM in the system, and the SSD would start to make a huge difference in access times.

SSDs also make backups and recovery orders of magnitude faster. My DB restored from backup in about 10 minutes to a slow SSD, or about 11 seconds to a really fast one; to an HDD it would have been about 25 minutes. That is at least 2 orders of magnitude, and it can make a huge difference depending on workload. It can literally pay for itself on day 1.
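For what it's worth, the speed-ups implied by those (illustrative) timings:

```python
# Speed-ups implied by the restore times quoted above (illustrative figures).
hdd_restore_s = 25 * 60   # ~25 minutes to an HDD
slow_ssd_s    = 10 * 60   # ~10 minutes to a slow SSD
fast_ssd_s    = 11        # ~11 seconds to a really fast SSD

print(f"fast SSD vs HDD:      {hdd_restore_s / fast_ssd_s:.0f}x faster")  # ~136x
print(f"fast SSD vs slow SSD: {slow_ssd_s / fast_ssd_s:.0f}x faster")     # ~55x
```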

Databases with huge amounts of small writes can murder a consumer grade TLC drive in a matter of hours.

And are SSDs really useful for servers?

Absolutely, if the correct drive type and grade are selected for the application; if you do it wrong, it can be a disaster.

My server runs several databases, plus high-read network storage, plus high-write security footage storage, plus mixed read/write file storage and client backup. The server has a RAID-6 array of HDDs for the bulk network storage and NVR, a single high-performance MLC SSD for MySQL, and 3 consumer TLC drives in RAID-5 for client and database backups and fast-access network storage.

Write speed on the SSD RAID is about the same as on the HDD RAID, but random-access read speed is more than 10x faster on the SSD RAID. Once again, these are consumer TLC SSDs, but since their sequential write speed is about 3x faster than the gigabit LAN, the array is never overloaded, and there is plenty of headroom if the system does local backups while it is being accessed remotely.

Most SSDs also offer instant secure erase (ISE), which can wipe the data in a few seconds, versus many hours or days for HDDs without that feature; only a few enterprise-grade HDDs offer ISE, though it is becoming more common. This is very useful if you are retiring or repurposing a drive.

Which filesystem is best for writes?

That depends on the type of data and the filesystem features you want. I only use ext4 and Btrfs (I need snapshots and checksums). Filesystem overhead decreases usable space and can slightly reduce the life of SSDs; Btrfs has high overhead for checksums and other features, and snapshots will use a lot of space.

In case of a mechanical fault, there is usually no way to repair it. Is that right?

Regardless of drive type, have you ever had to have data recovery done on a dead drive? It can be very expensive. You are better off having tiered backups: RAID on the main storage, versioned backups locally on a different device or machine, then a sync to an offsite location or the cloud. 1 TB of cloud storage is $5 per month; data recovery on an HDD can cost you two grand, and a dead SSD may be impossible to recover... just do the backups and forget about repair.

Richie Frame
2

BOTH.

I have yet to see an SSD die because of write load (they are supposed to become read-only in that case). Not that they don't die for other reasons, including, but not limited to, overheating and firmware bugs.

And I have seen a dead HDD. A lot more of them, actually.

So much for reliability.

In some cases it makes sense to build a mixed RAID 1 (HDD + SSD). This way you hedge against the failure modes of both and still get SSD read performance.

In other cases it makes sense to use an SSD for the filesystem's journal only: you'll get up to 2x the write performance of the HDD (because you save half of the writes and half of the seeks) and generally run no risk even if your abused SSD dies. Ext4 loses its journal pretty gracefully.
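For the journal-on-SSD setup, a minimal sketch of the commands involved (wrapped in Python only for illustration); the device paths are hypothetical, and the external journal device must use the same block size as the filesystem:

```python
import subprocess

# Minimal sketch: ext4 on an HDD with its journal on a small SSD partition.
# Device paths are hypothetical placeholders; these commands are DESTRUCTIVE
# to the named devices, so adjust and verify before running anything.
SSD_JOURNAL = "/dev/sdb1"   # small partition on the SSD (hypothetical)
HDD_DATA    = "/dev/sda1"   # the bulk HDD partition (hypothetical)

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the external journal device, then the filesystem that uses it.
run(["mke2fs", "-O", "journal_dev", "-b", "4096", SSD_JOURNAL])
run(["mkfs.ext4", "-b", "4096", "-J", f"device={SSD_JOURNAL}", HDD_DATA])
```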

fraxinus
  • Many FSes only journal metadata, e.g. ext4 with the default `data=ordered`. You don't "save half the writes" unless your workload only involves renaming and deleting files/directories, and creating empty files. But yes, a journal on SSD should help most workloads significantly by removing lots of small writes. – Peter Cordes Oct 06 '19 at 15:14
  • SSDs do not go read only. They self destruct by design. It's true they go read only until the power is turned off. If you reboot after a fail quick and do a read while the system is on, you can copy the data off (maybe). If you turn it off, it's dead. – Lucas Holt Oct 17 '19 at 16:48
  • I am sure I have read the read-only thing in more than one SSD docs. So I assumed it was reasonable. In reality, I have seen it only twice in USB flash sticks. No SSD became read-only in my hands. – fraxinus Oct 18 '19 at 06:52
2

The two main factors to consider are:

  • Performance (in access time and throughput)
  • Cost per gigabyte

SSDs blow HDDs out of the water in terms of performance. If you need high throughput and low access times, nothing beats SSDs.

But the cost per gigabyte of SSDs is much higher than that of HDDs. If you need a lot of storage and throughput or access times are less important, nothing beats HDDs.

Throughput (bandwidth) figures may be helped by the appropriate RAID level (not so much access times, though, unless your drives are backlogged enough that queuing is an issue).

Read access time figures for small datasets may be helped by appropriate caching (i.e. put more RAM in your server). Won't help for writes, though (with the exception of battery-backed RAM caches in controllers or disks).

So it all really depends on your use case. A backup/archive server which needs a lot of capacity but doesn't care much about access times or bandwidth will be better off using HDDs. A high-traffic database server will prefer SSDs. In between... depends.

Whatever the situation:

  • You need backups. It's not a matter of if a drive (SSD or HDD) will fail, it's a matter of when.

  • If the server has any kind of importance, you want some kind of RAID to maintain uptime and protect data. RAID will also usually help with performance. Which depends a lot on your requirements (again, a performance/cost compromise).

jcaron
2

As already mentioned, the big difference is price per GB vs random IO performance.

Take, for example, a Seagate Exos 16 TB: at ~$550, it comes to $0.034/GB. Now compare it with an entry-level (speed-wise) Micron 5200 ECO 7.68 TB priced at ~$1300, or about $0.17/GB: the HDD is roughly 5x cheaper per gigabyte, while also being 2x bigger. On the other side, SSD random IO performance is vastly better, with a catch: consumer SSDs, lacking a power-loss-protected writeback cache, are quite slow (sometimes as slow as HDDs) for workloads rich in synchronized random IO (e.g. databases, virtual machines). This is a very important point, rarely analyzed by online reviews. Enterprise SSDs, which almost universally use capacitors for power-loss protection, do not suffer from this weakness and deliver very high random read and write IO.

From the above, you can understand why SSDs have killed the high-end 15K and 10K SAS disks: they provide much better performance at a comparable cost (15K disks were especially expensive). On the other hand, 7.2K HDDs retain a very strong foothold in high-capacity storage systems.

Intel Optane (which is based on 3D XPoint rather than NAND) is in a class of its own in both speed and durability, commanding a very high price per GB: a 100 GB Optane P4801X costs over $260, a per-GB cost of more than $2.60, roughly 80x that of the HDD above. For this reason, it is often used as an "application accelerator" or as a log/journal device.
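Working those per-gigabyte figures out explicitly (list prices as quoted above, so treat them as a snapshot rather than current pricing):

```python
# Per-gigabyte cost from the list prices quoted above (illustrative only).
drives = {
    "Seagate Exos 16 TB HDD":     (16_000, 550),   # (capacity GB, price USD)
    "Micron 5200 ECO 7.68 TB":    (7_680, 1300),
    "Intel Optane P4801X 100 GB": (100, 260),
}
for name, (gb, usd) in drives.items():
    print(f"{name}: ${usd / gb:.3f}/GB")
# Exos:   ~$0.034/GB
# Micron: ~$0.169/GB  (about 5x the HDD)
# Optane: ~$2.60/GB   (roughly 75-80x the HDD)
```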

For these reasons, modern SANs and servers often use a tiered or cached storage subsystem (a toy sketch of the cache approach follows the list):

  • tiered systems put hot data in the fast tier (SSDs) and cold data in the slow tier (HDDs). In such systems, the total storage space is the sum of the fast and slow tiers; however, they are statically partitioned: if cold data suddenly becomes hot, you need to wait for it to be moved to the fast tier. Moreover, the fast tier must be at least as durable as the slow one;

  • cache-based systems have all data on slow HDDs, augmented with a dynamic cache on SSDs where hot data is copied (rather than moved); this means such systems have a total storage space equal to what the slow tier offers, but with the added flexibility of a dynamic cache. With cache-based systems, the fast tier can be built from relatively cheap SSDs.
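To make the caching variant concrete, here is a toy read-cache sketch: hot blocks are copied into a small fast cache while the authoritative copy stays on the slow tier. The cache size and the LRU policy are illustrative assumptions, not how any particular SAN implements it:

```python
from collections import OrderedDict

# Toy model of an SSD read cache in front of an HDD pool: hot blocks are
# *copied* into a small LRU cache, the authoritative copy stays on the HDDs.
# Cache size and eviction policy are illustrative assumptions.
class ReadCache:
    def __init__(self, capacity_blocks=4):
        self.capacity = capacity_blocks
        self.cache = OrderedDict()          # block_id -> data (on the "SSD")

    def read(self, block_id, hdd_read):
        if block_id in self.cache:          # cache hit: served from SSD
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = hdd_read(block_id)           # cache miss: slow HDD read
        self.cache[block_id] = data         # copy (not move) into the cache
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the coldest block
        return data

cache = ReadCache()
print(cache.read(7, lambda b: f"block-{b} from HDD"))
```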

What is the best filesystem for a flash-based SSD? A naive answer would be "the one that writes the least", but the reality is that most advanced filesystem tech is based on a CoW approach which, depending on the specific implementation, can lead to quite substantial write amplification (i.e. ZFS and WAFL are going to write more than, say, ext4 or XFS). From a pure "write less" standpoint, I think it is difficult to beat ext4 and XFS (especially when backed by lvmthin, which enables fast snapshots even on these classical filesystems); however, I really like the added data-protection guarantees and lz4 compression granted by ZFS.

So, do you really need SSD storage for your server duties? It depends:

  • if you need to cheaply store multiple TBs of data, HDDs (or at most cheap consumer SSDs) are the way to go;

  • if you have a mostly sequential workload (eg: fileserver), you don't need SSDs;

  • if your workload is random IO rich, you will greatly benefit from SSDs;

  • if you have an fsync-heavy write pattern, enterprise SSDs (or a beefy RAID controller with powerloss-protected writeback cache) are your best bet, with the downside of high cost.

shodanshok
1

Simple answer here: use SSDs for data that needs fast performance, e.g. when building a server to do large and quick data operations (like video editing).

Use HDDs for slow archival storage.

Generally HDDs are less reliable than SSDs, even though they have a lower cost per gigabyte.

If sensitive data is being stored, consider using an SSD plus an HDD for backup.

1

Quiet isn't always good, much like electric cars on the road being too quiet. HDD access noise can provide security: that is how I detected a break-in to a work Perforce server while watching a movie. (Addition: with a line-feed printer hooked up to /var/log/messages, it is also harder to erase a single log entry.)

1

I look at it like this:

What is the service I am building the server for?

If it's an infrastructure service like LDAP/auth/printing, etc., where you are offering a service, it's mainly a memory issue: save money, use HDDs (7.2k or 10k, maybe with a RAID 1 SSD boot device), and throw a load of memory at it.

For a file server, make sure you use a battery- or flash-backed RAID controller; you can then use HDDs efficiently, because writes are committed by the controller's cache rather than the disks.

If it's a data service (DB, etc.), then use SSD RAID for high throughput, but control the costs by using HDDs too; some DBs, for example, will not require high write speeds, or simply aren't running the IOPS to warrant the use of high-cost storage.

At the end of the day it comes down to money and your CFO/Finance Director/VP of Finance.

1

SSDs are clearly the better technology; they will keep improving and will continue to get cheaper, but they are more expensive today.

HDDs are fine for sequential storage tasks:

  • Database Log file
  • Video storage
  • Backup volumes (bulk)
  • Virtual Machine snapshots

HDDs are also fine for latency insensitive tasks:

  • Archiving files (individually)
  • Databases that are small enough to be running in memory anyway
  • Non-OS software files (if your SSD is getting full)

So for a server, if you have the budget, you can fill it with SSDs. Beyond that, using the incomplete list above, you can save money by mixing with HDDs.

RAID and tiering are beyond the scope of this question; I'm sure there are plenty of other questions about those.

As for the lifecycle of SSDs (I remember reading that the Samsung Evo Pro, a consumer product, has lasted a lot longer than promised): individual cells can certainly wear out over time, but that doesn't break the entire disk. Cell lifetime is linked to the number of writes to that cell, and the SSD controller spreads writes over many cells over time. If the SSD is 99% full and the remaining space takes lots of writes, that remaining space will wear out faster.
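A toy illustration of that last point, with an invented controller policy (real firmware is far more sophisticated and also relocates static data, which this sketch ignores):

```python
# Toy wear-levelling model: each write is steered to the least-worn cell that
# is not pinned by static data. Entirely illustrative, not real firmware logic.
def simulate(total_cells=100, static_full_pct=0, writes=10_000):
    wear = [0] * total_cells
    # Cells occupied by static data are never rewritten in this toy model.
    free = list(range(int(total_cells * static_full_pct / 100), total_cells))
    for _ in range(writes):
        target = min(free, key=lambda c: wear[c])   # least-worn free cell
        wear[target] += 1
    return max(wear)

print("max P/E on an empty drive:   ", simulate(static_full_pct=0))   # ~100
print("max P/E on a 99% full drive: ", simulate(static_full_pct=99))  # ~10000
```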

1

If there is a need for

  1. File-based swap mounted as additional memory
  2. Video chat, video streaming, or video processing
  3. Processing that produces a single big file

then an HDD is more reliable; overwriting seems to be slower on an SSD.

SSDs are amazing, though! They made possible the revolution of packing exabytes/yottabytes of physical storage into one small cabinet/rack.

A big nitrogen cooler can be installed, and a small space can serve as a pure storage rack.

SSD caching is another amazing fast-read technology, which takes caching to another level.

Dickens A S