15

We are acquiring a new Lenovo SR650 server (which will host multiple Oracle DB servers and SAP), and the vendor has proposed the following storage options:

  • ThinkSystem 2.5" 1.2TB 10K SAS 12Gb Hot Swap 512n HDD (QTY: 16 disks)
  • ThinkSystem 2.5" 5210 960GB Entry SATA 6Gb Hot Swap QLC SSD (QTY: 16 disks)
  • ThinkSystem 2.5" 5210 1.92TB Entry SATA 6Gb Hot Swap QLC SSD (QTY: 16 disks)

We read somewhere that, after a sudden power failure, SSDs are more likely to suffer total data corruption than SAS disks.

Which of the above storage options is more suitable from a performance and reliability perspective?

We have redundant UPS, along with dedicated online generator for Data Center. Initially we will host 2 SAP servers (Production and Development), both virtualized. Each VM uses around 3 TB of space. Our past experience with RAID 5 was not good, so we now use RAID 10 in all of our servers, and since moving to RAID 10 we have not encountered any failures for the past few years.

Is it a good idea to split the 16 disks into two RAID 10 arrays, with PRD on the first array and DEV on the second, so that whatever operation (data copy, backup, etc.) is in progress on one array does not affect the other?

iBBi
  • 6
    You may want to inquire about U.2 (NVMe) disks. The fact that the vendor suggests 16 disks regardless of size might mean that you're IOPS-limited, and U.2 can deliver far more IOPS per disk than even a SATA SSD. – MSalters Mar 02 '20 at 13:17
  • 3
    Note that a lot of times, "entry" level SSDs are merely re-badged consumer-grade devices. I don't know what storage controller you're using, but you may find yourself unable to use some of its features if they require some of the more enterprise-oriented features not generally present on consumer drives. – bta Mar 03 '20 at 17:56

5 Answers

28

QLC SSDs are absolutely inadequate for write-heavy workloads such as databases and SAP. I strongly suggest you buy enterprise-grade TLC disks, such as Samsung PM/SM863 and Intel S4510/S4610.

I would not go the SAS 10k route unless the SSD system costs too much for your budget.

Finally, I would keep all disks in the same RAID10 array so that production workloads can benefit from the IOPS of all 16 disks.
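To put rough numbers on the one-array versus two-array choice, here is a minimal sketch. The function and class names are made up for illustration, the capacities are the decimal figures from the question, and filesystem or hypervisor overhead is ignored, so treat the results as an upper bound rather than a sizing guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskOption:
    name: str
    disk_tb: float  # decimal TB per disk, as listed in the question
    disks: int = 16


def raid10_usable_tb(disk_tb: float, disks: int) -> float:
    """RAID 10 mirrors every disk, so usable space is half the raw total."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disk_tb * disks / 2


def compare_layouts(opt: DiskOption) -> tuple[float, float]:
    """Return (usable TB of one big array, usable TB of each of two half-size arrays)."""
    single = raid10_usable_tb(opt.disk_tb, opt.disks)
    per_half = raid10_usable_tb(opt.disk_tb, opt.disks // 2)
    return single, per_half


# The three options proposed by the vendor in the question.
OPTIONS = [
    DiskOption("1.2TB 10K SAS HDD", 1.2),
    DiskOption("960GB QLC SSD", 0.96),
    DiskOption("1.92TB QLC SSD", 1.92),
]

for opt in OPTIONS:
    single, per_half = compare_layouts(opt)
    print(
        f"{opt.name}: one array = {single:.2f} TB across {opt.disks} disks; "
        f"two arrays = {per_half:.2f} TB each across {opt.disks // 2} disks"
    )
```

Total capacity is the same either way; what changes is that each half-size array only has 8 disks serving its I/O, and a VM of around 3 TB leaves less headroom on the smaller arrays.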

shodanshok
  • 1
    PM863 and S4510 are also not meant for write-heavy workloads. – batistuta09 Mar 02 '20 at 13:30
  • 1
    While not specifically marketed for write-heavy workloads, they provide plenty of endurance, especially when compared to QLC drives – shodanshok Mar 02 '20 at 14:26
  • 2
    They are. Entry-level disks often have a 40-60 GB per day write budget, enough for an OS etc. The PM863 is rated for a LOT more: I see the 1.92 TB version rated for 2800 TB of writes. Not necessarily extremely heavy, but definitely better. – TomTom Mar 02 '20 at 15:56
  • 1
    I didn't mean PM863 is a "bad" device, but for "write heavy workload" I'd prefer SM883 or S4610. – batistuta09 Mar 02 '20 at 16:29
  • "Write-heavy workloads" would be e.g. cache disks for a large HDD array. In that role, they're serving the full write load of the underlying array. But an SM863 can handle >1 full drive write/day. That would be 15000 Gigabyte a day for the stated 16 disk system. How are you going to get all that data into that server? That's more than a Gigabit per second, _sustained_. – MSalters Mar 02 '20 at 16:32
  • 6
    @MSalters it is not only an endurance problem but mainly an IOPS issue. Current consumer SSDs provide very low synchronized write (read: `fsync()`) performance, resulting in lower than expected results when facing an fsync-rich workload (such as a write-heavy SQL database). Moreover, in such circumstances write amplification will skyrocket, leading to reduced SSD lifespan. – shodanshok Mar 02 '20 at 16:58
  • @MSalters The problem might be more that some consumer controllers can't handle sustained writes very well and usually also have really low spare space. This can lead to surprising performance hits as well as skyrocketing write amplification. I'd like to think that this was mostly a problem with older controllers, haven't checked it for a long time. – Voo Mar 03 '20 at 13:47
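
The endurance figures traded in these comments can be checked with a few lines of arithmetic. This is only a sketch: the 16 x 960 GB array and the 2800 TB rating for a 1.92 TB drive come from the thread, while the 5-year rating period and the helper names are assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbit_per_s(gb_per_day: float) -> float:
    """Average line rate (Gbit/s) needed to write gb_per_day decimal GB every day."""
    return gb_per_day * 8 / SECONDS_PER_DAY


def dwpd(tbw: float, capacity_tb: float, years: float) -> float:
    """Drive writes per day implied by a TBW rating spread over `years`."""
    return tbw / (capacity_tb * 365 * years)


# One full drive write per day across 16 x 960 GB disks (figures from the thread).
array_gb_per_day = 16 * 960
rate = sustained_gbit_per_s(array_gb_per_day)
print(f"{array_gb_per_day} GB/day needs about {rate:.2f} Gbit/s, sustained")

# 2800 TBW on a 1.92 TB drive is the rating quoted above;
# the 5-year period is an assumption, not a figure from the thread.
print(f"2800 TBW on 1.92 TB over 5 years = {dwpd(2800, 1.92, 5):.2f} DWPD")

# Entry-level budget of 40-60 GB/day on a 960 GB drive, expressed as DWPD.
low, high = 40 / 960, 60 / 960
print(f"40-60 GB/day on 960 GB = {low:.3f} to {high:.3f} DWPD")
```

Note that this only covers endurance; it says nothing about the synchronized-write IOPS issue raised above.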
10

Always go flash, if you can of course. QLC has some crazy low endurance, so watch spare-cell usage and be prepared to swap drives as they die like crazy: keep some in stock and maybe replace them proactively. You’ll be fine :)

BaronSamedi1958
8

In terms of raw speed, the SSD options in the question will grossly out-perform the SAS drives. It's embarrassing, really. Nevertheless, don't use the QLC disks! You can use consumer SSDs, but look for disks using TLC or better.*

Additionally, you need to be careful using consumer SSDs to build RAID volumes. Modern SSDs have internal controllers that lie to the OS and RAID controllers, and will claim to have fully committed data when this is not actually the case! There are good reasons for this in the desktop systems where these drives are intended to be installed, but in the event of a power failure it can lead to significant data loss in a server RAID/SAN volume, because data the OS thought was committed was still in volatile cache within the disk and suddenly the check-bit for the whole stripe is off.

Enterprise SSDs avoid this issue with a small internal capacitor able to provide enough power to finish committing anything still in a volatile buffer if the power drops. It's a $2 manufacturing addition, but it can triple (or more) the price of the drive :(

You may also be able to address this issue by ensuring you have a RAID controller with its own battery unit, or if you otherwise have very high confidence in the power situation for your data center and your backups.

With that in mind, I see this:

> We have redundant UPS, along with dedicated online generator for Data Center.

That's a start. What I'd like to see on top of this is a documented history for this data center proving UPS batteries are replaced on schedule, the generator is actually maintained and powered up once a quarter, and the data center has survived previous power issues without unexpected server drops. If you have or can get this documentation, you should feel comfortable using (non-QLC) consumer SSDs in your servers.


* Note: QLC has eventual potential to exceed TLC endurance, but that's not what's on the market today. As such, this post may not age very well, and future readers should do additional research.

Joel Coel
  • 4
    Any consumer SSD from a respectable vendor will correctly honor sync/flushes/FUAs. This means that important writes (i.e. synced ones), once reported as completed, are really stored on safe storage **if** the RAID card correctly passes down sync/flush requests. This is not a problem when using an AHCI or IT controller; however, some hardware RAID controllers, relying on their own powerloss-protected writeback cache, will *not* pass down sync/flushes. In these cases the private SSD DRAM cache *has to be disabled*, leading to much lower performance. – shodanshok Mar 02 '20 at 16:05
  • On the other side, the real plus of enterprise, capacitor-backed SSDs is that they can *avoid* honoring sync/barriers/flushes/FUAs, bringing much higher performance to the table. – shodanshok Mar 02 '20 at 16:05
  • 1
    @shodanshok: Why would SSD DRAM cache need to be disabled? The hardware RAID controller tells the OS when the data is safe from powerloss. I assume the RAID controller is smart enough to keep the data in its own cache until the SSD has completed its write, too. – MSalters Mar 02 '20 at 16:37
  • 2
    @MSalters this assumes the RAID controller passes down the required sync/barrier to the SSD (otherwise, it cannot be "sure" about the SSD completing the required write). *Some* controllers *seem* to do that, but others simply discard any sync/barrier information. In the end, hardware RAID controllers are poorly documented black boxes which one has to trust to do the right thing. For this reason, I am a great fan of open source RAIDs, especially ZFS-based. Take a look [here](https://serverfault.com/a/685328/269155) and [here](https://serverfault.com/a/1005322/269155) for more details. – shodanshok Mar 02 '20 at 16:53
  • Consumer SSDs have the nasty habit of failing BRICK HARD once their spare sector pool is exhausted. Any SSD based on the same tech should be viewed with suspicion unless the vendor says they have implemented a defined and usable behaviour in that eventuality. – rackandboneman Mar 02 '20 at 23:39
  • @shodanshok I'd love to see your answer extended to discuss these issues. – Joel Coel Mar 03 '20 at 17:14
  • 1
    @JoelCoel you can read something [here](https://serverfault.com/a/685328/269155) and especially [here](https://serverfault.com/a/1005322/269155) – shodanshok Mar 03 '20 at 17:26
  • @shodanshok So reading that, to check my understanding, if you have, say, a perc h710 you should still be fine with consumer ssds? – Joel Coel Mar 03 '20 at 21:46
  • 1
    It *should* be fine, based on some lab testing I did recently. Moreover, Dell PERC controllers *automatically* enabled the disk private cache when using SATA disks. However, no explicit statement about safety with enabled cache exists from Dell or LSI as far as I know. On the other hand, some older Intel and third-party docs warn about enabling disk cache (giving no further explanation). My personal opinion is that with the advent of consumer SSDs, which *require* a writeback cache to give reasonable performance, hardware RAID cards have become smarter about passing down cache flushes. – shodanshok Mar 03 '20 at 22:15
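
For anyone wanting to see the cost of synced writes on their own hardware, the comparison discussed in these comments can be approximated with a small probe. This is a rough sketch, not a proper benchmark: the function names and the 4 KiB block size are arbitrary choices, it goes through the filesystem, and whatever controller or drive cache sits in the path decides what `os.fsync()` really waits for. A dedicated tool such as fio is the better choice for real measurements.

```python
import os
import tempfile
import time


def time_writes(path, count=200, block=4096, sync=True):
    """Write `count` blocks of `block` bytes, optionally calling fsync after each.

    Returns the elapsed wall-clock time in seconds.
    """
    payload = os.urandom(block)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            if sync:
                os.fsync(fd)  # ask the OS to flush this file to stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory=None, count=200):
    """Return (buffered_seconds, synced_seconds) for a temp file under `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "fsync_probe.bin")
        buffered = time_writes(target, count, sync=False)
        synced = time_writes(target, count, sync=True)
    return buffered, synced


if __name__ == "__main__":
    buffered, synced = compare()
    print(f"buffered writes:  {buffered:.4f} s")
    print(f"fsync per write:  {synced:.4f} s")
```

Pass `directory=` a path on the volume under test; a large gap between the two timings suggests the flushes are reaching the drives, while near-identical timings suggest some cache is acknowledging them early.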
3

We use a very similar setup of SAP systems currently, with an additional QAS server.

As primary storage we use a Dell Compellent SSD solution, with LUNs made of 1.92TB SSDs. We also have a HDD bay used for backup of the DB. The array is a RAID 6 out of 8 drives plus 1 separate hot spare.

The advantage is that the system works very fast and we do have a reliable backup in case of emergency.

The servers are Hyper-V'ed in cluster on 2 physical servers. So the servers have redundancy, the storage has backup on HDDs.

The system has been running for 3 years now without problems; the SSDs still show healthy, with endurance at 95%.

As for the array, there is no mandatory need to break it. You can just make a big array and assign space for each VM or make 2 arrays and have each one assigned to a specific server.

Overmind
  • 1
    *As for the array, there is no mandatory need to break it. You can just make a big array...* Just don't ever do that with spinning disks and RAID5/6. – Andrew Henle Mar 02 '20 at 14:33
  • No good if you make an improvised software RAID; it's fine on something like Dell Compellent, HP storage arrays or any viable large storage provider. – Overmind Mar 03 '20 at 13:25
  • If you throw enough small random write operations at some incompetently-created 23-spinning-disk RAID6 array to overwhelm the controller's cache and IO coalescing capacity, IO throughput will drop to ranges measured in kB/sec. Even the best controller can hide only so much bad design. – Andrew Henle Mar 03 '20 at 15:57
  • 1
    Yes, too many small fragments punish any HDD system. For highly fragmented writes SSDs are the way to go, provided they are enterprise grade. – Overmind Mar 04 '20 at 06:29
1

In Lenovo terms, the SSDs you want for a production database are at the very least their "Mainstream" enterprise option. There is a significant price difference for the same size drives, but to be honest, if you spread that out across the server lifetime it's not that bad. With good SSDs, chances are high - unlike with mechanical drives - you won't need to think about your physical storage again for however long the server operates, as long as you don't run out of space.

Mikael H
  • If I create one big RAID 10 and host 2 or 3 Oracle servers (SAP PRD as VM guests), and a backup or maintenance job is running for one VM, will it affect the second VM's performance too? – iBBi Mar 04 '20 at 06:22
  • 1
    Of course it will: you are sharing the available I/O resources across multiple machines. The relevant question is: does such maintenance cause enough I/O to bog down an SSD RAID10 of the size you'll use? You can do some theoretical calculations based on what you know about how the database engine works (it probably won't be "best-case 4KB sequential I/O"), but you really should do some proper benchmarking before committing to putting a solution into production. – Mikael H Mar 04 '20 at 07:55
  • For example, we take backups using Backup Exec to tape via the network, and the VM guest is also backed up by Veeam B&R. So when a backup is in progress for VM#1 over the 10G network, it will affect the performance of VM#2. Isn't it better to create 2 arrays, so that one VM does not impact the other? – iBBi Mar 04 '20 at 08:55
  • 1
    That's why I said you'd need to do some benchmarks on your own system: A small RAID10 could feasibly be a bottleneck if used under several environments, but the wider the storage array becomes, the bottlenecks would start appearing in different parts of your stack. – Mikael H Mar 04 '20 at 13:54
  • OK. After talking with the vendor, we received the following offer: ThinkSystem RAID 930-16i 8GB Flash PCIe 12Gb Adapter (QTY: 1) and ThinkSystem 2.5" PM1645a 1.6TB Mainstream SAS 12Gb Hot Swap SSD (QTY: 8, for RAID 10). I hope it's better than the QLC SSDs. – iBBi Mar 10 '20 at 06:12
  • 1
    I run Mainstream type drives in a couple of our database servers and have nothing to complain about in a suitable context. – Mikael H Mar 10 '20 at 06:18