
I am currently trying to upgrade an iSCSI shared storage server for a SQL Server cluster (the server is running Windows Server 2012; I plan on upgrading that at a later time). While looking for where to get big wins in hardware, I was excited to see the insane access times and IOPS of the Optane 900p SSDs. They are marketed to gamers, but they're definitely better suited to high-IO workloads like databases. I know you can RAID 1 these via Intel's VROC, but that requires a specific chipset, which this server doesn't have. So I figured I could just do a Windows RAID 1. Here is where I run into issues:

  • Both drives are recognized in Disk Management.
  • Windows Server Storage Manager sees both drives, but only one drive can be added to a storage pool.
  • The older method of dynamic disk mirroring in Disk Management lets me set up the mirror, but it instantly fails with the error 'Failed redundancy.' In this case, Event Viewer shows the message 'Extent Disk2-01 on disk {id here} that is part of the fault-tolerant volume D: is no longer accessible'

I have tested each drive separately for faults, including with software that tests each sector, as well as by writing to the entire drive and testing for corruption. Nothing indicates that either drive is bad. Worst case, I'm thinking I can simply use some block-level file mirroring software to mirror the VHD files across the drives, but obviously that won't give the uptime benefit of a mirrored RAID. Does anyone have any idea why this is happening and/or potential fixes?

  • What's your server/motherboard and motherboard firmware/BIOS version? – Chopper3 Dec 11 '18 at 15:28
  • It's an ASRock EPC602D8A motherboard, which has the C602 chipset. To be honest, I haven't tried updating the BIOS; I figured that if the drives were recognized in Windows, and the RAID is OS-level software, it shouldn't matter at that point. However, I'll try upgrading the BIOS and report back. – Jarrod Christman Dec 11 '18 at 15:39
  • Updated BIOS, no change in behavior. Intel's SSD Toolbox says the firmware of the drives is up to date. I am currently running their fairly long 'Full Diagnostic Scan' as one last check to make sure the drives themselves aren't bad, but my guess is that it's just some odd software issue. – Jarrod Christman Dec 11 '18 at 16:03
  • Thanks for trying - the reason I asked about firmware versions is that the C602 is right on the edge of what supports NVMe and you having two in that box may be enough to put it over. Have you got any other system you could try the drives in, ideally one at a time to start with, then with both installed? Just to see if there's a problem with the cards/motherboard etc. – Chopper3 Dec 11 '18 at 16:06
  • Yeah, the hardware is getting a bit old at this point. Oddly, each drive works fine individually in the system. I can even run disk benchmarks on both drives in the C602 chipset system at exactly the same time without problems. It is only the RAIDing that seems to be stubborn. – Jarrod Christman Dec 11 '18 at 16:09
  • https://datacentersupport.lenovo.com/us/en/products/servers/thinksystem/st550/7x09/solutions/ht504421 is interesting; apparently it's just Windows Server 2012 R2 and NVMe-based RAIDs in general... Their site says there is no workaround, but obviously I am curious whether I can find one. – Jarrod Christman Dec 11 '18 at 18:15
  • Ah good spot Jarrod - I hadn't seen that one before - 2016/2019 sound like it may help at least. – Chopper3 Dec 11 '18 at 18:44
  • Yep, will do that as an upgrade and if it fixes the issue will report back here for others for future reference. Appreciate your help as well. – Jarrod Christman Dec 11 '18 at 19:00

1 Answer


TLDR: Upgrade to at least Windows Server 2016

Complete answer: As a follow-up, I found that others were having the same issues: https://datacentersupport.lenovo.com/us/en/products/servers/thinksystem/st550/7x09/solutions/ht504421

Essentially, the symptoms are:

  • Unable to add both NVMe drives to a storage pool.
  • Able to create a more classic dynamic disk Windows RAID, but it would instantly fail with the error 'Failed redundancy.'
  • Both drives checked out fine, via generic drive tools and Intel's diagnostics tool.
  • Both drives could be used individually at the exact same time, just not in any form of redundancy setup.

The solution? Upgrade to 2016. It looks like Windows Server 2012 R2 has a known bug that is unlikely to be fixed. This bug prevents NVMe drives from being able to act in a redundant configuration.
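
For anyone wanting to check a batch of servers before attempting an NVMe mirror, here is a minimal Python sketch of the version test implied above. It is only an illustration: the function name `predates_server_2016` is hypothetical, and it assumes the relevant cutoff is the Windows kernel major version (Windows Server 2012 R2 reports 6.3.9600, Windows Server 2016 reports 10.0.14393). It does not detect the bug itself, only whether the OS is older than 2016.

```python
def predates_server_2016(version: str) -> bool:
    """Return True if a Windows version string is older than Server 2016.

    Assumption: `version` looks like "major.minor.build", for example
    "6.3.9600" (Server 2012 R2) or "10.0.14393" (Server 2016).
    """
    parts = version.strip().split(".")
    try:
        major = int(parts[0])
    except ValueError as exc:
        raise ValueError(f"unrecognized version string: {version!r}") from exc
    # Server 2016 and later report major version 10; 2012 and 2012 R2 report 6.
    return major < 10


# Example inputs, using the version numbers named in the lead-in.
for label, ver in [("Server 2012 R2", "6.3.9600"), ("Server 2016", "10.0.14393")]:
    status = "upgrade first" if predates_server_2016(ver) else "2016 or newer"
    print(f"{label} ({ver}): {status}")
```

On a Windows machine, `platform.version()` from the standard library should return a string in this form, so its result could be passed straight to the function.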