
So I've been suspecting performance bottlenecks on a Samsung EVO 850 RAID1 for some time, but have honestly been too lazy to look into it. Now I'm starting a new home project involving a VMware ESXi host with internal storage.

I've got a number of HW RAID controllers lying around, and decided it was time to upgrade my older Adaptec 51645, which has served me very well and never caused any issues... unfortunately it seems to be mission impossible to get monitoring up and running on newer VMware systems, since this controller uses the older Adaptec driver set.

Long story short - I settled on a ServeRAID M5015, as I had one with the SSD accelerator key installed, and it's compatible with a spare Intel SAS expander I've got lying around as well.

I set up two hardware RAID1 arrays - 2 x 250GB EVO 850 and 2 x 1TB EVO 850 - and immediately realized something wasn't quite right. Performance seemed abysmal, and by the mere looks of it, even worse than a 7 x HDD RAID5 - especially for write operations.

Not being into the guessing game, I decided to take a more analytical approach, and have now tested a number of configurations, which all seem to show the same thing.

The EVOs running in pure JBOD provide the maximum, expected performance, whereas adding any RAID code to the mix degrades performance by varying amounts - at least 50% below the JBOD baseline.

I was hoping someone could shed some light on this, and hopefully provide some facts as to whether this is isolated to Samsung SSDs, or whether it is related to the RAID controllers (details below).

These are the controllers and settings I've tested, and the results I've got.

  • NOTE: I know these are consumer SSDs with no capacitors.

  • All tests were performed using CrystalDiskMark64 - I haven't dug too deep into IOMeter or similar, as the CDM results are fairly comparable across configurations and "good enough" for initial baselining (IMHO anyway). For a rough fio equivalent, see the sketch after the results list.

    • Settings: 5 tests, 16GiB file size.
  • All tests were done on a Windows Server 2012 R2 platform, with the newest available drivers and controller firmware.

  • Only the 1TB EVO 850s were tested.

  • Controller cache enabled for Write-Back, Direct IO policy (where applicable), drive caches not enabled.

    • EDIT: I should have mentioned that I know these are consumer drives with no capacitor, and that I've re-run the tests with drive caches forced on, with NO IMPROVEMENT - regardless of the Windows cache flushing policy settings.
  • Intel RST

    • JBOD SEQ Read/Write Q8T1: ~550 MB/s / ~550 MB/s

    • JBOD RND4K Read/Write Q32T16: ~450 MB/s / ~300 MB/s

    • RAID1 SEQ Read/Write Q8T1: ~1100 MB/s / ~265 MB/s

    • RAID1 RND4K Read/Write Q32T16: ~300 MB/s / ~24 MB/s

  • IBM ServeRAID M5015 (LSI) with SSD Accelerator key.

    • This one isn't new, but it has plenty of power to handle the RAID code, even for many fast SSDs.

    • It does not support JBOD, so I used a single-drive RAID0 for that test.

    • RAID0 Single Drive SEQ Read/Write Q8T1: ~524 MB/s / ~265 MB/s

    • RAID0 Single Drive RND4K Read/Write Q32T16: ~405 MB/s / ~370 MB/s

      • This is a totally off, unexpected and weird result.
    • RAID1 SEQ Read/Write Q8T1: ~520 MB/s / ~265 MB/s

    • RAID1 RND4K Read/Write Q32T16: ~200 MB/s / ~24 MB/s

  • HP SmartArray P411

    • Results identical to the ServeRAID M5015.
  • Adaptec 51645

    • This is a 3 Gb/s SATA II controller (it's an old thing after all).

    • JBOD SEQ Read/Write Q8T1: ~268 MB/s / ~268 MB/s

    • JBOD RND4K Read/Write Q32T16: ~268 MB/s / ~265 MB/s

    • RAID1 SEQ Read/Write Q8T1: ~545 MB/s / ~265 MB/s

    • RAID1 RND4K Read/Write Q32T16: ~530 MB/s / ~260 MB/s
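
For anyone who wants to cross-check these numbers outside CrystalDiskMark (as mentioned in the test notes above), roughly equivalent fio jobs would look something like the sketch below. This is only an approximation under the assumption that fio is available on the test box - the target path is a placeholder, and on Windows you would swap libaio for --ioengine=windowsaio.

    # Rough equivalent of the SEQ Q8T1 read test (run again with --rw=write for the write side)
    fio --name=seq-read --filename=/path/to/testfile --rw=read --bs=1M --iodepth=8 --numjobs=1 \
        --size=16g --direct=1 --ioengine=libaio --group_reporting

    # Rough equivalent of the RND4K Q32T16 write test (run again with --rw=randread for reads)
    fio --name=rnd-write --filename=/path/to/testfile --rw=randwrite --bs=4k --iodepth=32 --numjobs=16 \
        --size=16g --direct=1 --ioengine=libaio --group_reporting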

So the bottom line is that the old Adaptec actually handled the RAID scenario best, but it's not really an option, since it only does 3 Gb/s SATA II. The Intel one did best in the read scenarios, but it isn't an option on VMware ESXi.

Currently I'm leaning towards simply using the SSDs as single drives, with one datastore on each, and using Veeam Backup & Replication to replicate the VMs between datastores, because it just doesn't look like I'll be able to get any reasonable performance out of them on a RAID controller.

I did quite a bit of research on the subject, and it seems like I shouldn't get my hopes up for any of this.

Would anyone here by chance know - for a fact - whether another/newer controller would resolve this? Or is it simply the EVOs having trouble in RAID setups? (I have plenty of HW RAID controllers, but only EVO drives of varying capacity, so I couldn't do that test myself.)

Thanks in advance for any feedback here.

So - I thought I'd post an update on some further tests.

I've built the VMware ESXi setup, and while extracting some data from one of the SSDs before preparing it for a new test setup, it became apparent that there is something that just makes these SSDs not play nice with at least one of the controllers.

I made a virtual RDM for one of the SSDs and passed it through to a VM.
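
The virtual RDM can be created with vmkfstools along these lines (the device identifier and datastore path below are placeholders, adjust to your own environment):

    # Create a virtual-compatibility RDM mapping file pointing at the raw SSD
    # (the naa.* identifier is a placeholder for the actual device ID)
    vmkfstools -r /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX \
        /vmfs/volumes/datastore1/rdm/evo850-rdm.vmdk

    # The resulting .vmdk is then attached to the VM as an existing disk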

I went with the IBM ServeRAID M5015. After reading around 40GB in one go, the disk simply goes unresponsive (I suspect the controller is not playing well with the drive firmware, but this is pure speculation). The drive does not go offline, it just stops responding, and only a reboot of the VM will fix that.
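
The behaviour should be reproducible with a plain large sequential read from inside the guest - for illustration, assuming a Linux guest where the RDM shows up as /dev/sdb (both are assumptions, adjust to your setup):

    # Read ~40 GB sequentially from the RDM-backed disk (roughly the amount
    # that triggered the hang here); the device name is a placeholder
    dd if=/dev/sdb of=/dev/null bs=1M count=40960 status=progress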

Using a non-RAID controller, there are no issues at all.

Funny times.

Now I'm looking for a reasonably priced, more recent RAID controller - it doesn't necessarily have to be SAS, SATA will do - which has the prerequisites to be monitored while running in an ESXi server...

Update

Never got it working as I wished. I ended up buying two older Intel enterprise SATA SSDs for the primary workload, and am just using the Samsung EVOs for the less performance-sensitive workloads.

I made a script to monitor the RAID inside ESXi using StorCLI, and passed my trusty old Adaptec 52645 through to a VM for dealing with the larger disk sets containing mostly data at rest (since the LSI controller apparently does not support power saving and disk spin-down... siiigh...)
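
For anyone curious, the monitoring is nothing fancy - conceptually it's just a StorCLI status query on a schedule. A minimal sketch of that kind of check (not the exact script; the storcli install path and the alerting mechanism are assumptions, adjust to your setup):

    #!/bin/sh
    # Minimal RAID health check for an LSI/MegaRAID controller inside ESXi,
    # meant to be run from cron. The path depends on the storcli VIB installed.
    STORCLI=/opt/lsi/storcli/storcli

    # Count virtual drives whose state is not Optimal ("Optl" in storcli output)
    DEGRADED=$($STORCLI /c0/vall show | grep -E 'RAID[0-9]' | grep -vc 'Optl')

    if [ "$DEGRADED" -gt 0 ]; then
        # Replace with whatever alerting fits your environment (syslog shown here)
        logger -t raidmon "RAID check: $DEGRADED virtual drive(s) not Optimal on /c0"
    fi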

It all works now - just not as I originally intended.

Anyway - thanks for your inputs.

Sharza

1 Answer


Samsung 850 EVOs are consumer SSDs, lacking a power-loss-protected write-back cache. So the RAID controller will disable the SSD's private cache, which is critical for extracting good performance from consumer flash drives.

To restore performance you would have to re-enable the disk cache, which can, however, impact data resilience against a sudden power loss.
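
On LSI/MegaRAID-based controllers such as the ServeRAID M5015, the disk cache is controlled per virtual drive. As a sketch, assuming controller 0 and virtual drive 0 (check your numbering with storcli show first):

    # Show the current cache settings of virtual drive 0 on controller 0
    storcli /c0/v0 show all

    # Re-enable the physical drives' own write cache for that virtual drive
    storcli /c0/v0 set pdcache=on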

See here for more information.

shodanshok
  • On top of that, the EVOs have a VERY limited endurance rating. Have fun replacing them in 6-9 months - and no, this is not a warranty case. They are rated for a certain amount of writes, and you WILL blow through it. The 250GB models are rated for 41GB (!) of writes per day each. Enough for a low-write system - not enough for virtualization. The 1TB is rated for 82GB/day. – TomTom Mar 08 '20 at 16:49
  • I should have mentioned that I'm aware we're talking about consumer drives with no capacitor, and that I've run the tests with drive caches forced on as well, with no improvement to write performance regardless of the Windows cache flushing policy - I've edited the question to reflect this. Endurance is also not part of the puzzle here. My original question stands: will another/newer controller resolve this, or is it simply the EVOs having trouble in RAID setups? I suspect something drive-specific is not playing nice with the RAID code on these controllers. – Sharza Mar 09 '20 at 17:58