Simultaneous IO to multiple SSDs in external USB 3.0 enclosures


UPDATE: Just tried this scenario on a board with eight extra SATA ports and it works. It's slower than I'd expected but still acceptable. Per the discussion with David Schwartz, I believe he may be correct that there is something pathologically wrong, or Ali Chen may be correct that RHEL just can't handle that many host controllers going all out at the same time. I'm going to experiment a bit more since I've come this far and am basically being paid to be curious at this point. :)

BEGIN ORIGINAL POST

So, the setup is a touch lengthy. We have two RocketU 1144D four-port USB 3.0 cards in our system, one in a PCIe 2.0 slot and one in a PCIe 3.0 slot to avoid bandwidth issues. Each of these USB cards has four Crucial MX300 1TB SSDs attached to it in externally powered SilverStone Raven enclosures. The need, per the customer, is to simultaneously write the same set of files to four of the eight disks while reading files from the other four disks to calculate MD5 checksums. Each disk will be as close to capacity as possible with files that are roughly 1GB in size, either at the time of read or after all files are written.
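To make the workload concrete, here is a minimal sketch of it in Python. The mount points, the .bin naming, and the staging directory for the files being written are all hypothetical; real code would add error handling and per-file timing:

    import hashlib
    import shutil
    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    CHUNK = 16 * 1024 * 1024  # 16 MiB reads; large chunks keep the USB links busy

    def md5_disk(mount):
        """Read every file on one source disk and return name -> MD5 digest."""
        sums = {}
        for path in sorted(Path(mount).glob('*.bin')):
            h = hashlib.md5()
            with open(path, 'rb') as fh:
                for chunk in iter(lambda: fh.read(CHUNK), b''):
                    h.update(chunk)
            sums[path.name] = h.hexdigest()
        return sums

    def fill_disk(files, mount):
        """Copy the same set of files onto one destination disk."""
        for src in files:
            shutil.copyfile(src, Path(mount) / src.name)

    read_mounts  = ['/mnt/src1', '/mnt/src2', '/mnt/src3', '/mnt/src4']  # card one
    write_mounts = ['/mnt/dst1', '/mnt/dst2', '/mnt/dst3', '/mnt/dst4']  # card two
    staged = sorted(Path('/data/staged').glob('*.bin'))  # files to be written

    with ThreadPoolExecutor(max_workers=8) as pool:
        readers = [pool.submit(md5_disk, m) for m in read_mounts]
        writers = [pool.submit(fill_disk, staged, m) for m in write_mounts]
        for job in readers + writers:
            job.result()  # propagate any worker exceptions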

Now, if we only read or write files on disks attached to one of the cards, the speeds are not bad. At a full TB we're averaging between 3 and 4 seconds per file (read/calculate or write). The problem is that when we try to do both at the same time, read and write speeds degrade quickly, going from roughly 1.5 seconds per file at the start to over sixty seconds per file.

The only other cards in the system are the video card in the PCIe 3.0 x16 slot and an Intel X540-T2 adapter (currently unused) in another of the PCIe 3.0 x8 slots.

We have a dual-CPU X10DRL-i server motherboard with two six-core Xeon processors and 64 GB of RAM, running RHEL 7.2 from another Crucial MX300 connected to a SATA port.

So, the question is: is it possible to do what is described above within a decent amount of time? "Decent" is defined as follows (the sustained throughput this implies is worked out below):

- one thousand 1GB files per SSD
- read from four SSDs connected to card one
- written to four SSDs connected to card two
- the operations MUST be done in parallel (because customer)
- all in less than an hour
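For scale, that deadline works out to a sustained rate per disk that is comfortably inside the MX300's spec (roughly 530 MB/s sequential read, 510 MB/s write), so the SSDs themselves shouldn't be the bottleneck. A quick back-of-the-envelope check, assuming decimal gigabytes:

    # What the one-hour deadline implies, taking 1 GB as 10^9 bytes.
    files_per_disk = 1000
    file_size_gb = 1.0
    disks_per_card = 4
    deadline_s = 3600

    per_disk_mbs = files_per_disk * file_size_gb * 1000 / deadline_s
    per_card_gbps = per_disk_mbs * disks_per_card * 8 / 1000

    print(f"sustained per disk: {per_disk_mbs:.0f} MB/s")   # ~278 MB/s
    print(f"aggregate per card: {per_card_gbps:.1f} Gbps")  # ~8.9 Gbps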

From what I'm learning I'm starting to lean towards no but thought I'd ask and see if anybody with more knowledge than me has anything more definitive. Any help, advice, and especially an answer is very much appreciated.

EDIT per suggestion by David Schwartz:

Required bandwidth per card: 5 Gbps per USB 3.0 port × 4 ports = 20 Gbps

Available bandwidth: PCIe 2.0 x4 at 500 MB/s per lane = 2,000 MB/s = 16 Gbps

Since one card is using the PCIe 3.0 lanes and the other the PCIe 2.0 lanes, there shouldn't be a conflict for those resources, as I understand it.
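One wrinkle in the numbers above: USB 3.0's 5 Gbps and PCIe 2.0's 5 GT/s are both raw line rates using 8b/10b encoding (10 bits on the wire per 8 bits of data), while the 500 MB/s per-lane figure is already the post-encoding payload rate. Comparing like with like, the four ports' usable payload exactly matches the x4 PCIe 2.0 budget, so the link is saturated before any protocol overhead:

    # Compare payload rates on both sides of the bridge (8b/10b = 80% of nominal).
    usb_ports = 4
    usb_payload_gbps = 5.0 * 8 / 10           # ~4 Gbps of data per USB 3.0 port
    demand_gbps = usb_ports * usb_payload_gbps

    pcie_lanes = 4
    pcie_payload_gbps = 5.0 * 8 / 10          # PCIe 2.0: 5 GT/s -> 500 MB/s/lane
    supply_gbps = pcie_lanes * pcie_payload_gbps

    print(f"demand {demand_gbps:.0f} Gbps vs supply {supply_gbps:.0f} Gbps")
    # demand 16 Gbps vs supply 16 Gbps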

NOTE:

I know the card was oversold on bandwidth, but the reads and writes should not run into multiple minutes per 1GB file.

EDIT 2:

After David Schwartz's suggestion I monitored core usage with the system monitor and htop. The system shows 100% or near-100% usage on four cores for the first dozen file IOs. The system then kind of freezes for a few seconds, and this is when the file IO degradation occurs. After this, core utilization rarely reaches 100% again, and when it does it is only very brief.
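For anyone trying to reproduce this, a minimal way to log the same per-core figures htop shows, so the freeze can be lined up against per-file timings (assumes the third-party psutil package is installed):

    import time
    import psutil  # third-party: pip install psutil

    # Print per-core utilisation once a second while the copies run; the
    # system-wide freeze shows up as every core dropping to near-idle at once.
    while True:
        per_core = psutil.cpu_percent(interval=1, percpu=True)
        print(time.strftime('%H:%M:%S'), ' '.join(f'{c:5.1f}' for c in per_core))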

EDIT 3: Most likely the final edit.

After a bit of research and experimentation I think we can say this is not going to work with the card at hand, and I'd wager the StarTech card mentioned in the comments won't work either. I believe we can come to that conclusion based on several things. In short, one SSD works great on the card. Two work OK with a bit of slowdown (overhead, I guess). However, three or more start misbehaving badly. I imagine this is because we're trying to push 20 Gbps over 16 Gbps worth of PCIe lanes, and instead of getting the 16 Gbps theoretical maximum, the controllers on both sides of the transmission may be tripping over each other and generally causing things to back up to the point of slowing data transfer to a crawl. This is just a theory, but it was good enough to get the customer to drop the USB requirement and allow us to try SATA and other methods. SATA is working out much, MUCH better, so I think we have a winner. Thanks to David Schwartz and Ali Chen for their help and suggestions.

EDIT 4: The actual final edit

So, I stumbled across the answer to my question yesterday, in multiple parts, while looking at SATA solutions. The actual problem was twofold and only became fully apparent after the first of the problems was discovered.

So, the first problem was memory management. When we tested the bit of software that reads the large files for writing, it looked like each file was being read once and then written multiple times. This was not the case: we actually had multiple read requests for multiple 1GB files happening constantly. Why this worked in tests but not in practice I'm not sure, but we didn't have time to do a post-mortem, so that gets left to history.
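For illustration, this is the read-once, write-many pattern the software should have used: each 1GB file is streamed off disk exactly once, every chunk is fanned out to all destination disks, and the MD5 comes for free from that single pass. Paths here are hypothetical:

    import hashlib
    from pathlib import Path

    CHUNK = 16 * 1024 * 1024  # stream in 16 MiB pieces

    def fan_out(src, dst_mounts):
        """Read src exactly once, writing every chunk to all destination disks."""
        outs = [open(Path(m) / Path(src).name, 'wb') for m in dst_mounts]
        h = hashlib.md5()
        try:
            with open(src, 'rb') as fh:
                for chunk in iter(lambda: fh.read(CHUNK), b''):
                    h.update(chunk)            # checksum comes off the single read
                    for out in outs:
                        out.write(chunk)
        finally:
            for out in outs:
                out.close()
        return h.hexdigest()

    # e.g. fan_out('/data/staged/file0001.bin',
    #              ['/mnt/dst1', '/mnt/dst2', '/mnt/dst3', '/mnt/dst4'])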

The second problem is that we are not hardware guys, so we did not know one very important detail about working on a Linux system: NTFS is not native to Linux (this we knew), and apparently it can run nearly an order of magnitude slower there (this we did not know). Had this been a Windows box we'd have had no issue.

Combine these two factors and you get the erratic behavior we experienced. Once we did a complete reformat of all the disks to EXT4, we stopped seeing the unpredictable read/write times and everything worked as expected. We could do the simultaneous writes and read/MD5 calculations within allowable parameters.
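For anyone who hits the same wall: the fix was simply mkfs.ext4 on each data disk. A quick sanity check that nothing is still mounted through the FUSE-based ntfs-3g driver (which reports its filesystem type as fuseblk) is to read /proc/mounts; the /mnt/ prefix below is hypothetical:

    # Verify every data disk is on ext4 rather than going through ntfs-3g,
    # which appears in /proc/mounts as 'fuseblk' instead of 'ntfs'.
    with open('/proc/mounts') as f:
        for line in f:
            device, mountpoint, fstype = line.split()[:3]
            if mountpoint.startswith('/mnt/'):      # hypothetical mount prefix
                status = 'OK' if fstype == 'ext4' else 'REFORMAT ME'
                print(f'{mountpoint}: {fstype} ({status})')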

soulsabr

Posted 2017-03-21T20:12:50.690

Did you do the math for the required bandwidth per USB port, per USB controller, per PCIe lane, and so on? If so, you should post those numbers to save every person trying to help you from having to do that math. Also, are any of your CPU cores maxed during the copy? – David Schwartz – 2017-03-21T20:36:16.777

@David Schwartz Yes, I have done the math and I'll get that posted ASAP. At least one of the cores looks saturated but I'll do another run and verify how many it actually is. – soulsabr – 2017-03-21T20:59:45.813

The sudden decrease in performance by orders of magnitude suggests that something pathological is going on somewhere. It might be worth trying a different USB controller since that's the most likely suspect, IMO. – David Schwartz – 2017-03-21T21:21:32.497

@David Schwartz I'm getting worried that you're correct. Thanks to monitoring using the system monitor and htop I noticed a kind of system wide freeze which is when severe degradation starts. After this it never really recovers. Sadly, the card we have is the only one we could find with four ports and is RHEL 7.2 compatible. Thanks for the help. – soulsabr – 2017-03-21T21:44:50.677

The StarTech PEXUSB3S44V has four NEC/Renesas uPD720202 controllers bridged to an x4 PCIe 2.0 bus. I would be quite surprised if it didn't work with RHEL 7.2, since it works on many Linuxes. It's just a normal PCIe-to-PCIe bridge and XHCI controller and no special support is needed. – David Schwartz – 2017-03-21T21:52:52.603

@DavidSchwartz SURPRISE! I actually have two of those sitting in their boxes right now as they were our first choice. I can't remember the specifics, but there is something wrong in an older kernel driver that was fixed somewhere after the 3.10.0-514 kernel RHEL is using. I remember reading that CentOS is OK, as well as Ubuntu et al. So, we know that they'll eventually work, but we're not sure when that eventually will be. – soulsabr – 2017-03-21T22:27:29.443

I am wondering about the requirement to write/read data "simultaneously". Each USB 3.0 channel is driven by an individual ASMedia host controller, each with an independent DMA engine. I am not sure how to make things work "simultaneously", unless it simply means 8 individual SSD devices at one station, to test their quality or something for mass production. The 100% core utilization does not look right either. Overall, the issue looks like the RHEL software can't manage 8 individual USB 3.0 host controllers that are all active at the same time. – Ale..chenski – 2017-03-22T00:56:01.733

Also, the PLX bridge is a PCIe 2.0 device. Does your single 4xUSB card perform equally well in a PCIe 3.0 slot and a 2.0 slot? Did you try both cards in PCIe 3.0 slots? – Ale..chenski – 2017-03-22T01:01:16.440

@AliChen That is a good suggestion. I'm doing some software right now to take a break from the hardware reqs but I think I'll give this a go after I finish up. I'll post back the results. – soulsabr – 2017-03-22T15:39:11.647

No answers