How can I achieve maximum sustained sequential disk write performance?

I need data write rates of ~1 GB/sec sustained for ~1 hour. Data is coming in over this PCIe x4 frame grabber. I need to stream its full bandwidth to disk.

I don't have experience with RAID, but as best I can tell, RAID 0 with as many high-RPM disks as possible is the answer. I also gather that discrete RAID controllers are much faster and more reliable than anything built into motherboards.

For the sake of a specific starting point for concrete answers, my initial guess is that the following hardware will be a good system for this task:

  • RAID controller: LSI MegaRAID 9280-16i4e
  • HDDs: 11x Western Digital Caviar Black 2 TB SATA III 7200 RPM 64 MB cache
  • Cables: 3ware CBL-SFF8087OCF-10M SFF-8087 Serial ATA Breakout Cable
  • Motherboard: Gigabyte GA-Z77X-UD3H LGA 1155 Intel Z77
  • Power supply: Silverstone Strider Gold Evolution SST-ST1200-G 1200W v2.3 80 PLUS GOLD
  • Case: Rosewill RSV-L4411 4U case 12 hot swap bays

My question is: how do I achieve maximum sustained sequential disk write performance?

A good answer will address the following:

  • What features/specs do I need to look for in the RAID controller and HDDs for the fastest sequential writes?
  • Will write speed be independent of the CPU? (i.e., how do I ensure DMA is used?) Is there a way for the data path to bypass RAM entirely? Would quad- vs dual-channel RAM matter?
  • Is there any bottleneck to look out for on the motherboard, e.g., the north/south bridges? If so, how would I detect/avoid such a problem?
  • In sustained sequential writing, are any caches (on the controller, HDDs, CPU, etc.) relevant?
  • How do I ensure the PSU is adequate for all these drives? I understand I might have to worry about amperage draw on the rails? Will inadequacies here show up as performance problems/random crashes, or will it just clearly work/fail?
  • Same question as above, regarding cooling.
  • Would there be an advantage to using an external drive enclosure? Does connecting to one impose a bottleneck?
  • What BIOS settings are important for this application? AHCI, etc.?
  • What filesystem is best? The camera/frame-grabber drivers are all Windows, so I'm stuck on Win7. I assume 64-bit vs 32-bit will affect bandwidth?
  • What tuning should I expect to have to do?

A previous version of this question was removed for being "too broad":

"There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs."

But my question is very specific, of general interest, and I have provided details that allow an efficient answer in a single paragraph, not "a whole book." All my detailed questions merely ensure that answers are comprehensive regarding the potential bottlenecks anyone should be concerned with for this single problem: fast sustained sequential writes. It wouldn't be useful to anyone to break the question up into 32 separate questions, as user 50-3 suggested. Here is an example response that shows the form of what I'm expecting (I have no idea whether the actual information is correct; it is my best guess):

  • RAID 0 with high-RPM disks is indeed the way to achieve the fastest sustained sequential writes (assuming you are using your frame grabber's "stream" mode). SSDs aren't good for this because their write speed degrades with use due to the processing required for wear leveling (preventing any one location from being used more than others).
  • To sustain 1GB/sec indefinitely, you need >3 7200RPM 6Gb/sec SATA drives (6Gb/sec * 1/8 GB/Gb = .75 GB/sec/drive with no headroom). More drives will improve your bandwidth headroom linearly, but saturate after the data width of your bus (32 or 64). (See the back-of-envelope sketch after this list.)
  • SATA is the most cost-effective HDD technology; SAS doesn't have appreciable advantages for fast sequential writes. SAS is better for seek times to random locations and for reliability. The higher RPM of SAS drives would increase sequential write speed, but is counteracted by their lower density/capacity.
  • Any decent drivers for frame grabbers/RAID cards use DMA (the ones you mention do), so the CPU won't matter. The data path will always include system memory. Writing to disk will be much slower than your RAM, so you don't need anything exotic (any DDR3 is fine). The amount of RAM (and the size of the caches on the controller, HDDs, and CPU) does not matter, because buffers quickly fill during sustained writes.
  • The north/south bridges on any PCIe 2.0 motherboard won't be bottlenecks. All you need is a discrete RAID controller >= PCIe 2.0 that has enough SATA connections for your drives. External connections to an enclosure are a bottleneck only if expanders cause drives to share bandwidth. You want a card with more PCIe lanes than the 4 on the frame grabber so the PCIe bus won't be a bottleneck. The 9280 will be fine, but is a lot of overkill for your purpose; a 9240-8i would be less than half the cost and adequate. LSI controllers are among the most expensive, but tend to be faster, more reliable, and less hassle during error recovery than cheaper brands like HighPoint/Areca.
  • You need a PSU with enough wattage for all your drives and the controller (the 9280 uses 15W and each WD uses 10W). Each drive has a peak draw of ~1A, and you need to limit the number on each circuit ("rail") of the PSU. The SST-ST1200-G has a single rail rated at 100A, so you won't have a problem. Overdraws would show up as random hard crashes (possibly damaging the drives and other components); the same goes for overheating.
  • The cooling built into a case made with 12 hot-swap bays should be adequate for near-constant loads of non-sequential reads, which produce more heat than your sequential writes. To be sure you don't need additional cooling, monitor drive temperatures (e.g., with HDDTemp) after many minutes of sustained writes.
  • AHCI is the only BIOS setting relevant to fast sequential writes (turn on SMART too). Set both of these before installing Windows.
  • Windows' NTFS file system will be fine (there's no alternative anyway).
  • You will have better sequential write performance with Win64 vs Win32 because the DMA bandwidth to the RAID controller will be twice as big.
  • You shouldn't have to do any tuning; the default block size, etc., set up by your RAID controller should be adequate. Bigger blocks would be faster, but more susceptible to corruption and unnecessary.
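A quick back-of-envelope check of the figures guessed above (a minimal sketch; the per-drive sustained rate, stripe efficiency, and power numbers are typical assumptions for 2013-era hardware, not measurements):

```python
# Sanity check of the throughput/power guesses above. All constants are
# assumptions, not measurements; note that a drive's 6 Gb/s link rate and
# its sustained platter write rate are very different numbers.
import math

TARGET = 1.0            # required sustained rate, GB/s
DRIVE_SUSTAINED = 0.10  # realistic 7200 RPM sustained write, GB/s (~100 MB/s)
EFFICIENCY = 0.8        # RAID 0 rarely scales perfectly

drives = math.ceil(TARGET / (DRIVE_SUSTAINED * EFFICIENCY))
print(f"drives needed: ~{drives}")                # 13, not >3

# PCIe 2.0 runs 5 GT/s per lane; 8b/10b encoding leaves ~500 MB/s usable.
print(f"PCIe 2.0 x4 usable: {4 * 0.5:.1f} GB/s")  # 2.0 GB/s

# Steady-state power, using the ~15 W controller / ~10 W per drive guesses.
print(f"controller + {drives} drives: ~{15 + drives * 10} W")
# Spin-up draw on the 12 V rail is several amps per drive, so staggered
# spin-up (a common controller feature) matters as much as total wattage.
```

If the drive count comes out far above the ">3" guessed in the list, that is because the 0.75 GB/s figure is a link rate, not a sustained platter rate; the answer below makes the same point.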

If you still consider this question "too broad," please specify exactly why and suggest how it could be narrowed while still providing a thorough answer for people interested in achieving maximum sustained sequential write performance. This question belongs on Super User more than Server Fault because it is not specific to corporate IT.

user1441998

Posted 2013-11-25T21:49:52.580

Reputation: 31

The specs for that card on PCIe x4 mentions data transfer to the host PC of up to 850 MBytes/s and says "This allows a data transfer of the original image and of additional information and results." Double check how much data you will be putting to disk. – Brian – 2013-11-25T23:41:24.943

It doesn't really matter if you use 3 or 6 Gbit/s SATA. Even a 10000 RPM WD velociraptor won't normally write more than around 110 Mbyte/s. What is the format of the frame-grabber data? If it's raw, why not compress it before it hits the disk? That should lessen the requirements.

– Roland Smith – 2013-11-26T00:03:07.710

The main problem I have with this question is that you're asking too many questions at once, all in a big clump. Either you know what you are doing (in which case the people able to actually answer this are few), or you know nothing and are just shouting terms, IMHO. This will lead to answers that, for your particular problem, will be inherently too long, not too broad. Pretty please bold what you think are the most relevant parts of the question. – Doktoro Reichard – 2013-11-26T00:04:53.450

@Brian - the frame grabber page says "DMA transfer rate of up to 900 MB/s" -- where do you see 850? In any case, I listed it as "~1GB/sec," which is rounded up only slightly to give moderate headroom. Also, the question is about maximizing sequential write speed in general. – user1441998 – 2013-11-26T00:16:10.717

@Roland Smith - this question is about using RAID 0 to write much faster than any single drive can. There must be some difference between 3/6 Gb/s SATA -- how could it be irrelevant here? I am using a commercial camera with associated software that is made to work with this frame grabber (http://www.pco.de/categories/scmos-cameras/pcoedge-42/). They control the format/processing, though I believe it is RAW. At these data rates, I don't believe compression could be fast enough to sustain long-duration streaming, and in general, there may be no redundancies in the data.

– user1441998 – 2013-11-26T00:23:13.233

Max DMA rate isn't the same as sustainable transfer rate - http://www.silicon-software.info/en/products/item/download/123_0e12fc1acc18fb0b9a887763e82031b7.html lists the latter as 850Mbytes/s.

– Brian – 2013-11-26T00:28:33.997

@Doktoro Reichard - Can you be specific about how you think the questions should be split apart? They are all relevant to the main problem: maximizing sequential write speed. I gave a specific example of the form of a response that is completely comprehensive, satisfies my particular problem, and is of general interest, without being too long. I bolded the main question, which is also the title. The supporting questions I ask all concern potential bottlenecks that would reduce sustained sequential write performance, and are therefore required of a comprehensive answer. – user1441998 – 2013-11-26T00:45:23.277

@Doktoro Reichard - My question makes clear that I know an intermediate amount: I suspect (but am unsure) that RAID 0 is probably the answer, that SATA/NTFS are probably adequate, that DMA and AHCI are probably important, that caches probably only matter for nonsequential ops, and that my hardware guesses may be incompatible, suboptimal, or need significant tuning. I am specific about what I don't know: does win32 vs 64 matter for this, can RAM/CPU/northbridge be cut out of the datapath, how to detect inadequate power/cooling, etc. What specifically strikes you as "shouting terms?" – user1441998 – 2013-11-26T00:49:34.670

@user1441998 Regarding the SATA speed: it is not the weak link in the chain. See Hennes's answer for more detail. With respect to compression, RAW video is extremely redundant. Compared to raw frames, properly compressed video can easily be 1/100 of the size. – Roland Smith – 2013-11-26T20:26:59.900

@Roland Smith - then what is the purpose of 6 vs 3 Gb/s? Does it only make a difference for operations that have preternaturally good cache locality or something? Regarding compression, it completely depends on the content. In scientific applications (like mine), the pixels may be far more independent than in natural scenes. – user1441998 – 2013-11-26T21:20:34.243

@user1441998 SATA III is mainly for SSDs which can saturate a SATA II bus. HDDs on the other hand can't. From the SDK docs it seems that the driver indeed dumps raw data in memory. Even if you use lossless compression (e.g. PNG) on individual 24-bit color images you should be able to reduce the picture size by a factor of around 2.5 to 10, depending on the image. That certainly seems worthwhile. – Roland Smith – 2013-11-26T21:30:08.177

FYI, the question was removed because you posted it to multiple sites, not because it was put on hold. Cross posting is not allowed, so please delete the version on Server Fault (especially since you admit it doesn't belong there). Thanks. – slhck – 2013-11-26T21:37:14.700

@Roland Smith - perhaps SATA III on HDDs comes into play when using port multipliers? (see comment below)

– user1441998 – 2013-11-26T22:55:20.230

@Roland Smith - as I said in this comment above, I am using a commercial camera and its software -- the format is up to them. But you have not addressed either point: A) every pixel may be independent, and B) any compression may be too slow to keep up with these data rates. Please provide documentation backing up your assertion that RAW contains redundancies when all pixels are independent and that PNG can be encoded at these rates.

– user1441998 – 2013-11-26T22:57:22.520
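One way to settle the compression question empirically is to measure codec throughput on representative frames. A minimal sketch, assuming zlib's fastest level as a stand-in for whatever lossless codec would actually be used, and random bytes as the worst-case "independent pixels" input:

```python
# Measures single-threaded lossless compression throughput. zlib level 1
# is a stand-in codec and os.urandom a worst-case (incompressible) frame;
# substitute real frames from the camera for a meaningful ratio.

import os
import time
import zlib

frame = os.urandom(4 * 1024 * 1024)      # one synthetic 4 MiB frame
n_frames = 25
t0 = time.perf_counter()
out_bytes = 0
for _ in range(n_frames):
    out_bytes += len(zlib.compress(frame, 1))
elapsed = time.perf_counter() - t0

in_mb = n_frames * len(frame) / 1e6
print(f"{in_mb / elapsed:.0f} MB/s, ratio {out_bytes / (n_frames * len(frame)):.2f}")
# Tens of MB/s per core is typical for zlib: far short of 1 GB/s, so
# sustained compression would need a much faster codec and/or many cores,
# and on incompressible data the ratio stays ~1.0 anyway.
```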

Answers

You made a very long list, which I am not going to answer item by item. However, I want to make these things very clear:

1) PCI cannot sustain those speeds. PCI Express can; it is a totally different technology, with point-to-point links (called lanes) instead of a shared bus. The card you linked to is "PCIe x4". The extra "e" is very much relevant.

2) Striping (RAID 0, RAID 10, etc.) is quite possible, either with a dozen high-performance disks or with normal disks. A bog-standard, office-corner-shop 7200 RPM SATA drive will do about 100 MB/s, so you would need at least a dozen of these (since things never scale quite perfectly).

3) HW RAID, software RAID, and fake RAID (software RAID with BIOS support, e.g. Intel RST) will all work.

Software RAID is not recommended if you need to do a lot of calculations (e.g. RAID 6) and need high performance, or if you have a slow CPU.

Hardware RAID will vary. A good HW RAID card is great; a bad one might perform quite poorly compared to a good SW RAID solution. Good HW RAID often needs a battery-backed cache or flash to enable the fast modes.

4) SATA II or III (3.0 or 6.0 Gbit/s), SAS 3 Gbit/s, SAS 6 Gbit/s, ... does not matter. An individual spinning disk will not saturate any of these links. Current consumer SATA drives max out around 100 MB/s; high-end enterprise SAS drives can get up to 200 MB/s. Both are well below even a 3.0 Gbit/s link (~300 MB/s).

5) RAID 0 is not very safe. If one disk fails, you lose everything. This might be acceptable if you just need to capture the data and then immediately save it somewhere safe or process it. However, the more disks you use, the more disks can fail.

RAID is usually about redundancy. RAID 0 is not; it is solely about performance.

6) Lastly, for completeness' sake: SSDs are not inherently bad for this. For this much data they will be expensive and possibly not needed, but an SSD does not need to slow down. Just completely wipe the SSD (e.g. delete all partitions, or secure-erase it) before you add it to the recording array. Once it is full it may slow down, but prep it properly and run it for one session and it should be fine.

7)

AHCI is the only BIOS setting relevant to fast sequential writes (turn on SMART too).

You cannot turn SMART on or off; it is always on in the drive. The option in the BIOS just means "read the drive's SMART data during POST, and if there is anything wrong, warn the user," usually with a single line like "SMART: DISK FAILURE IMMINENT. Press F1 to continue!". It has no performance influence.
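SMART data is still useful for the temperature monitoring suggested in the question. A minimal polling sketch, assuming smartmontools' smartctl is installed and the drive appears as /dev/sda (both the device name and the column layout vary by platform and drive):

```python
# Polls a drive's SMART temperature (attribute 194) once a minute during
# a sustained-write soak test. Assumes smartmontools' smartctl is on PATH;
# the device name and column layout are drive/OS-dependent.

import subprocess
import time

def drive_temp_c(device="/dev/sda"):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Temperature_Celsius" in line:
            return int(line.split()[9].split("(")[0])  # raw value column
    return None

if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), drive_temp_c(), "C")
        time.sleep(60)
```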

Set both of these before installing Windows.

For consistent performance: install the OS on its own drive. Keep separate volumes for the OS and for data.

8)

To sustain 1GB/sec indefinitely, you need >3 7200RPM 6Gb/sec SATA drives (6Gb/sec * 1/8 GB/Gb = .75 GB/sec/drive with no headroom).

No.

A SATA drive on a 6 Gbit/s link will be able to transfer roughly 600 MiB/s between the disk and the controller/RAID card (6.0 divided by 8 for bits-to-bytes, but there is also encoding overhead, so dividing by 10 is more realistic).

Secondly, the drive can receive the data quite quickly, but writing it to the platters is slower. A realistic value for a modern 7200 RPM SATA drive is 100 MiB/s sustained write.

That means you need at least 10 such drives, and that is only if everything scales perfectly.

More drives will improve your bandwidth headroom linearly, but saturate after the data width of your bus (32 or 64).

True for PCI. But despite writing PCI, the OP meant PCIe, which is a lot faster: four lanes of PCIe 2.0 carry about 2 GB/s usable (500 MB/s per lane after 8b/10b encoding). That should be enough, with roughly 2x headroom.
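Whatever the theoretical numbers say, the only way to know what the finished array sustains is to measure it. A minimal benchmark sketch (the volume path is an assumption); write far more data than the installed RAM so the OS write cache cannot mask the steady-state disk rate:

```python
# Measures sustained sequential write speed of the capture volume.
# PATH is an assumed mount point; TOTAL should be well above installed
# RAM (and the run long enough) to expose steady-state disk speed.

import os
import time

PATH = "D:/bench.bin"           # assumption: the RAID 0 data volume
BLOCK = 8 * 1024 * 1024         # 8 MiB writes keep the drive queue busy
TOTAL = 64 * 1024**3            # 64 GiB

buf = os.urandom(BLOCK)
t0 = time.perf_counter()
written = 0
with open(PATH, "wb", buffering=0) as f:
    while written < TOTAL:
        written += f.write(buf)
    os.fsync(f.fileno())        # flush the OS cache before stopping the clock
elapsed = time.perf_counter() - t0
print(f"{written / elapsed / 1e6:.0f} MB/s sustained")
```

Run it for the full planned capture size if possible, and watch whether the rate holds throughout rather than only at the start.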

Hennes

Posted 2013-11-25T21:49:52.580

Reputation: 60 739

edited frame grabber mention to specify PCIe x4 in OP

Software RAID is not recommended if you do not to do a lot of calculations (e.g. RAID6)

Did you mean "NOT recommended if you DO a lot of calculations"? If not, please elaborate.

A good HW RAID card is great. A bad one might perform quite poorly compared to a good SW RAID solution.

What, exactly, distinguishes "good" from "bad" HW RAID cards? How can I distinguish them before purchasing? – user1441998 – 2013-11-26T21:40:53.453

Good HW RAID often needs battery backed cache or flash to enable the fast modes.

I cannot find documentation one way or the other for the 9280 mentioned. How do I find out? – user1441998 – 2013-11-26T21:41:58.443

SATA II or III (3.0 or 6.0 GB/sec) ... does not matter. Current consumer SATA drives max out around 100MB/sec.

Is the advantage of 6 Gb/sec only when using expanders/multipliers to daisy-chain more than one drive on a SATA port? If my 750 MB/sec calculation is correct, will there be no performance degradation from using 6 SATA III drives on one port?

Also, you keep saying "GB/sec" (gigabyte), but isn't it "Gb/sec" (gigabit), 8 times smaller? – user1441998 – 2013-11-26T21:51:34.423

Lots of questions; let's try them one by one. 1) SATA II/III speed does not matter for sustained throughput. It does help with port multipliers, and it helps if the data is in the drive's cache. The cache, however, is not big enough to contain all (or most) of your data; typical cache sizes on current drives are 32 MiB or 64 MiB, nowhere near the amount you want to capture. 2) RAID 6/calculations: corrected the sentence. 3) Call the manufacturer. The card might very well have limits (sometimes artificial ones). – Hennes – 2013-11-27T00:22:59.663