The rule of thumb I use for disk IO is:
- 75 IOPS per spindle for SATA
- 150 IOPS per spindle for FC/SAS
- 1500 IOPS per spindle for SSD
As well as IOPS per array, also consider IOPS per terabyte. It's not uncommon to end up with a very bad IOPS-per-TB ratio when doing SATA + RAID6. That might not sound like a big deal, but you will often end up with someone spotting 'free space' on an array and wanting to use it. It's common for people to buy gigabytes and ignore IOPS, when really the opposite is what matters in most enterprise systems.
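To illustrate how lopsided that ratio gets, here's a quick back-of-envelope comparison using the rule-of-thumb figures above; the drive sizes are hypothetical examples, not recommendations:

```python
# Rough IOPS-per-TB comparison using the rule-of-thumb figures above.
# Drive sizes are made-up examples for illustration.
drives = {
    "SATA 4TB":  {"iops": 75,   "tb": 4.0},
    "SAS 1.2TB": {"iops": 150,  "tb": 1.2},
    "SSD 1TB":   {"iops": 1500, "tb": 1.0},
}

for name, d in drives.items():
    print(f"{name}: {d['iops'] / d['tb']:.0f} IOPS per TB")
# SATA 4TB:  19 IOPS per TB
# SAS 1.2TB: 125 IOPS per TB
# SSD 1TB:   1500 IOPS per TB
```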
Then add in the cost of the write penalty for RAID (there's a quick calculation sketch after this list):
- 2 for RAID1, RAID1+0
- 4 for RAID5 (or RAID4)
- 6 for RAID6
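To turn those figures into a rough front-end number, here's a minimal sketch using the usual "reads cost 1, writes cost the penalty" formula; the spindle count and workload mix below are invented for illustration:

```python
def effective_iops(spindles, iops_per_spindle, write_penalty, write_fraction):
    """Rough front-end IOPS an array can sustain for a given read/write mix.

    Back-end capability is spindles * iops_per_spindle; each front-end write
    costs 'write_penalty' back-end IOs, each front-end read costs 1.
    """
    raw = spindles * iops_per_spindle
    cost_per_frontend_io = (1 - write_fraction) + write_fraction * write_penalty
    return raw / cost_per_frontend_io

# Example: 10 SAS spindles in RAID6 with a 30% write mix (made-up workload).
print(effective_iops(spindles=10, iops_per_spindle=150,
                     write_penalty=6, write_fraction=0.3))
# ~600 front-end IOPS, versus 1500 raw back-end IOPS.
```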
The write penalty can be partially mitigated by nice big write caches, in the right circumstances. If you've got lots of sequential write IO (like DB logs) you can reduce those write penalties on RAID5 and RAID6 quite significantly. If you can write a full stripe (i.e. one block per spindle) you don't have to read anything to compute parity.
Assume an 8+2 RAID6 set. In normal operation, for a single write IO you need to:
- Read the block being updated.
- Read the first parity block.
- Read the second parity block.
- Recompute parity.
- Write all three back. (3 reads + 3 writes = 6 IOs.)
With a cached full-stripe write - i.e. 8 consecutive 'chunks' adding up to a whole RAID stripe - you can calculate parity across the lot without needing any reads. So you only need 10 writes: one to each data spindle, and two for parity.
This makes your write penalty 1.25 (10 physical writes for 8 blocks of data).
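A quick way to see where that number comes from, as a sketch assuming an N-data + M-parity layout and fully cached full-stripe writes:

```python
def full_stripe_write_penalty(data_spindles, parity_spindles):
    # A full-stripe write touches every spindle exactly once, but only the
    # data spindles carry host data, so penalty = total writes / data writes.
    return (data_spindles + parity_spindles) / data_spindles

print(full_stripe_write_penalty(8, 2))  # 1.25 for an 8+2 RAID6 set
print(full_stripe_write_penalty(4, 1))  # 1.25 for a 4+1 RAID5 set
```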
You also need to bear in mind that write IO is easy to cache - you don't need to get it on disk immediately. It operates under a soft time constraint - as long as on average your incoming writes don't exceed spindle speed, it'll all be able to run at 'cache speed'.
Read IO, on the other hand, suffers a hard time constraint - you cannot complete a read until the data has been fetched. Read caching and cache-loading algorithms become important at that point - sequential read patterns (e.g. as you'd get from a backup) can be predicted and prefetched, but random read patterns can't.
For databases, I'd generally suggest you assume that:
- Most of your 'database' IO is random read (i.e. bad for caching). If you can afford the overhead, RAID1+0 is good, because mirrored disks give you two sources for each read.
- Most of your 'log' IO is sequential write (i.e. good for caching, and contrary to what many DBAs will suggest, you probably want RAID50 rather than RAID10).
The ratio of the two is hard to say - it depends what the DB does.
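If you want to put numbers on it anyway, here's a minimal sizing sketch that just inverts the earlier formula and treats the data and log volumes separately; all the workload figures are invented for illustration:

```python
import math

def spindles_needed(target_iops, iops_per_spindle, write_penalty, write_fraction):
    # Invert the earlier formula: back-end IOs needed per front-end IO,
    # spread across spindles of a given speed.
    backend = target_iops * ((1 - write_fraction) + write_fraction * write_penalty)
    return math.ceil(backend / iops_per_spindle)

# Hypothetical DB on 150 IOPS SAS spindles (workload numbers are made up):
# - data volume: 2000 IOPS, ~20% writes, RAID1+0 (penalty 2)
# - log volume:   500 IOPS, ~95% writes, RAID5/50 assuming full-stripe
#   writes bring the penalty down to ~1.25 as above
print(spindles_needed(2000, 150, write_penalty=2,    write_fraction=0.20))  # ~16 spindles
print(spindles_needed(500,  150, write_penalty=1.25, write_fraction=0.95))  # ~5 spindles
```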
Because random read IO is the worst case for caching, it's where SSD really comes into its own - a lot of manufacturers don't bother caching SSD because it's about the same speed anyway. So especially for things like temp databases and indexes, SSD gives a good return on investment.