I have a CPU-bound data-analysis application that produces ~35 MB/s of data per thread of execution.
I'm trying to work out how many threads I can run concurrently while still being able to write their output to disk without each thread endlessly waiting on I/O.
I found this answer on how to measure write performance, and ran the test it describes on my two local disks, an SSD and a 7200 RPM HD.
The results are:
SSD:
$ time sh -c "dd if=/dev/zero of=testfile bs=1000k count=1k && sync"
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.668421 s, 1.6 GB/s
real 0m3.549s
user 0m0.000s
sys 0m0.456s
With the sync included, that works out to 1000 MiB / 3.549 s ≈ 281 MiB/s (≈ 295 MB/s).
HD:
$ time sh -c "dd if=/dev/zero of=testfile bs=1000k count=1k && sync"
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 8.79985 s, 119 MB/s
real 0m10.122s
user 0m0.004s
sys 0m0.549s
With the sync included, that works out to 1000 MiB / 10.122 s ≈ 98 MiB/s (≈ 104 MB/s).
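As a side note, I understand GNU dd can fold the flush into its own timing with conv=fdatasync, so the rate it prints already accounts for the final flush and no manual division is needed; something like:
$ dd if=/dev/zero of=testfile bs=1000k count=1k conv=fdatasync
(I haven't re-run the benchmark this way yet; the flag is from the dd man page.)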
Am I able to draw the following conclusions?
Writing results to an SSD:
- Disk capable of writing at 281 MiB/s
- Each thread produces 35 MB/s
Therefore, I could run 8 threads concurrently (281 / 35 ≈ 8).
Writing results to an HD:
- Disk capable of writing at 98 MiB/s
- Each thread produces 35 MB/s
Therefore, I could run 2 threads concurrently (98 / 35 ≈ 2.8; see the quick check below).
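To make that arithmetic explicit (bash integer division, which floors the result; the inputs are just my measurements from above):
$ echo $(( 281 / 35 ))   # SSD: 8
$ echo $(( 98 / 35 ))    # HD: 2 (2.8 before flooring)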
Are there other considerations I need to take into account, such as interconnect speeds? (My drives are connected via SATA 3.)
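For context on that: SATA 3 signals at 6 Gbit/s, which after 8b/10b encoding comes to roughly 600 MB/s of payload per port, so neither drive should be limited by its own link. I believe the negotiated link speed can be checked with smartmontools (the device name below is a placeholder):
$ sudo smartctl -i /dev/sda | grep -i 'sata version'
which should print something like "SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)".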
If the above analysis is correct, can I add additional drives to allow running more threads? If so, would they share an interconnect and therefore hit an upper limit imposed by it, and how would I calculate that limit?
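In case it helps frame that last question: I could measure contention directly by writing to both drives at once and comparing against the standalone results (the mount points below are placeholders for one filesystem on each drive):
$ time sh -c "dd if=/dev/zero of=/mnt/ssd/testfile bs=1000k count=1k conv=fdatasync &
    dd if=/dev/zero of=/mnt/hd/testfile bs=1000k count=1k conv=fdatasync &
    wait"
If the combined throughput comes out noticeably lower than the sum of the two individual numbers, presumably something shared (controller, bus, or CPU) is the bottleneck.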