Ubuntu - inconsistent performance with software raid-1


I'm installing a new system using software RAID 1 with two physical disks. While running a few tests (after waiting for the initial sync to complete) I found the hard drive speed was highly inconsistent.

My test was very simple: copying about a gig of jpegs using cp -r and then deleting them and checking how long it took.

for i in {1..5} ; do
  echo ".. start run $i"
  # time the recursive copy of the image directory
  time cp -r public public2

  echo "... deleting duplicate"
  # time the removal of the copy
  time rm -rf public2

  sleep 1
done

Occasionally during the test the process seems to hang briefly and doesn't respond to ^C, which (as the results below show) causes delays of tens of seconds. During these stalls top reports "Waiting for I/O" at anywhere from 20% to 70%, and the console becomes sluggish, even unresponsive.

These were the results with about 1 gig of jpg images:

        copy       delete
run 1   1.336s     35.929s
run 2   2.300s     50.737s
run 3   2.358s     26.562s
run 4   0.971s     23.717s
run 5   17.485s    27.074s

The speeds clearly vary wildly. I ran this set of 5 tests a bunch of times and the delays occur every time, although not necessarily in the same runs. In the results shown, delays occurred most often during delete, but that also varies.

Another attempt, some time later with a smaller dataset (~600MB):

        copy       delete
run 1   11.614s    36.403s
run 2   0.630s     0.208s
run 3   0.652s     14.891s
run 4   0.676s     0.192s
run 5   0.640s     0.213s

With a smaller set the delays occur much less frequently, often passing all 5 runs without any delay.

One more attempt with a larger dataset, about 1.5GB:

        copy       delete
run 1   26.687s    22.336s
run 2   38.336s    22.466s
run 3   44.711s    20.473s
run 4   41.269s    22.721s
run 5   41.592s    26.499s

Here the delay occurs almost every time.

My thoughts were going to a hardware fault but I booted to a rescue prompt, manually mounted one of the drives and did the same test. This time the results were completely consistent and fast.

Any thoughts would be appreciated because I'm at a loss.


Posted 2012-05-07T03:57:26.370




Do you have 2 GB of RAM? 600 MB in 0.6 s is 1 GB/s, which would be an implausibly fast disk, so those runs are almost certainly completing in memory. The system uses write-behind to return control to the user process more quickly, but when the system is out of buffers, the user process has to wait for the data to be written.

Also, deletes are often an intensive operation for the kernel, one that does not give user processes much opportunity to run.
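To make the arithmetic in this answer concrete, here is a minimal sketch that computes the apparent throughput of a run and reads how much written data is still sitting in memory. The helper names `throughput_mb_s` and `dirty_kb` are made up for illustration; the only system interface assumed is the `Dirty:` line in Linux's /proc/meminfo, which reports memory waiting to be written back to disk.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of any standard tool.

# throughput_mb_s SIZE_MB SECONDS
# Print the apparent throughput in MB/s, rounded to a whole number.
throughput_mb_s() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.0f\n", mb / s }'
}

# dirty_kb [MEMINFO_FILE]
# Print the kB of dirty (not yet written back) memory.
# Reads /proc/meminfo unless another file is given.
dirty_kb() {
  awk '$1 == "Dirty:" { print $2 }' "${1:-/proc/meminfo}"
}
```

For example, `throughput_mb_s 600 0.6` prints 1000. Calling `dirty_kb` right after a "fast" copy should show a large value if the data has only reached the write-behind buffers and not yet the disks.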


Posted 2012-05-07T03:57:26.370


Also, the time will depend massively on how "far apart" the two files are on the disk. When you're copying from and to the same drive(s), the heads have to move back and forth from one file to the other. It's sheer luck how far apart on the drive they are. (Also, you can partially cancel out the buffer flush factor by doing a sync before the test and doing a sync afterwards and including the time the sync after the test takes in the test's run time.) – David Schwartz – 2012-05-07T05:17:46.903
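The sync-before, sync-after suggestion in this comment could be sketched roughly as follows. `timed_with_sync` is a hypothetical helper name, and the sketch assumes GNU `date` (for the `%N` nanoseconds field), which is what Ubuntu ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time a command together with the sync that follows it.
# Assumes GNU date, whose %N format prints nanoseconds.
timed_with_sync() {
  local start end status
  sync                        # flush earlier writes so they are not billed to this run
  start=$(date +%s.%N)
  "$@"                        # the command under test, e.g. cp -r public public2
  status=$?
  sync                        # wait until this command's data has been written out
  end=$(date +%s.%N)
  # Print elapsed wall-clock seconds, including the trailing sync.
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
  return "$status"
}
```

Usage would look like `timed_with_sync cp -r public public2`, which prints a single number of seconds. Because the second sync is inside the timed region, a copy that only reached the buffers no longer looks artificially fast.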

I tried dropping the disk cache between each run (sync ; echo 3 > /proc/sys/vm/drop_caches) and that made the delete operations almost completely consistent (20s@1.6G). The copy operation still fluctuates (~30-60s) but that could be geometry. Thanks for the explanations, both! – PeterD – 2012-05-07T13:36:58.537
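For anyone repeating this, the cache-dropping step from the comment above can be wrapped into the test loop along these lines. This is only a sketch: `drop_caches`, `cold_runs` and the `DROP_CACHES_FILE` override are names invented here (the override exists so the function can be dry-run against a scratch file), and writing to the real /proc/sys/vm/drop_caches needs root.

```shell
#!/usr/bin/env bash
# Hypothetical override so the helper can be tried without root.
DROP_CACHES_FILE=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}

# Flush dirty pages, then ask the kernel to discard its clean caches.
# Writing 3 frees the page cache plus dentries and inodes.
drop_caches() {
  sync
  echo 3 > "$DROP_CACHES_FILE"
}

# cold_runs RUNS COMMAND [ARGS...]
# Drop the caches before each of RUNS executions of COMMAND.
cold_runs() {
  local runs=$1 i=1
  shift
  while [ "$i" -le "$runs" ]; do
    drop_caches || return 1   # stop if the caches cannot be dropped (e.g. not root)
    echo ".. start run $i"
    "$@" || return 1
    i=$((i + 1))
  done
}
```

Run as root, something like `cold_runs 5 bash -c 'time cp -r public public2; time rm -rf public2'` starts every run from a cold cache, so the timings reflect the disks rather than whatever happened to be buffered from the previous run.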