QEMU emulator version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.11.1)
Linux 3.10.0-693.el7.x86_64

There are two LUNs attached via virtio-scsi (vCPU count = 1, virtio-scsi controller queues set to 1).
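
(For reference, this kind of setup would look roughly like the following on the QEMU command line. This is only a sketch; the domain is actually managed through libvirt, and the image paths and IDs below are placeholders.)

# one vCPU, one virtio-scsi controller with a single request queue,
# and two LUNs that show up in the guest as sdc and sde
qemu-kvm -smp 1 \
  -device virtio-scsi-pci,id=scsi0,num_queues=1 \
  -drive file=/path/to/lun1.img,if=none,format=raw,cache=none,id=drive0 \
  -device scsi-hd,bus=scsi0.0,drive=drive0 \
  -drive file=/path/to/lun2.img,if=none,format=raw,cache=none,id=drive1 \
  -device scsi-hd,bus=scsi0.0,drive=drive1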

First:

Running dd against the /dev/sde device only, the IOPS was about 6k.

Second:

Running fio (numjobs=3) against /dev/sdc while still running dd against sde

worked just fine; the IOPS of sde stayed at about 6k.

Final:

With only the fio numjobs increased to 4,

the IOPS of sde dropped to 34!?

I have no idea why. Could anyone give me some suggestions?

First Test:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00 6638.14     0.00    51.86    16.00     0.92    0.14    0.00    0.14   0.14  92.16

Second Test:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00   99.01   76.24     0.77     0.60    15.95    96.04  547.80  969.59    0.01   5.65  99.01
sde               0.00     0.00    0.00 6966.34     0.00    54.42    16.00     0.83    0.12    0.00    0.12   0.12  82.87

Final Test:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  100.00  107.00     0.78     0.84    16.00   128.97  621.03 1280.22    4.97   4.83 100.00
sde               0.00     0.00    0.00   31.00     0.00     0.24    16.00     0.99   32.03    0.00   32.03  32.03  99.30

dd:

while true;do dd if=/dev/zero of=./test bs=8K count=102400 oflag=direct;done

fio.ini

[global]
bs=8k
ioengine=libaio
#rw=randrw
time_based
runtime=7200
direct=1
group_reporting
iodepth=32
size=1G

[file1]
filename=/dev/sdc
numjobs=3
rw=randrw

1 Answer

  • Comparing dd to fio is kind of tricky... I would recommend not doing so unless you understand just how fio and dd are sending their I/O and all the caveats involved. For example, did you know that various things in the storage stack (filesystems, the devices themselves) may optimise for when zeros are sent (e.g. through detection or compression), so you get a different speed than when non-zero data is used? (See the fio sketch after this list for a closer dd equivalent.)
  • You are using a small region. Sending too little data is another benchmarking pitfall: results can be distorted because overwrites clobber older in-flight commands, and the earlier commands may end up being optimised away. That is less likely here (because O_DIRECT is being used) but it is another potential source of distortion.
  • I can't tell whether /dev/sdc and /dev/sde are actually backed by files on the same host filesystem, or by partitions of the same disk on the host. If so, they are both competing for I/O that has to be serviced by the same underlying device...
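
If you do want something closer to an apples-to-apples comparison, a fio job that roughly mimics the dd loop above (sequential 8K writes, one outstanding I/O at a time, O_DIRECT, non-zero buffers) might look like this; the filename is a placeholder for wherever dd's ./test file lives, and size matches dd's 102400 x 8K = 800 MiB:

# rough dd equivalent: single job, sequential 8K O_DIRECT writes, one I/O in flight
[dd-like]
filename=/path/to/test
rw=write
bs=8k
ioengine=psync
iodepth=1
direct=1
size=800m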

There is some maximum number of I/Os that can be kept in flight due to characteristics of your disk and availability of CPU (this applies to the host and the guest). Trying to do more than that means you are just generating contention over the I/O queue and/or CPU(s). fio is fairly efficient at sending I/O and your job file shows it is allowed to "queue up" more I/O in one go than dd can (did I mention it's tricky to compare dd to fio?). You didn't show the full fio output so we're missing some useful extra data... How much CPU was left over (guest and host)? How did iostat on the host look? If there's contention then I'd expect even one fio to beat dd (with the settings you used) let alone three...
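
A rough way to collect that missing data while the tests run would be something along these lines (on both the host and the guest; iostat and mpstat come from the sysstat package):

# per-device I/O statistics and per-CPU utilisation, sampled every second
iostat -x 1 > iostat.log &
mpstat -P ALL 1 > mpstat.log &

# re-run the job and keep fio's complete output rather than just a summary
fio fio.ini --output=fio-full.log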

Note: if for some reason your disk queue is bound to a particular CPU, then you need CPU time on that particular processor to be available; and if both your disks depend on the same CPU, they will be contending for it...
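
If you suspect that, one thing worth checking inside the guest is which CPU the virtio-scsi interrupts are being delivered to, for example:

# interrupt counts per CPU for the virtio/scsi request queues
grep -i -e virtio -e scsi /proc/interrupts

# which CPUs a given IRQ may fire on (<irq> is a number taken from the output above)
cat /proc/irq/<irq>/smp_affinity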

Anon