
I understand queue depth to be the number of outstanding I/O requests that the storage controller can handle (https://www.tomshardware.com/reviews/ssd-gaming-performance,2991-3.html), i.e., it is a limit on the storage controller, which handles the I/O requests and sends the (read/write) commands to the disk, and which (not strictly?) drops requests if there are more than it can handle (presumably to be resubmitted by the clients).

And the reason for having many outstanding I/O requests could be multiple client connections requesting I/O, or multiple processes on a single host requesting I/O (which is what I thought, but it seems the OS's I/O scheduler merges the I/O requests - which originate from the buffer cache during periodic or on-demand syncs - and sends only a fixed number of outstanding requests, so that it won't overload the storage devices?)

Now, coming to the definition of iodepth in fio man page:

Number of I/O units to keep in flight against the file. Note that increasing iodepth beyond 1 will not affect synchronous ioengines (except for small degrees when verify_async is in use).

This aligns with my understanding of queue depth. If the I/O is synchronous (blocking I/O), we can have only one request in flight at a time.

Even async engines may impose OS restrictions causing the desired depth not to be achieved. This may happen on Linux when using libaio and not setting `direct=1', since buffered I/O is not async on that OS.

I'm confused by this whole statement.

Keep an eye on the I/O depth distribution in the fio output to verify that the achieved depth is as expected. Default: 1.

I have run multiple tests for each iodepth and device type, with 22 parallel jobs (since the CPU count is 24) and with sequential read and sequential write workloads. The iodepths are 1, 16, 256, 1024 and 32768 (I know 32 or 64 should be the maximum limit; I just wanted to try anyway).

And the results are almost the same for all depths and all disks (RAID 6 SSD, NVMe and NFS), except for sequential read on the NVMe disk with a depth of 32768:

IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%

For NVMe with a depth of 32768:

complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%

I used the libaio engine in fio (because I am not sure which I/O engine I should use for asynchronous I/O testing, and libaio seemed to be the right one - but that is a different question altogether).
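
(As an aside, and if I understand fio's command-line options correctly, it can list the I/O engines it was built with, which might help with picking an asynchronous one - just a sketch, not something from my test runs:)

# List the available ioengines; the exact set depends on how fio was built.
fio --enghelp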

So, what's going on? Why do submit and complete show 1-4 (except for the one NVMe run where it's >=64)?

[global]
lockfile=none
kb_base=1024
fallocate=posix
blocksize=64k
openfiles=100
ioengine=libaio
buffered=1
invalidate=1
loops=5
randrepeat=1
size=512M
numjobs=22

[sr-iodepth-1]
description="Sequential Write,Parallel jobs-22,IO depth-1,libaio"
readwrite=write
size=5G
iodepth=1

[sr-iodepth-16]
description="Sequential Write,Parallel jobs-22,IO depth-16,libaio"
readwrite=write
size=5G
iodepth=16

[sr-iodepth-256]
description="Sequential Write,Parallel jobs-22,IO depth-256,libaio"
readwrite=write
size=5G
iodepth=256

[sr-iodepth-1024]
description="Sequential Write,Parallel jobs-22,IO depth-1024,libaio"
readwrite=write
size=5G
iodepth=1024

[sr-iodepth-32768]
description="Sequential Write,Parallel jobs-22,IO depth-32768,libaio"
readwrite=write
size=5G
iodepth=32768


[sw-iodepth-1]
description="Sequential Read,Parallel jobs-22,IO depth-1,libaio"
readwrite=read
size=512M
iodepth=1

[sw-iodepth-16]
description="Sequential Read,Parallel jobs-22,IO depth-16,libaio"
readwrite=read
size=512M
iodepth=16

[sw-iodepth-256]
description="Sequential Read,Parallel jobs-22,IO depth-256,libaio"
readwrite=read
size=512M
iodepth=256

[sw-iodepth-1024]
description="Sequential Read,Parallel jobs-22,IO depth-1024,libaio"
readwrite=read
size=512M
iodepth=1024

[sw-iodepth-32768]
description="Sequential Read,Parallel jobs-22,IO depth-32768,libaio"
readwrite=read
size=512M
iodepth=32768
GP92
  • OP posted the same question to [Unix.SE](https://unix.stackexchange.com/q/459045/1131). OP also sent this question to the fio mailing list where a [good answer was posted](https://www.spinics.net/lists/fio/msg07191.html). – maxschlepzig Aug 15 '21 at 10:59

2 Answers


(Please don't ask multiple questions in one post - it makes answering really difficult...)

queue depth which is the number of outstanding I/O requests [...] which handles the I/O requests and sends the commands to disk (r/w) and it (not strictly?) drops the requests

Excessive requests generally aren't dropped - there's just nowhere to queue them in the device so something else (e.g. the OS) has to keep hold of them and submit them when space is available. They aren't lost, they're just not accepted.

And the reason for having high outstading [sic] I/O requests

There are many different reasons - you listed one of them. For example the device could just be slow (think an old style SD card) and not able to keep up even with one "client".

only a fixed number of outstading [sic] requests, so that it won't overload the storage devices?)

That's the aim but there's nothing saying the device will be able to keep up (and sometimes there are reasons/configurations where merging doesn't happen).

Even async engines may impose OS restrictions causing the desired depth not to be achieved. This may happen on Linux when using libaio and not setting `direct=1', since buffered I/O is not async on that OS.

I'm confused by this whole statement.

A quirk of Linux is that non-O_DIRECT I/O (the default) goes through the buffer cache (this is so-called buffered I/O). Because of this, even though you think you've submitted asynchronously (by using Linux AIO), you actually just end up with synchronous behaviour. See https://github.com/axboe/fio/issues/512#issuecomment-356604533 for a differently worded explanation.
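
For example (a minimal sketch with a placeholder filename, not the job file from the question), a libaio job only behaves asynchronously once it bypasses the page cache:

# Hypothetical example: with direct=1 the libaio submissions are genuinely
# asynchronous, so fio can keep up to 16 I/Os in flight; with buffered I/O
# they would complete synchronously and the effective depth would stay near 1.
fio --name=async-check --filename=/path/to/testfile --ioengine=libaio \
    --direct=1 --rw=read --bs=64k --size=512M --iodepth=16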

Why do submit and complete show 1-4

Your configuration has this:

buffered=1

You didn't heed the warning you were wondering about earlier! buffered=1 is the same as saying direct=0. Even if you did have direct=1, by default fio submits I/Os one at a time, so if your device is so fast that it has completed an I/O before the next one is queued, you may not see a depth higher than one. If you wish to force/guarantee batched submission then see the iodepth_batch_* options mentioned in the fio HOWTO/manual, as in the sketch below.
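
As a rough sketch (the numbers are illustrative, not a recommendation, and the filename is a placeholder), forcing batched submission and reaping could look like this:

# Hypothetical example: ask fio to submit 16 I/Os per submission call and to
# wait for at least 16 completions per reap, so the submit/complete buckets
# can move beyond 1-4.
fio --name=batched --filename=/path/to/testfile --ioengine=libaio --direct=1 \
    --rw=read --bs=64k --size=512M --iodepth=64 \
    --iodepth_batch_submit=16 --iodepth_batch_complete_min=16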

OK looping back to the questions in the title:

What does iodepth in fio tests really mean?

It is the maximum amount of outstanding I/O that fio will try and queue internally (but note that fio may never be able to reach it for reasons given above and below).

Is it [iodepth] the queue depth?

Maybe. Further, it also depends on what you mean by "queue depth". If you mean the avgqu-sz reported by a tool such as Linux's iostat, then the iodepth may be similar or wildly different, depending on things like the ioengine being used, the options being used with that I/O engine, the type and style of the I/O being submitted, the layers it has to travel through until it reaches the level being reported, etc.
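
For instance (an illustrative command with a placeholder device name, not taken from your setup), you can watch the block-layer queue size while fio is running and compare it to the iodepth you asked for:

# Hypothetical example: print extended statistics for one device every second.
# The avgqu-sz column (aqu-sz in newer sysstat versions) is the average number
# of requests outstanding at the block layer, which may differ from fio's iodepth.
iostat -x nvme0n1 1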

I think you've asked variations on these questions in quite a few different places - e.g. the fio mailing list has an answer to some of the above - and THAT mail mentions you also posted on https://unix.stackexchange.com/questions/459045/what-exactly-is-iodepth-in-fio . You may want to take care, because you're potentially getting people to give answers to questions that have actually already been answered elsewhere, and you're not linking them together, which makes discovering the duplicate answers hard...

Anon
  • Thanks for the detailed answer. Actually I thought serverfault was the wrong place to ask the question, so I posted on Unix Stack Exchange. Then I discovered the forum and asked there again :) Now, reading your answer against the reply in the forum explains everything. Thank you! – GP92 Aug 04 '18 at 10:30

From https://tobert.github.io/post/2014-04-17-fio-output-explained.html

submit and complete represent the number of submitted IOs at a time by fio and the number completed at a time. In the case of the thrashing test used to generate this output, the iodepth is at the default value of 1, so 100% of IOs were submitted 1 at a time placing the results in the 1-4 bucket. Basically these only matter if iodepth is greater than 1.

This means that the first line shows the number of outstanding IOs you had at any point in time, and that is in line with your defined iodepth.

The submit line shows how many IOs were submitted each time there was a submission; it essentially shows that IOs were submitted up to 4 at a time, and the complete line shows that up to 4 IOs returned in each poll cycle, so fio also submitted up to 4 IOs in return.

In general, IO depth and queue depth are the same. They are the number of IOs a device/controller can have outstanding at a time; the other IOs will be pending in a queue at the OS/application level.

You use a low queue depth to get lower latencies and a higher queue depth to get better throughput. The device uses the queue depth either for internal parallelism (SSDs) or for reordering and merging of related IOs (HDDs and SSDs), or both.
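
As a quick illustration (a sketch with a placeholder path and arbitrary depths, not a tuned benchmark), you could sweep the depth and watch latency rise as throughput grows:

# Hypothetical example: run the same random-read job at increasing queue depths.
# Expect the lowest depth to show the best latency and the higher depths the
# best throughput, until the device saturates.
for depth in 1 4 16 64; do
    fio --name=qd-sweep-$depth --filename=/path/to/testfile --ioengine=libaio \
        --direct=1 --rw=randread --bs=4k --size=1G \
        --runtime=30 --time_based --iodepth=$depth
done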

Baruch Even