The problem here is that while NVMe drives and SSDs in general are faster than spinning rust because they use flash memory, the ability of an NVMe drive to transfer multiple gigabytes of data per second comes from the way that flash memory is arranged around the controller.
Fast flash devices effectively use a scheme similar to RAID 0 across what are, on their own, simply fast flash memory chips. Each chip by itself can handle a certain speed, but tied together with its siblings it can reach a much higher aggregate speed, because data is written to and read from multiple chips simultaneously.
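To make that idea concrete, here is a toy sketch of a RAID 0-like layout in Python. The channel count, stripe size, and mapping are invented purely for illustration; real controllers use their own firmware-specific schemes.

```python
# Toy illustration of RAID 0-style striping across flash chips.
# CHANNELS and STRIPE_BLOCKS are made-up values for this example;
# real controllers use their own (undocumented) mappings.

CHANNELS = 8          # hypothetical number of flash chips/channels
STRIPE_BLOCKS = 4     # hypothetical blocks placed on one chip before moving to the next

def chip_for_block(lba: int) -> int:
    """Return which chip a logical block lands on under simple interleaving."""
    return (lba // STRIPE_BLOCKS) % CHANNELS

# A 32-block sequential transfer touches every chip, so all 8 can work at once:
print(sorted({chip_for_block(lba) for lba in range(32)}))   # [0, 1, 2, ..., 7]

# A single 4-block request fits entirely on one chip, so only one chip does any work:
print(sorted({chip_for_block(lba) for lba in range(4)}))    # [0]
```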
Large transfers can take advantage of this parallelism by requesting multiple blocks from multiple chips at once, reducing what would otherwise be, say, eight separate seek times to a single seek (spread across multiple chips) followed by one larger transfer. The controller has buffering and queueing that let it stream the data sequentially in whichever direction is required.
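As a rough back-of-envelope model of why that matters, the sketch below compares eight small requests served one after another on a single chip against the same data fetched in one striped transfer. Every latency and throughput number here is invented just to show the shape of the arithmetic, not measured from any real drive.

```python
# Back-of-envelope timing model (all numbers are assumptions for illustration only).

SEEK_US = 50            # assumed per-request latency inside one chip, in microseconds
CHIP_MBPS = 400         # assumed transfer rate of a single flash chip, MiB/s
CHANNELS = 8            # assumed number of chips the controller can use in parallel

def serial_time_us(blocks: int, block_kib: int = 4) -> float:
    """Eight small requests handled one after another on a single chip."""
    per_block = SEEK_US + (block_kib / 1024) / CHIP_MBPS * 1_000_000
    return blocks * per_block

def striped_time_us(blocks: int, block_kib: int = 4) -> float:
    """The same blocks fetched from all channels in parallel: one seek, shared transfer."""
    total_mib = blocks * block_kib / 1024
    return SEEK_US + total_mib / (CHIP_MBPS * CHANNELS) * 1_000_000

print(f"8 x 4 KiB, one chip at a time:     {serial_time_us(8):.0f} us")
print(f"8 x 4 KiB, striped across 8 chips: {striped_time_us(8):.0f} us")
```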
The individual flash chips themselves may also read ahead a few blocks for future requests and, for writes, cache the data in a small internal buffer to further reduce delays on subsequent requests.
The problem with working with lots of small files is that it defeats all of the smarts used to achieve a single massive transfer. The controller ends up working through a queue one item at a time: requesting a block from a flash chip, waiting for the response, looking at the next item in the queue, requesting that data, and so on.
If the data being read or written is on another chip then the controller may be able to use multiple channels, but if many of the requests land on the same chip for a while, as can happen with lots of small writes, then what you end up seeing is the performance of a single flash chip rather than the full performance of an array of chips.
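The sketch below is a toy queue simulation of exactly that situation: a thousand small requests that all happen to land on one chip versus the same thousand spread across every channel. The per-request time and channel count are assumptions, not measurements.

```python
import random

# Toy queue simulation: how long until all requests finish when each chip can
# serve one request at a time and chips work in parallel. Numbers are invented.

CHANNELS = 8
REQ_US = 60   # assumed time for one small (4 KiB) request on one chip, in microseconds

def completion_time_us(target_chips: list[int]) -> int:
    """Total time when requests queue up per chip and the chips run in parallel."""
    busy = [0] * CHANNELS
    for chip in target_chips:
        busy[chip] += REQ_US
    return max(busy)

random.seed(0)
spread    = [random.randrange(CHANNELS) for _ in range(1000)]  # requests hit all chips
clustered = [0] * 1000                                         # requests all hit chip 0

print(f"1000 small requests spread across chips: {completion_time_us(spread) / 1000:.1f} ms")
print(f"1000 small requests on a single chip:    {completion_time_us(clustered) / 1000:.1f} ms")
```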
So thousands of small reads or writes could actually show you the performance of only a small part of your NVMe device, rather than what the device is fully capable of under so-called "perfect" conditions.
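If you want to see the effect on your own machine, something like the following rough Python timing loop (with a hypothetical test file path) compares many random 4 KiB reads against one large read of the same total size. The OS page cache and filesystem will hide much of the difference unless the cache is cold (or you use direct I/O), so treat it as a coarse demonstration rather than a proper benchmark.

```python
import os, random, time

# PATH is a hypothetical test file you must create yourself: at least 256 MiB,
# located on the NVMe drive. os.pread is available on Unix-like systems.

PATH = "testfile.bin"
TOTAL = 256 * 1024 * 1024          # total bytes read in each pattern
SMALL = 4 * 1024                   # 4 KiB per small request

fd = os.open(PATH, os.O_RDONLY)

# Many small reads at random aligned offsets within the first 256 MiB.
offsets = [random.randrange(TOTAL // SMALL) * SMALL for _ in range(TOTAL // SMALL)]
start = time.perf_counter()
for off in offsets:
    os.pread(fd, SMALL, off)
t_small = time.perf_counter() - start

# One large read of the same total amount of data.
start = time.perf_counter()
os.pread(fd, TOTAL, 0)             # may return short on some systems; fine for a rough test
t_large = time.perf_counter() - start

os.close(fd)
print(f"small random reads: {TOTAL / t_small / 2**20:.0f} MiB/s")
print(f"one large read:     {TOTAL / t_large / 2**20:.0f} MiB/s")
```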
in real life... You say that like it's a universal truth. Can you substantiate this claim? Please do not only respond in the comments. Instead, [edit] the post with this information. – I say Reinstate Monica – 2019-09-05T18:34:56.630
@TwistyImpersonator I've searched the web for a contradictory source and only found one lead which turned out to be useless. It seems like there's practically no disagreement about that. That's why I omitted that from the question. Just like I omitted the fact that SSDs are faster than HDDs. – ispiro – 2019-09-05T18:50:01.313
Seek latency becomes a real problem with random reads and all the speed benefits of SSDs are lost with tiny reads: https://superuser.com/a/1168029/19943 This question feels like a slightly differently phrased duplicate of that one... – Mokubai – 2019-09-05T18:57:14.937
@Mokubai Every seek should see the speed difference. Your answer there explains the difference between small and large files. Not the difference between different types of drives. Every seek, every write amplification, every part should see the speed gain. – ispiro – 2019-09-05T19:05:23.140
@ispiro you're making assumptions about how the electronics work. In order to obtain their massive speeds they do a lot of work with parallelism and queueing, both with the flash chips and in the main electronics. Once you do lots of seeks that are below a certain threshold you start to lose any benefit of how the device was designed. Seeks are faster, but parallel transfers, queueing and caching are what achieve the phenomenal sequential speeds. Small transfers see only the benefits of an effective single controller thread and/or flash chip. – Mokubai – 2019-09-05T19:13:11.643
@Mokubai If you could expound on your comment, that might just be the answer to my question. – ispiro – 2019-09-05T19:15:34.113
@Mokubai: This seems like an over-simplification. I have on purpose said only that OS bookkeeping and cache flushing interrupt the smooth transfer of data, as I don't think the interaction between OS and SSD can be simply defined and does depend on too many parameters. – harrymc – 2019-09-05T19:27:03.190
@harrymc I've posted what I mean as an answer. While I agree that there is some added time and overhead in queueing and seeking, a lot of the performance benefits of these drives are due to how the controllers are designed and how the memory devices are laid out around them. – Mokubai – 2019-09-05T19:37:39.350
@ispiro This isn’t a bad question, but your tone is a bit harsh. Your edit clarifies things, but seriously, saying something like “…let’s go over this again…” is not an inspiring statement. – JakeGould – 2019-09-05T19:42:09.200
@JakeGould Thanks. Point taken. I edited that line. Did you mean there was another part where the tone was harsh? – ispiro – 2019-09-05T19:46:39.263
@ispiro Nope! Your edit is perfect. And the question is decent. – JakeGould – 2019-09-05T20:06:44.030