
We have rented some servers from Hetzner.de. A few of them have NVMe drives, while the others have SSDs. We benchmarked the read/write performance on 4 of our servers using the following commands:

fio
dd
hdparm

The operating system is CentOS 7, and each server has two drives in software RAID 1. All servers are located in a Hetzner datacenter.
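
For completeness, the RAID layout on each server can be checked like this (the md device name below is just an example; on our servers it is md127, md1 or md2, as shown in the fio disk stats further down):

# cat /proc/mdstat
# mdadm --detail /dev/md2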

Disk brands and models:

SSD:
    Model Family:     Samsung based SSDs
    Device Model:     SAMSUNG MZ7LM240HCGR-00003
NVMe:
    Model Number:                       THNSN5512GPU7 TOSHIBA
    Serial Number:                      Z62S101OTUHV

Here are the benchmark results:

Server1(NVMe):

Base Board Information
        Manufacturer: FUJITSU
        Product Name: D3417-B1
        Version: S26361-D3417-B1

# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Run status group 0 (all jobs):
   READ: bw=45.9MiB/s (48.1MB/s), 45.9MiB/s-45.9MiB/s (48.1MB/s-48.1MB/s), io=3070MiB (3219MB), run=66884-66884msec
  WRITE: bw=15.3MiB/s (16.1MB/s), 15.3MiB/s-15.3MiB/s (16.1MB/s-16.1MB/s), io=1026MiB (1076MB), run=66884-66884msec

Disk stats (read/write):
    md127: ios=785293/276106, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=393078/273876, aggrmerge=3/9643, aggrticks=330689/2134457, aggrin_queue=2467357, aggrutil=63.84%
  nvme0n1: ios=410663/273879, merge=7/9640, ticks=257384/2054071, in_queue=2311731, util=55.06%
  nvme1n1: ios=375494/273874, merge=0/9647, ticks=403994/2214844, in_queue=2622983, util=63.84%

#dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 14.2603 s, 75.3 MB/s

# hdparm -Tt /dev/nvme0n1

/dev/nvme0n1:
 Timing cached reads:   29320 MB in  1.98 seconds = 14818.11 MB/sec
 Timing buffered disk reads: 2660 MB in  3.00 seconds = 886.22 MB/sec
------------------------------------------------------------
------------------------------------------------------------
Server2(NVMe):

Base Board Information
        Manufacturer: FUJITSU
        Product Name: D3417-B1
        Version: S26361-D3417-B1

# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Run status group 0 (all jobs):
   READ: io=3072.2MB, aggrb=40296KB/s, minb=40296KB/s, maxb=40296KB/s, mint=78069msec, maxt=78069msec
  WRITE: io=1023.9MB, aggrb=13429KB/s, minb=13429KB/s, maxb=13429KB/s, mint=78069msec, maxt=78069msec

Disk stats (read/write):
    md1: ios=786339/298554, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=393673/300844, aggrmerge=0/0, aggrticks=543418/2294840, aggrin_queue=2838462, aggrutil=65.25%
  nvme0n1: ios=180052/300844, merge=0/0, ticks=480768/1879827, in_queue=2360788, util=56.22%
  nvme1n1: ios=607294/300844, merge=0/0, ticks=606068/2709853, in_queue=3316136, util=65.25%

#dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 33.2734 s, 32.3 MB/s

# hdparm -Tt /dev/nvme0n1

/dev/nvme0n1:
 Timing cached reads:   33788 MB in  1.99 seconds = 16977.90 MB/sec
 Timing buffered disk reads: 2204 MB in  3.00 seconds = 734.34 MB/sec

------------------------------------------------------------
------------------------------------------------------------
Server3(SSD)
Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: Z10PA-U8 Series
        Version: Rev 1.xx

# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Run status group 0 (all jobs):
   READ: bw=262MiB/s (275MB/s), 262MiB/s-262MiB/s (275MB/s-275MB/s), io=3070MiB (3219MB), run=11718-11718msec
  WRITE: bw=87.6MiB/s (91.8MB/s), 87.6MiB/s-87.6MiB/s (91.8MB/s-91.8MB/s), io=1026MiB (1076MB), run=11718-11718msec

Disk stats (read/write):
    md2: ios=769518/258504, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=392958/263227, aggrmerge=9/864, aggrticks=219931/33550, aggrin_queue=253441, aggrutil=99.06%
  sda: ios=402306/263220, merge=12/871, ticks=222960/35975, in_queue=258904, util=99.04%
  sdb: ios=383611/263234, merge=7/857, ticks=216902/31125, in_queue=247978, util=99.06%

#dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 5.19855 s, 207 MB/s

# hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   22452 MB in  1.99 seconds = 11258.90 MB/sec
 Timing buffered disk reads: 1546 MB in  3.00 seconds = 514.90 MB/sec

------------------------------------------------------------
------------------------------------------------------------
Server4(SSD)
Base Board Information
        Manufacturer: FUJITSU
        Product Name: D3401-H2
        Version: S26361-D3401-H2

# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Run status group 0 (all jobs):
   READ: io=3073.6MB, aggrb=61065KB/s, minb=61065KB/s, maxb=61065KB/s, mint=51539msec, maxt=51539msec
  WRITE: io=1022.5MB, aggrb=20315KB/s, minb=20315KB/s, maxb=20315KB/s, mint=51539msec, maxt=51539msec

Disk stats (read/write):
    md2: ios=784514/278548, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=392439/266239, aggrmerge=1246/13570, aggrticks=822829/716748, aggrin_queue=1539532, aggrutil=91.82%
  sda: ios=421561/266337, merge=1030/13473, ticks=867321/639461, in_queue=1506738, util=91.82%
  sdb: ios=363317/266142, merge=1463/13667, ticks=778338/794035, in_queue=1572326, util=91.76%

# dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB) copied, 10.6605 s, 101 MB/s

# hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   33686 MB in  1.98 seconds = 16985.97 MB/sec
 Timing buffered disk reads: 1304 MB in  3.00 seconds = 434.34 MB/sec

As can be seen from the results, fio shows disappointing random read/write throughput on both NVMe servers: roughly 46/15 MiB/s on Server1 and 39/13 MiB/s on Server2, against 262/88 MiB/s on Server3 (SSD) and 60/20 MiB/s on Server4 (SSD). The dd test is also disappointing on the NVMe servers (75 MB/s and 32 MB/s) compared to the SSD ones (207 MB/s and 101 MB/s). Only hdparm puts the NVMe drives ahead in buffered reads (886 MB/s and 734 MB/s versus 515 MB/s and 434 MB/s), and even there the gap is much smaller than we expected.

All tests were done during off-peak times, when the average server load was 0.0.

Another strange issue we are facing on our NVMe servers is high I/O load while restoring an account backup or even extracting a zip file. For example, when I extract a 150 MB zip file, the average server load climbs above 20 by the time the extraction finishes, and according to 'top' this is caused entirely by I/O wait.
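
To see which device the wait comes from while such a restore is running, something like the following can be run in another shell (iostat is part of the sysstat package; the 2-second interval is just an example):

# iostat -x 2

Watching the await and %util columns of nvme0n1/nvme1n1 during the extraction shows where the time is being spent.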

We would like to know what causes the NVMe drives to perform so disappointingly compared to the SSDs. Does software or hardware RAID hurt NVMe performance so much that read/write throughput drops below that of the SSDs? And if so, why do the SSDs work almost perfectly with software or hardware RAID?
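
For reference, the I/O scheduler and queue settings currently in effect on the NVMe devices can be inspected like this (the sysfs paths use the device names from the outputs above; we have not tuned anything here):

# cat /sys/block/nvme0n1/queue/scheduler
# cat /sys/block/nvme0n1/queue/nr_requests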

Sinai
  • Please post the brand and specific model of the NVMe drives. – shodanshok Jan 01 '19 at 17:03
  • I edited my post and added the hard disk brands and models. – Sinai Jan 01 '19 at 17:34
  • Can you try a `fio` and `dd` (with `bs=1M`) run directly on the raw devices themselves? Be aware that this *will* overwrite data, so don't do that if you have valuable data on the drives. – shodanshok Jan 01 '19 at 22:00
  • @shodanshok, unfortunately we cannot run any tests on the raw devices, because these are production servers, not development ones. – Sinai Jan 02 '19 at 15:12
  • If so, you can't do much to debug the issue, short of renting another, empty server and running the tests on it. – shodanshok Jan 02 '19 at 18:41
  • Honestly, I think this answer does not make sense. I want to know the cause of this behavior: why is the NVMe performance almost the same as the SSDs, or even worse in some cases, when we are spending more money on the NVMe servers? For example, does NVMe perform worse in RAID, is there a problem with the connector, does the mainboard not support this specific type of NVMe drive, or is it something else? – Sinai Jan 03 '19 at 09:43
  • @Sinai Can you try and narrow down the problem area: for example, are things comparable when you just do reads or when you do writes? Do you have exactly the same filesystem on top of the same disk layouts? Are things well aligned in all cases? Do things get better for the NVMe disk when you put the iodepth right up? Is it something to do with the disk scheduler? The more you can isolate, the quicker you will get to the trouble spot... – Anon Jan 09 '19 at 22:31
  • I posted the results of both read and write in my question. All servers are used for hosting purposes, and all of them run CentOS 7 with an ext4 filesystem. I did the tests when the average load was almost 0.0. I have no idea about the "disk scheduler" or "putting the iodepth right up"; I will have to google them. – Sinai Jan 12 '19 at 16:46
  • @sinai Very briefly: disk scheduler - https://github.com/torvalds/linux/blob/master/Documentation/block/switching-sched.txt , I/O depth - https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-iodepth (some NVMe disks can sustain 100s of I/Os). – Anon Jan 13 '19 at 22:22
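
A minimal sketch of what the last comment suggests, i.e. re-running the same fio test with a much higher queue depth (iodepth=256 is just an example value; this still runs against the filesystem, not the raw devices):

# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=256 --size=4G --readwrite=randrw --rwmixread=75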
