I have a related question about this problem, but it got too complicated and too big, so I decided to split the issue into NFS and local parts. I have also tried asking about this on the zfs-discuss mailing list, without much success.
Related question: Slow copying between NFS/CIFS directories on same server
Outline: how I'm set up and what I'm expecting
- I have a ZFS pool with four 2TB WD Red disks, configured as 2 mirrors that are striped (RAID 10), on Linux with zfsonlinux. There are no cache or log devices.
- Data is balanced across mirrors (important for ZFS)
- Each disk can read (raw w/dd) at 147MB/sec in parallel, giving a combined throughput of 588MB/sec.
- I expect about 115MB/sec write, 138MB/sec read and 50MB/sec rewrite of sequential data from each disk, based on benchmarks of a similar 4TB RED disk. I expect no less than 100MB/sec read or write, since any disk can do that these days.
- I thought I'd see 100% IO utilization on all 4 disks when under load reading or writing sequential data, and that the disks would be putting out over 100MB/sec while at 100% utilization.
- I thought the pool would give me around 2x write, 2x rewrite, and 4x read performance over a single disk - am I wrong?
- NEW: I thought an ext4 filesystem on a zvol in the same pool would be about the same speed as a ZFS dataset.
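The expectations above boil down to a simple scaling model. Below is a minimal sketch of it, under my own idealized assumptions (not anything measured or documented): every write has to land on both disks of a mirror, so each mirror vdev contributes one disk's worth of write bandwidth, while reads can be served by any disk, so all four disks contribute. ZFS overhead, recordsize and prefetch behaviour are ignored.

```python
# Idealized sequential-throughput model for striped two-way mirrors (RAID 10).
# Assumptions (not measured): a write lands on every disk of a mirror, so each
# mirror vdev contributes one disk's write bandwidth; a read can be served by any
# disk, so every disk contributes read bandwidth; rewrite is assumed to scale like
# writes, as in the expectation above. ZFS overhead is ignored entirely.

# Per-disk figures (MB/sec) taken from the single 4TB Red bonnie++ run.
SINGLE_DISK_MB_S = {"write": 115, "rewrite": 50, "read": 138}


def expected_pool_mb_s(per_disk: dict, mirrors: int, disks_per_mirror: int) -> dict:
    """Return idealized pool throughput in MB/sec for each operation."""
    total_disks = mirrors * disks_per_mirror
    return {
        "write": per_disk["write"] * mirrors,
        "rewrite": per_disk["rewrite"] * mirrors,
        "read": per_disk["read"] * total_disks,
    }


if __name__ == "__main__":
    pool = expected_pool_mb_s(SINGLE_DISK_MB_S, mirrors=2, disks_per_mirror=2)
    for op, mb_s in pool.items():
        print(f"{op:8s} ~{mb_s} MB/sec")  # write ~230, rewrite ~100, read ~552
```

The read figure of about 552MB/sec is where the "about 550MB/sec" expectation further down comes from.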
What I actually get
I find that the read performance of the pool is not nearly as high as I expected.
bonnie++ benchmark on the pool from a few days ago:
    Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
    Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    igor            63G    99  99 232132  47 118787  27   336  97 257072  22  92.7   6
bonnie++ on a single 4TB WD Red drive on its own in a zpool:
    Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
    Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    igor            63G   101  99 115288  30 49781  14   326  97 138250  13 111.6   8
According to this, the write and rewrite speeds are appropriate given the results from a single 4TB WD Red drive (they are about double). However, I was expecting a read speed of about 550MB/sec (4x the speed of the 4TB drive), and I would at least hope for around 400MB/sec. Instead I am seeing around 260MB/sec.
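To make the comparison concrete, here is a small sketch that divides the pool's bonnie++ block figures by the single-drive figures. It assumes bonnie++'s "K" means 1024 bytes. The ratios come out at about 2.01x for write, 2.39x for rewrite and 1.86x for read, against the 2x/2x/4x the model predicts.

```python
# bonnie++ "--Block--" output/input and "-Rewrite-" columns (K/sec) from the runs above.
# Assumption: bonnie++'s K means 1024 bytes.
POOL_K = {"write": 232132, "rewrite": 118787, "read": 257072}
SINGLE_K = {"write": 115288, "rewrite": 49781, "read": 138250}
EXPECTED_X = {"write": 2, "rewrite": 2, "read": 4}  # 2 striped mirrors, 4 disks


def kib_to_mb(k_per_s: float) -> float:
    """Convert KiB/sec to decimal MB/sec."""
    return k_per_s * 1024 / 1_000_000


def scaling(pool: dict, single: dict) -> dict:
    """Ratio of pool throughput to single-disk throughput, per operation."""
    return {op: pool[op] / single[op] for op in pool}


if __name__ == "__main__":
    for op, ratio in scaling(POOL_K, SINGLE_K).items():
        print(f"{op:8s} {ratio:.2f}x measured, {EXPECTED_X[op]}x expected "
              f"(pool ~{kib_to_mb(POOL_K[op]):.0f} MB/sec)")
```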
bonnie++ on the pool, run just now while gathering the information below. The numbers are not quite the same as before, although nothing has changed:
    Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
    Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    igor            63G   103  99 207518  43 108810  24   342  98 302350  26 256.4  18
zpool iostat during the write phase. This seems OK to me:
                                                     capacity     operations    bandwidth
    pool                                          alloc   free   read  write   read  write
    --------------------------------------------  -----  -----  -----  -----  -----  -----
    pool2                                         1.23T  2.39T      0  1.89K  1.60K   238M
      mirror                                       631G  1.20T      0    979  1.60K   120M
        ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469      -      -      0   1007  1.60K   124M
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX      -      -      0    975      0   120M
      mirror                                       631G  1.20T      0    953      0   117M
        ata-WDC_WD20EFRX-68AX9N0_WD-WCC1T0429536      -      -      0  1.01K      0   128M
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M0VYKFCE      -      -      0    953      0   117M
zpool iostat during the rewrite phase. This seems OK to me, I think:
                                                     capacity     operations    bandwidth
    pool                                          alloc   free   read  write   read  write
    --------------------------------------------  -----  -----  -----  -----  -----  -----
    pool2                                         1.27T  2.35T   1015    923   125M   101M
      mirror                                       651G  1.18T    505    465  62.2M  51.8M
        ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469      -      -    198    438  24.4M  51.7M
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX      -      -    306    384  37.8M  45.1M
      mirror                                       651G  1.18T    510    457  63.2M  49.6M
        ata-WDC_WD20EFRX-68AX9N0_WD-WCC1T0429536      -      -    304    371  37.8M  43.3M
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M0VYKFCE      -      -    206    423  25.5M  49.6M
This is where I start to wonder what's going on.
zpool iostat during the read phase:
                                                     capacity     operations    bandwidth
    pool                                          alloc   free   read  write   read  write
    --------------------------------------------  -----  -----  -----  -----  -----  -----
    pool2                                         1.27T  2.35T  2.68K     32   339M   141K
      mirror                                       651G  1.18T  1.34K     20   169M  90.0K
        ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469      -      -    748      9  92.5M  96.8K
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX      -      -    623     10  76.8M  96.8K
      mirror                                       651G  1.18T  1.34K     11   170M  50.8K
        ata-WDC_WD20EFRX-68AX9N0_WD-WCC1T0429536      -      -    774      5  95.7M  56.0K
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M0VYKFCE      -      -    599      6  74.0M  56.0K
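As a sanity check on the read sample, the sketch below adds up the per-disk read bandwidth and expresses each disk's figure as a share of the 147MB/sec it reached with a raw dd read. The four disks sum to the 339M on the pool2 line, and each delivers roughly 50 to 65% of its raw dd speed. Caveat: as far as I know zpool iostat prints 1024-based suffixes while dd's "MB" may be decimal, so the percentages are approximate.

```python
# Per-disk read bandwidth (the "M" read column) from the zpool iostat read sample,
# compared with the 147MB/sec each disk reached with a raw dd read.
# Caveat: units are treated as equal here; ZFS suffixes are (I believe) 1024-based.
RAW_DD_MB_S = 147.0
READ_SAMPLE = {
    "WD-WMC300004469": 92.5,
    "WD-WCC4MLK57MVX": 76.8,
    "WD-WCC1T0429536": 95.7,
    "WD-WCC4M0VYKFCE": 74.0,
}


def summarize(sample: dict, raw: float) -> tuple:
    """Return (sum of per-disk bandwidth, {disk: fraction of raw dd speed})."""
    total = sum(sample.values())
    shares = {disk: mb / raw for disk, mb in sample.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = summarize(READ_SAMPLE, RAW_DD_MB_S)
    print(f"sum of disks: {total:.1f}M (the pool2 line reports 339M)")
    for disk, share in shares.items():
        print(f"{disk}: {share:.0%} of raw dd speed")
```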
iostat -x during the same read operation. Note that %util is not at 100%:
    Device:  rrqm/s  wrqm/s     r/s   w/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
    sdb        0.60    0.00  661.30  6.00  83652.80  49.20    250.87      2.32   3.47     3.46     4.87   1.20  79.76
    sdd        0.80    0.00  735.40  5.30  93273.20  49.20    251.98      2.60   3.51     3.51     4.15   1.20  89.04
    sdf        0.50    0.00  656.70  3.80  83196.80  31.20    252.02      2.23   3.38     3.36     6.63   1.17  77.12
    sda        0.70    0.00  738.30  3.30  93572.00  31.20    252.44      2.45   3.33     3.31     7.03   1.14  84.24
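iostat's avgrq-sz column is reported in 512-byte sectors, so the average request size can be converted to KiB. Here is a sketch that parses the four rows above, assuming rkB/s uses 1024-byte kilobytes. Each disk's average read request comes out around 125 KiB, close to the 128 KiB default ZFS recordsize, with %util between roughly 77% and 89%.

```python
# Parse the iostat -x rows above (values copied verbatim) and derive per-disk figures.
# Assumptions: avgrq-sz is in 512-byte sectors and rkB/s uses 1024-byte kilobytes.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util").split()
ROWS = """\
sdb 0.60 0.00 661.30 6.00 83652.80 49.20 250.87 2.32 3.47 3.46 4.87 1.20 79.76
sdd 0.80 0.00 735.40 5.30 93273.20 49.20 251.98 2.60 3.51 3.51 4.15 1.20 89.04
sdf 0.50 0.00 656.70 3.80 83196.80 31.20 252.02 2.23 3.38 3.36 6.63 1.17 77.12
sda 0.70 0.00 738.30 3.30 93572.00 31.20 252.44 2.45 3.33 3.31 7.03 1.14 84.24
"""


def parse(rows: str) -> dict:
    """Map each device name to a {column: float} dict."""
    stats = {}
    for line in rows.splitlines():
        fields = line.split()
        if not fields:
            continue
        stats[fields[0]] = dict(zip(HEADER[1:], map(float, fields[1:])))
    return stats


if __name__ == "__main__":
    for dev, s in parse(ROWS).items():
        req_kib = s["avgrq-sz"] * 512 / 1024  # sectors -> KiB
        read_mib = s["rkB/s"] / 1024          # KiB/s -> MiB/s
        print(f"{dev}: {read_mib:.1f} MiB/s read, avg request {req_kib:.0f} KiB, "
              f"{s['%util']:.0f}% util")
```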
zpool and test dataset settings:
- atime is off
- compression is off
- ashift is 0 (autodetect; my understanding was that this was OK)
- zdb reports ashift=12 for all disks
- module options: options zfs zvol_threads=32 zfs_arc_max=17179869184
- sync = standard
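Two of these settings are easier to read once decoded: ashift is the base-2 logarithm of the sector size ZFS uses for the vdevs, and zfs_arc_max is given in bytes. A minimal sketch of the arithmetic:

```python
# Decode the numeric settings listed above.
ASHIFT = 12                 # as reported by zdb for all disks
ZFS_ARC_MAX = 17179869184   # bytes, from the zfs module options

sector_bytes = 2 ** ASHIFT         # ashift = log2(sector size)
arc_max_gib = ZFS_ARC_MAX / 2**30  # bytes -> GiB

if __name__ == "__main__":
    print(f"ashift={ASHIFT} -> {sector_bytes}-byte sectors")
    print(f"zfs_arc_max -> {arc_max_gib:g} GiB")
```

So the pool uses 4 KiB sectors and the ARC is capped at 16 GiB.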
Edit (Oct 30, 2015)
I did some more testing:
- dataset, bonnie++, recordsize=1M: 226MB/sec write, 392MB/sec read (much better)
- dataset, dd with record size=1M: 260MB/sec write, 392MB/sec read (much better)
- ext4 on a zvol, dd bs=1M: 128MB/sec write, 107MB/sec read (why so slow?)
- dataset, 2 processes in parallel: 227MB/sec write, 396MB/sec read
- dd with direct I/O makes no difference on either the dataset or the zvol
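For anyone wanting to reproduce dd-style sequential read numbers without dd, here is a hypothetical Python analogue of a read with bs=1M (it is not the command used for the results above). It reads a file front to back in 1 MiB blocks and reports decimal MB/sec. Like a plain dd read, it goes through the page cache and ARC and does not attempt direct I/O, so the test file should be much larger than RAM for the number to reflect the disks.

```python
import os
import sys
import time

BLOCK = 1024 * 1024  # 1 MiB, mirroring dd's bs=1M


def sequential_read_mb_s(path: str, block_size: int = BLOCK) -> tuple:
    """Read `path` sequentially in fixed-size blocks; return (bytes read, MB/sec).

    Cached data (page cache / ARC) will inflate the rate, just as with dd.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read() per block
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty read means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__" and len(sys.argv) > 1:
    nbytes, rate = sequential_read_mb_s(sys.argv[1])
    print(f"read {nbytes} bytes at {rate:.1f} MB/sec")
```

Usage would be something like `python3 seqread.py /pool2/dataset/bigfile`, where both the script name and the path are placeholders.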
I am much happier with the performance at the increased record size. Almost every file on the pool is well over 1MB, so I'll leave it like that. The disks are still not reaching 100% utilization, which makes me wonder whether it could be much faster still. I'm also now wondering why the zvol performance is so poor, since that is something I (lightly) use.
I am happy to provide any information requested in the comments/answers. There is also a lot of information in my other question: Slow copying between NFS/CIFS directories on same server
I am fully aware that I may just not understand something and that this may not be a problem at all. Thanks in advance.
To be clear, the question is: why isn't the ZFS pool as fast as I expect? And is anything else wrong?