I wanted to set up a raid01 configuration (a raid1 composed of two raid0s), with the write-mostly flag set on one of the raid0 volumes so that all reads go to the other (i.e. one side is disk, the other flash). However, the plan has run into a problem: reads directly from the raid0 come out at 64k per disk (the chunk size), but as soon as I put a raid1 on top of the raid0, all reads drop to only 4k and performance is terrible. My guess is that md (or something else in the stack) has decided 4k is the granularity for error handling and is issuing reads at that size, but that is only a guess. Either way, I need to find a way to fix it.
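
In other words, the plan is roughly the following (a sketch only; the second raid0 doesn't exist yet, so its member names are placeholders):

mdadm --create /dev/md1 -l 0 -n 14 --chunk=64k /dev/sd[b-o]1          # flash raid0 (exists, details below)
mdadm --create /dev/md3 -l 0 -n 14 --chunk=64k <14 disk partitions>   # disk raid0 (placeholder names)
mdadm --create /dev/md2 -l 1 -n 2 /dev/md1 --write-mostly /dev/md3    # raid1 on top; reads go to the flash side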

To test this I'm using a raid1 with only one side, for simplicity; it was created via:

mdadm --create /dev/md2 -l 1 -n 2 /dev/md1 "missing"

Another item of interest: dd with bs=512K on the raid0 array (md1) shows 64k reads on md1 and on all of its components, where I would have expected iostat to show 512K reads on md1 and 64K reads on its component disks. dd with bs=512K from md2 shows 4K reads to everything. I'm computing the transfer size by simply taking MB/s divided by tps, which gives MB per transaction.
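
(For reference, the calculation looks roughly like this; the tps figures below are invented just to show the arithmetic.)

iostat -m /dev/md1 /dev/md2 5
# MB/s divided by tps = MB per transfer, e.g. 1500 / 24000  = 0.0625 MB = 64K
# versus                                        770 / 192500 = 0.004 MB = 4K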

Here are all the details.

[root@pe-r910 ~]# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Tue Jul 26 23:13:59 2011
     Raid Level : raid1
     Array Size : 1998196216 (1905.63 GiB 2046.15 GB)
  Used Dev Size : 1998196216 (1905.63 GiB 2046.15 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jul 28 08:29:35 2011
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : pe-r910.ingres.prv:2  (local to host pe-r910.ingres.prv)
           UUID : 299ea821:756847a0:4db591e4:38769641
         Events : 160

    Number   Major   Minor   RaidDevice State
       0       9        1        0      active sync   /dev/md1
       1       0        0        1      removed

[root@pe-r910 ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Tue Jul 26 01:05:05 2011
     Raid Level : raid0
     Array Size : 1998197376 (1905.63 GiB 2046.15 GB)
   Raid Devices : 14
  Total Devices : 14
    Persistence : Superblock is persistent

    Update Time : Tue Jul 26 01:05:05 2011
          State : clean
 Active Devices : 14
Working Devices : 14
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           Name : pe-r910.ingres.prv:1  (local to host pe-r910.ingres.prv)
           UUID : 735bd502:62ed0509:08c33e15:19ae4f6b
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1
       6       8      113        6      active sync   /dev/sdh1
       7       8      129        7      active sync   /dev/sdi1
       8       8      145        8      active sync   /dev/sdj1
       9       8      161        9      active sync   /dev/sdk1
      10       8      177       10      active sync   /dev/sdl1
      11       8      193       11      active sync   /dev/sdm1
      12       8      209       12      active sync   /dev/sdn1
      13       8      225       13      active sync   /dev/sdo1
[root@pe-r910 ~]# dd if=/dev/md1 bs=512K count=10000 iflag=nonblock,direct of=/dev/null
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 3.45236 s, 1.5 GB/s
[root@pe-r910 ~]# dd if=/dev/md2 bs=512K count=10000 iflag=nonblock,direct of=/dev/null
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 6.81182 s, 770 MB/s
[root@pe-r910 ~]#

update: This appears to be an issue only for md on md. If I make a raid1 directly on a disk, its read rate is the same as the disk's. So I think I can reconfigure as raid10 (a set of raid1s made into a raid0) instead of raid01 (two raid0s made into a raid1).
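
In other words, something like the following instead (a sketch only; the disk-side partition names such as sdp1 are placeholders, the flash-side names are the md1 members below):

mdadm --create /dev/md101 -l 1 -n 2 /dev/sdb1 --write-mostly /dev/sdp1    # mirror pair 1
# ... 13 more pairs, md102..md114, one per flash/disk partition pair ...
mdadm --create /dev/md2 -l 0 -n 14 --chunk=64k /dev/md101 /dev/md102 ...  # then stripe the 14 mirrors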

jg167
  • 'having the write-mostly state set so all reads would go to the other' - on a RAID1 *mirror*? A bit counter-intuitive there; I take it you don't care about your data? – Avery Payne Jan 26 '14 at 03:20

2 Answers

Use --chunk=64k or --chunk=128k when you create the device. For the right chunk size, see your disk specifications and your own tests.

Unfortunately, there is no way to change it after creation.

There is also a stripe cache size parameter, which can affect performance.

See this article: http://www.amiryan.org/2009/04/10/solved-linux-software-raid-5-too-slow/
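
For example (a sketch only: the chunk size has to be given at creation time and only applies to striped levels, and the stripe cache tunable from that article only exists for raid4/5/6 arrays, so /dev/md0 below is hypothetical):

mdadm --create /dev/md1 --level=0 --raid-devices=14 --chunk=64k /dev/sd[b-o]1
echo 8192 > /sys/block/md0/md/stripe_cache_size   # raid4/5/6 only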

arheops
  • `--chunk` isn't valid for RAID1. – womble Jul 29 '11 at 00:02
  • https://raid.wiki.kernel.org/index.php/Chunk_size – arheops Jul 29 '11 at 12:29
  • That page is wrong. RAID-1 doesn't involve striping; RAID-0 uses striping. The `mdadm` man page, under the `--chunk` option, *explicitly* says "This is only meaningful for RAID0, RAID4, RAID5, RAID6, and RAID10." – womble Jul 29 '11 at 12:32
  • I've never used raid0, but I have used chunk, so how was that working? – arheops Jul 29 '11 at 12:40
  • Is http://www.devil-linux.org/documentation/1.0.x/ch01s05.html also a wrong page? – arheops Jul 29 '11 at 12:41
  • Yes. Anything you find that says that `--chunk` is valid for mdadm RAID-1 is wrong, *wrong*, **WRONG**. Look at the `mdstat` contents for the RAID-0 vs RAID-1 in that page -- the RAID-0 shows the chunksize, the RAID-1 doesn't. Why is that, do you think? Similarly, look at the `mdadm --detail` output in this very question, no mention of chunksize for the RAID-1 array there either. Why? **BECAUSE CHUNK SIZE IS INVALID FOR A RAID-1 ARRAY**. – womble Jul 29 '11 at 12:44
  • But I even did experiments with chunk size vs. performance, and the performance was different. Yes, I use raid1 only. So how can that be? – arheops Jul 29 '11 at 12:45
  • @womble let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/943/discussion-between-arheops-and-womble) – arheops Jul 29 '11 at 12:45

This may be obvious, but have you tried specifying --chunk=64k when you create the device?

pjz