
Linux md raid driver delays recovery/verification of multiple devices if they share the same parent device.

We have a setup where disks are partitioned via device mapper, use bcache, etc., and are finally assembled into RAID arrays with md. Unfortunately, this means the md driver does not recognize them as sharing the same physical parent, which leads to massive I/O congestion when the automatic verification starts (there are about 10 arrays on every physical disk).

I tried to find documentation on how the driver detects a shared parent, but found nothing.
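For reference, the detection lives in the kernel, not in a userspace tool. In kernels of the Fedora 19 era, `drivers/md/md.c` has a helper, `match_mddev_units()`, that treats two arrays as sharing hardware only when some member of one and some member of the other sit on the same whole-disk block device (for a partition, its parent disk). The Python below is a simplified model of that rule, not kernel code, and the device names are made up. It illustrates why layered setups defeat it: a dm or bcache node is a whole disk in its own right, so md never sees the spindle underneath.

```python
# Simplified model of md's "shared physical unit" test (assumption: based on
# match_mddev_units() in drivers/md/md.c of 3.x kernels). Not kernel code.


def whole_disk(member, parent_of):
    """Return the whole-disk device that a member device lives on.

    parent_of maps partitions to their disk, e.g. "sda2" -> "sda".
    Devices not in the map (dm, bcache, ...) are whole disks themselves,
    which is how the block layer presents them to md.
    """
    return parent_of.get(member, member)


def share_physical_units(members_a, members_b, parent_of):
    """True if any member of array A and any member of array B share a disk."""
    disks_a = {whole_disk(m, parent_of) for m in members_a}
    disks_b = {whole_disk(m, parent_of) for m in members_b}
    return not disks_a.isdisjoint(disks_b)


# Hypothetical layout 1: two arrays on plain partitions of sda and sdb.
partitions = {"sda1": "sda", "sda2": "sda", "sdb1": "sdb", "sdb2": "sdb"}
print(share_physical_units(["sda1", "sdb1"], ["sda2", "sdb2"], partitions))  # True

# Hypothetical layout 2: the same spindles, but md only sees bcache nodes.
print(share_physical_units(["bcache0", "bcache1"], ["bcache2", "bcache3"], {}))  # False
```

In the real driver a match only matters while the other array is actually syncing; md then holds back the second array and logs that they "share one or more physical units".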

Is there any way to "hint" to the md driver which arrays are on which physical disks, or which md arrays share the same parent?

The other option would be to disable the automatic verification and script my own one-by-one verification, or maybe there is already some sort of daemon for that. But I feel that making the detection work is the better way...
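A one-by-one verification can be sketched on top of the md sysfs interface: writing `check` to `/sys/block/mdX/md/sync_action` starts a read-only verification, and the file reads `idle` again once nothing is running on that array. The following is a minimal sketch under those assumptions, not an existing daemon; all function names are hypothetical, it must run as root, and it simply polls. `sysfs_root` and `sleep` are parameters only so it can be exercised against a fake directory tree.

```python
import glob
import os
import time

SYSFS_BLOCK = "/sys/block"


def md_arrays(sysfs_root=SYSFS_BLOCK):
    """List md devices (md0, md127, ...) that expose md/sync_action."""
    pattern = os.path.join(sysfs_root, "md*", "md", "sync_action")
    # ".../md0/md/sync_action" -> "md0"
    return sorted(p.split(os.sep)[-3] for p in glob.glob(pattern))


def _action_file(dev, sysfs_root):
    return os.path.join(sysfs_root, dev, "md", "sync_action")


def current_action(dev, sysfs_root=SYSFS_BLOCK):
    """Read sync_action, e.g. 'idle', 'check', 'resync' or 'recover'."""
    with open(_action_file(dev, sysfs_root)) as f:
        return f.read().strip()


def wait_idle(dev, sysfs_root, poll, sleep):
    """Block until the array reports 'idle'."""
    while current_action(dev, sysfs_root) != "idle":
        sleep(poll)


def check_one_by_one(devices, sysfs_root=SYSFS_BLOCK, poll=30.0, sleep=time.sleep):
    """Start a 'check' on each array only after the previous one has finished."""
    done = []
    for dev in devices:
        wait_idle(dev, sysfs_root, poll, sleep)  # skip past any running resync/recovery
        with open(_action_file(dev, sysfs_root), "w") as f:
            f.write("check\n")  # ask md for a read-only verification (root only)
        sleep(poll)  # give the kernel time to pick up the request
        wait_idle(dev, sysfs_root, poll, sleep)
        done.append(dev)
    return done


if __name__ == "__main__":
    # Hypothetical usage, e.g. from cron in place of the stock weekly check.
    for dev in check_one_by_one(md_arrays()):
        with open(os.path.join(SYSFS_BLOCK, dev, "md", "mismatch_cnt")) as f:
            print(dev, "mismatch_cnt:", f.read().strip())
```

Waiting for `idle` before each request means the script never asks for a check on an array that is already resyncing or recovering, so at most one verification it started is running at any time.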

(I do not want to change the setup, as we are extremely satisfied with it in other respects.)

EDIT: It is Fedora Linux (version 19, to be exact).

MadHatter
Radek Hladík

1 Answer


I'm assuming a Red Hat-based system, since you didn't specify (and it may be important):

The quick fix would be to edit /etc/sysconfig/raid-check and set MAXCONCURRENT=1. This will cause all your RAID arrays to be checked sequentially.

As for the algorithm, /usr/sbin/raid-check is just a shell script, and you can easily read it to see what it's doing.

Michael Hampton
  • I edited the question with the system version. But I don't think it matters much, as the detection is in the kernel driver (and has been for quite some time). The raid-check script only asks the kernel to verify the arrays. But you are right that that would be the way to fix it. Unfortunately, it will not work for recovery... – Radek Hladík Mar 24 '14 at 12:48
  • If you're referring to the verification that (by default) runs every Sunday morning, then this will take care of it. – Michael Hampton Mar 24 '14 at 12:51
  • You are right. That verification is what brought the issue to my attention. I have already set the value to 1, and if no one comes up with a more systematic solution I will mark your answer as accepted. – Radek Hladík Mar 24 '14 at 12:55