53

I want to interrupt a running resync operation on a Debian Squeeze software RAID. (This is the regularly scheduled compare resync. The RAID array is still clean in such a case. Do not confuse this with a rebuild after a disk has failed and been replaced.)

How can I stop this scheduled resync operation while it is running? Another RAID array is "resync pending", because they all get checked on the same day (Sunday night), one after another. I want to stop this Sunday night resyncing completely.

[Edit: sudo kill -9 1010 doesn't stop it, 1010 is the PID of the md2_resync process]

I would also like to know how I can control the interval between resyncs and see the remaining time until the next one.

[Edit2: What I did for now was to make the resync run very slowly, so that it no longer gets in the way:

sudo sysctl -w dev.raid.speed_limit_max=1000

taken from http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html

During the night I will set it back to a high value, so the resync can finish.

This workaround is fine for most situations; nonetheless, it would be interesting to know whether what I asked is possible. For example, it does not seem to be possible to grow an array while it is resyncing or while a resync is "pending".]

Adam5

9 Answers

52

If your array is md0 then echo "idle" > /sys/block/md0/md/sync_action

'idle' will stop an active resync/recovery etc. There is no guarantee that another resync/recovery may not be automatically started again, though some event will be needed to trigger this.
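
For a machine with several arrays, the same write can be wrapped in a small function that also reports what each array says before and after. This is only a sketch: the function name `md_idle_all` and its optional sysfs root argument are made up for illustration (the argument exists so the function can be tried on a scratch directory), and it assumes the standard `/sys/block/md*/md/sync_action` layout.

```shell
#!/bin/sh
# Sketch: ask every md array to stop its current check.
# The optional first argument is a hypothetical sysfs root override
# (default /sys) so the function can be exercised on a scratch tree.
md_idle_all() {
    sysfs_root="${1:-/sys}"
    for action_file in "$sysfs_root"/block/md*/md/sync_action; do
        [ -e "$action_file" ] || continue   # glob matched no md arrays
        before=$(cat "$action_file")
        echo idle > "$action_file"          # needs root on a real system
        after=$(cat "$action_file")         # a real kernel may already report a restart
        printf '%s: %s -> %s\n' "$action_file" "$before" "$after"
    done
}
```

Run `md_idle_all` as root. If the second state printed for an array is not `idle`, the kernel has started the operation again right away.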

http://www.mjmwired.net/kernel/Documentation/md.txt#477

Mark Wagner
  • I could update the text file after changing its permissions, but the content of the file is changed back to "resync" behind my back in the same instant. And the resync continues on the other array (which was formerly "pending"). When I write "idle" to the other array's file, it swaps again, but never stops. – Adam5 Dec 27 '10 at 22:15
  • 2
    If you have multiple raids: echo idle | sudo tee /sys/block/md*/md/sync_action – Ole Tange Jun 03 '13 at 12:38
  • Actually "idle" only pauses the check. The next "check" would continue at `/sys/block/md0/md/sync_min`. To reset this, write `0` to this file. – rudimeier Dec 21 '17 at 22:36
  • 1
    This is great if you want to resync some other md array which shares some physical disk, and you have the new array stuck in the DELAYED status. Once the new array finishes resyncing, the original one _should_ automatically start resyncing again (it **actually** does that on Debian 6, kernel 2.6.32) – gog Jan 15 '20 at 10:15
  • Does it reset to the normal behaviour at the next restart? – Sandburg Apr 23 '20 at 11:38
41

I wanted to slow down or pause the resync process to save some I/O while backing up some stuff to another computer. This thread helped me, but I found another solution.

On my Debian Lenny:

  • echo "idle" > /sys/block/md0/md/sync_action works but the resync process is immediately restarted.

  • checkarray -x --all : works, but same result: the resync process is immediately restarted.

So I use this method: echo 0 > /proc/sys/dev/raid/speed_limit_max
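
Since the limits have to be raised again later for the resync to finish, it helps to record the old values before lowering them. The following is a minimal sketch, not a tested tool: the function names, the save file argument and the `RAID_PROC_DIR` override are all hypothetical, and it lowers `speed_limit_min` as well as `speed_limit_max`.

```shell
#!/bin/sh
# RAID_PROC_DIR is a hypothetical override used for dry runs;
# by default the real /proc/sys/dev/raid directory is used.
raid_limits_dir() {
    echo "${RAID_PROC_DIR:-/proc/sys/dev/raid}"
}

# raid_throttle SAVE_FILE LIMIT
# Remember the current limits in SAVE_FILE, then set both to LIMIT.
raid_throttle() {
    dir=$(raid_limits_dir)
    echo "$(cat "$dir/speed_limit_min") $(cat "$dir/speed_limit_max")" > "$1"
    echo "$2" > "$dir/speed_limit_min"
    echo "$2" > "$dir/speed_limit_max"
}

# raid_restore SAVE_FILE
# Put back the limits recorded by raid_throttle.
raid_restore() {
    dir=$(raid_limits_dir)
    read -r old_min old_max < "$1"
    echo "$old_max" > "$dir/speed_limit_max"
    echo "$old_min" > "$dir/speed_limit_min"
}
```

For example, as root: `raid_throttle /tmp/raid-limits 0` before the backup and `raid_restore /tmp/raid-limits` afterwards (the path is just an example).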

small
  • 1
    Interesting approach. I found that you also need to throttle down the value in speed_limit_min. – Diomidis Spinellis Dec 31 '13 at 15:05
  • 1
    I also needed to set `speed_limit_min` to 0 to totally pause the resync. – njahnke Dec 17 '14 at 15:45
  • If the md device starts syncing again immediately after you echo `idle` to `sync_action`, the RAID is not clean and you need to sync it in any case. Setting `speed_limit_min` and `speed_limit_max` is the correct way forward in that case. In my experience, `speed_limit_max` is actually too low and the resync takes much longer than needed. – Mikko Rantalainen Jul 07 '21 at 11:38
  • According to my tests, `echo [value] > /proc/sys/dev/raid/speed_limit_max` only affects the resync, not normal operation (which someone below claims it does). So if your RAID is stalling because of the rebuild, as mine did, simply set the speed limits to a low value such as 1000, which is supposedly in kilobytes, as I read somewhere. – ede-duply.net Aug 18 '21 at 10:34
25

You can cancel an array resync in progress using the following sequence of commands (as root):

echo frozen > /sys/block/md0/md/sync_action
echo none > /sys/block/md0/md/resync_start
echo idle > /sys/block/md0/md/sync_action

Note that this may leave your array in an inconsistent state. Don't do this unless you're sure the array is in good shape, and rerun the sync later.

(Credit where credit's due: found this incantation in this thread.)

  • This is the only one which worked for me, when I wanted to abort the initial (dare I say, pointless) resync-after-create process. – chutz May 28 '20 at 12:23
  • @chutz if you have RAID 5 or RAID 6 setup and you don't resync after create, your disks may fail to build into consistent redundant state (that is, after losing one or more of your disks). Possible errors *should* appear only in the free space of your storage, though. – Mikko Rantalainen Jul 07 '21 at 11:41
9

As mentioned above, on Debian/Ubuntu systems the /etc/cron.d/mdadm script invokes the /usr/share/mdadm/checkarray script to initiate re-sync checks.

This script has an option for cancelling all running sync checks:

/usr/share/mdadm/checkarray -x --all
sanmai
7

Possible solution for this, took a bit to get into the details.

My system: CentOS 6.5 mdadm v3.3.2

The array is checked every week and I wanted to pause one of those checks. The RAID is clean; the check was started by the /etc/cron.d/raid-check script, which runs weekly.

To cancel the check, you use the --misc --action function. Assuming the RAID device is /dev/md0 and this is just the weekly consistency check and not a device failure, you would, as root:

mdadm --misc --action=idle /dev/md0

Likewise, to start the consistency check:

mdadm --misc --action=check /dev/md0
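
To confirm that either command took effect, the array's state can be read back from sysfs. The sketch below assumes that `sync_completed` contains either `none` or a `done / total` pair of sector counts, as I understand the kernel md documentation to describe; the function name `md_progress` and its optional sysfs root argument are hypothetical.

```shell
#!/bin/sh
# md_progress ARRAY [SYSFS_ROOT]
# Print the current sync_action of ARRAY and, if an operation is
# running, a rough completion percentage.
md_progress() {
    md_dir="${2:-/sys}/block/$1/md"
    action=$(cat "$md_dir/sync_action")
    completed=$(cat "$md_dir/sync_completed")
    case "$completed" in
        */*)
            done_sectors=${completed%% /*}    # text before " / "
            total_sectors=${completed##*/ }   # text after "/ "
            if [ "$total_sectors" -gt 0 ]; then
                pct=$(( done_sectors * 100 / total_sectors ))
                printf '%s: %s %s%%\n' "$1" "$action" "$pct"
            else
                printf '%s: %s\n' "$1" "$action"
            fi
            ;;
        *)
            # "none" or anything unexpected: just report the action
            printf '%s: %s\n' "$1" "$action"
            ;;
    esac
}
```

Calling `md_progress md0` after `--action=idle` should then report `idle` if the check really stopped.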

bill.rookard
3

Not sure about how to cancel a re-sync, but the schedule is controlled by /etc/cron.d/mdadm on Debian/Ubuntu systems.

The script /usr/share/mdadm/checkarray may shed some light on the other part of your question, since that is what is being called by cron.
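
To see what the schedule currently is, it is enough to print the active lines of that cron file. A small sketch follows; the function name `md_schedule` is made up, and the default path is the one named above, which may differ on other distributions.

```shell
#!/bin/sh
# md_schedule [CRON_FILE]
# Print the non-comment, non-blank lines of the mdadm cron file.
md_schedule() {
    cron_file="${1:-/etc/cron.d/mdadm}"
    if [ ! -r "$cron_file" ]; then
        echo "cannot read $cron_file" >&2
        return 1
    fi
    grep -Ev '^[[:space:]]*(#|$)' "$cron_file"
}
```

The first five fields of each printed line are the usual cron time fields (minute, hour, day of month, month, day of week), so editing them changes when and how often the check is started.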

Zoredache
3

If your md device is md0 and you want to stop the resync write:

echo "idle" > /sys/block/md0/md/sync_action
mgorven
Victor
3
echo "idle" > /sys/block/md0/md/sync_action

Does not work when /sys/block/md*/md/sync_action is "resync" (unlike when its state is "check" or "repair"). You can echo "idle" into the sync_action file, but it does not affect the progress. The kernel documentation incorrectly states that it will work, but it has never worked for me:

'idle' will stop an active resync/recovery etc. There is no guarantee that another resync/recovery may not be automatically started again, though some event will be needed to trigger this.
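
Given that distinction, a script can look at the state first and only write "idle" when a "check" or "repair" is running. This is a sketch under that assumption; the function name `md_cancel_check` and the optional sysfs root argument are hypothetical.

```shell
#!/bin/sh
# md_cancel_check ARRAY [SYSFS_ROOT]
# Cancel a running check/repair; leave any other state untouched.
# Returns 0 if the array is (now) idle, 1 if another operation is
# running, 2 if the state could not be read.
md_cancel_check() {
    action_file="${2:-/sys}/block/$1/md/sync_action"
    state=$(cat "$action_file") || return 2
    case "$state" in
        check|repair)
            echo idle > "$action_file"    # needs root on a real system
            echo "$1: cancelled $state"
            ;;
        idle)
            echo "$1: nothing running"
            ;;
        *)
            # e.g. resync or recover: writing idle is not expected to help
            echo "$1: state is $state, leaving it alone" >&2
            return 1
            ;;
    esac
}
```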

Sven
brian
  • 2
    You can, however, affect the rate of the "resync" with /sys/block/md*/md/sync_speed_max in this state. I'm not sure why the documentation is incorrect; maybe no one knows. – brian Oct 29 '12 at 01:19
  • Please take a minute of time to learn the [SE] markdown syntax (http://meta.serverfault.com/editing-help) – Sven Oct 29 '12 at 03:46
0

I know this is a 4-year-old post, but you can also do this (assuming md0 is the array and sdb4 is the resyncing "disk"):

    mdadm /dev/md0 --fail /dev/sdb4 && mdadm /dev/md0 --remove /dev/sdb4

This command marks sdb4 as a failed disk and therefore kicks it from the array, stopping the resync. If there was no error while stopping the resync, the command will also remove sdb4 from the md0 array. If there was an error, the disk stays in the failed state but remains in the array.

If you fail a disk anywhere in mdadm, you set it logically failed. If the array was clean (not degraded), then the disk stays consistent and can be re-added with the --add << disk >> --assume-clean option without any fear. If there was any activity after it was detached (e.g. a resync, a rebuild, or even a write), then --assume-clean will probably fail and start a resync immediately.

Changing raid.speed_limit_min and raid.speed_limit_max is a somewhat bad idea because it affects not only resync/rebuild speeds but also normal operation speeds, and you will probably lose a lot of the performance gained by using RAID arrays.

eth
  • 6
    I think it is a bad idea to remove a healthy disk from the RAID. Most of the time it may not cause any problems, but each time you do it, there is a risk of causing data loss or data corruption. – kasperd Aug 14 '15 at 11:13
  • 3
    Don't ever do this. If you add a disk with `--assume-clean` on a live system, and *you had any writes happen* on the remaining disk, you're asking for trouble. – sanmai May 29 '17 at 01:45