First of all, to those who still believe that "RAID0 has no hot spare": it can have a manual spare, provided by a human who understands RAID levels and mdadm. mdadm is software RAID, so it can do a lot of interesting things.
Credits to Zoredache for the idea!
So, the situation:
- you have a RAID0 array of two disks
- you would like to replace one of them without array downtime
If downtime is acceptable, you can always just make a block copy of the disk with dd and reassemble the array; mdadm will handle that fine.
Solution: use RAID4 as an intermediate step
RAID0 -> RAID4 -> RAID0
In case you don't remember RAID4, it is simple. It has a parity block, but unlike RAID5 the parity is not distributed across the array: it resides on ONE disk. This is the important point, and it is the reason RAID5 will not work.
What you'll need: two more disks of the same size as the disk you would like to replace.
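If the parity idea feels abstract, here is a toy model in shell arithmetic. It is only an illustration: each "disk" is a single byte, and the values are picked arbitrarily. The parity disk holds the XOR of the data disks, so any one missing disk can be rebuilt from the others.

```shell
#!/usr/bin/env bash
# Toy model of RAID4 parity: one byte per "disk", values chosen arbitrarily.
d0=$(( 0xA5 ))   # byte on the first data disk
d1=$(( 0x3C ))   # byte on the second data disk

# The dedicated parity disk stores the XOR of the data disks.
parity=$(( d0 ^ d1 ))

# If the first data disk is lost, XOR of the survivors gives its content back.
rebuilt_d0=$(( parity ^ d1 ))

printf 'parity=0x%02X rebuilt_d0=0x%02X\n' "$parity" "$rebuilt_d0"
# prints: parity=0x99 rebuilt_d0=0xA5
```

Because the parity lives on one dedicated disk, the data disks keep exactly the striping they had as RAID0, which is what lets us drop the parity disk again at the end.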
Environment:
- Ubuntu 14.04 Trusty Tahr
- mdadm - v3.2.5 - 18th May 2012
- /dev/sdb - part of the initial array, will be replaced
- /dev/sdc - part of the initial array, stays
- /dev/sdd - will be used temporarily (parity)
- /dev/sde - will take the place of sdb
The ultimate RAID0 hot-spare mdadm guide ;)
sudo mdadm -C /dev/md0 -l 0 -n 2 /dev/sd[bc]
md0 : active raid0 sdc[1] sdb[0]
2096128 blocks super 1.2 512k chunks
We've created a RAID0 array, and it looks sweet.
sudo md5sum /dev/md0
b422ba644a3c83cdf28adfa94cb658f3 /dev/md0
This is our checkpoint: if even one bit differs in the resulting /dev/md0, we've failed.
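If you'd rather not compare hashes by eye, the checkpoint could be scripted roughly like this. The helper names checksum_of and same_content are made up for this sketch. Keep in mind the comparison only makes sense if nothing writes to the array between the two runs.

```shell
#!/usr/bin/env bash
# checksum_of PATH: print only the md5 hash of PATH (a block device or a file).
checksum_of() {
    md5sum "$1" | awk '{print $1}'
}

# same_content BEFORE_HASH PATH: succeed only if PATH still hashes to BEFORE_HASH.
same_content() {
    [ "$(checksum_of "$2")" = "$1" ]
}

# Usage idea (run as root so /dev/md0 is readable):
#   before=$(checksum_of /dev/md0)
#   ... do the whole RAID0 -> RAID4 -> RAID0 dance ...
#   same_content "$before" /dev/md0 && echo "data intact" || echo "we've failed"
```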
sudo mdadm /dev/md0 --grow --level=4
md0 : active raid4 sdc[1] sdb[0]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]
So, we've grown our array to RAID4. The grow is instant, since there is nothing to recompute or recalculate. We haven't added the parity disk yet, so let's do it.
sudo mdadm /dev/md0 -a /dev/sdd
md0 : active raid4 sdd[3] sdc[1] sdb[0]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]
[===>.................] recovery = 19.7% (207784/1048064) finish=0.2min speed=51946K/sec
We've added sdd as the parity disk. This is important to remember: the order of disks in the first row is not synchronized with the picture in the second row ([UU_])! sdd is displayed first, but in fact it is the last one, and it holds not data but parity.
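Don't move on to the next step while the recovery is still running. A rough way to check that from a script is sketched below; md_busy and wait_until_idle are hypothetical helper names, and the pattern simply looks for the progress line that /proc/mdstat shows above.

```shell
#!/usr/bin/env bash
# md_busy: read /proc/mdstat-style text on stdin; succeed if any array
# is in the middle of a recovery, resync or reshape.
md_busy() {
    grep -Eq '(recovery|resync|reshape) *='
}

# wait_until_idle: poll /proc/mdstat until no rebuild is running.
wait_until_idle() {
    while md_busy < /proc/mdstat; do
        sleep 5
    done
}
```

mdadm also has a built-in `mdadm --wait /dev/md0` that blocks until resync, recovery or reshape activity finishes, which may be all you need.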
sudo mdadm /dev/md0 -f /dev/sdb
md0 : active raid4 sdd[3] sdc[1] sdb[0](F)
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [_UU]
We've marked sdb as faulty so that we can remove it in the next steps.
sudo mdadm --detail /dev/md0
State : clean, degraded
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 32 1 active sync /dev/sdc
3 8 48 2 active sync /dev/sdd
0 8 16 - faulty spare /dev/sdb
The details show that the first disk has been removed, and here we can see the true order of the disks in the array. It's important to keep track of the disk holding parity: we must not leave it in the array when going back to RAID0.
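To look up which disk sits in which slot without eyeballing the table, something like this might help. It's a rough sketch: device_in_slot is a made-up helper name, and it assumes the --detail table keeps the column layout shown above (RaidDevice in the fourth column, device path last).

```shell
#!/usr/bin/env bash
# device_in_slot SLOT: read `mdadm --detail` output on stdin and print the
# device whose RaidDevice column equals SLOT. Rows without a /dev/ path
# (e.g. "removed") and rows with "-" as RaidDevice are skipped.
device_in_slot() {
    awk -v slot="$1" '
        $1 ~ /^[0-9]+$/ && $4 == slot && $NF ~ /^\/dev\// { print $NF }
    '
}

# Usage idea: in this 3-slot RAID4 the parity disk is the last slot, number 2.
#   sudo mdadm --detail /dev/md0 | device_in_slot 2
```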
sudo mdadm /dev/md0 -r /dev/sdb
md0 : active raid4 sdd[3] sdc[1]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [_UU]
sdb is completely removed and can be taken away.
sudo mdadm /dev/md0 -a /dev/sde
md0 : active raid4 sde[4] sdd[3] sdc[1]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [_UU]
[==>..................] recovery = 14.8% (156648/1048064) finish=0.2min speed=52216K/sec
We have added the replacement for our sdb disk. And here we go: now the data of sdb is being recovered using parity. Sweeeeet.
md0 : active raid4 sde[4] sdd[3] sdc[1]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/3] [UUU]
Done. Right now we are completely safe: all data from sdb has been recovered onto sde, and now we have to remove sdd (remember, it holds the parity).
sudo mdadm /dev/md0 -f /dev/sdd
md0 : active raid4 sde[4] sdd[3](F) sdc[1]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]
Made sdd faulty.
sudo mdadm /dev/md0 -r /dev/sdd
md0 : active raid4 sde[4] sdc[1]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]
Removed sdd from our array. We are ready to become RAID0 again.
sudo mdadm /dev/md0 --grow --level=0 --backup-file=backup
md0 : active raid4 sde[4] sdc[1]
2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]
[=>...................] reshape = 7.0% (73728/1048064) finish=1.5min speed=10532K/sec
Aaaaaaand bang!
md0 : active raid0 sde[4] sdc[1]
2096128 blocks super 1.2 512k chunks
Done. Let's look at md5 checksum.
sudo md5sum /dev/md0
b422ba644a3c83cdf28adfa94cb658f3 /dev/md0
Any more questions? So RAID0 could have a hot spare. It's called "user" ;)
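For reference, the whole sequence condensed into a dry-run sketch. It only prints the commands, it does not run them. The device names are the ones from this answer; the `mdadm --wait` lines are my addition here (in the walkthrough I simply watched /proc/mdstat), so treat them as an assumption and double-check that the recovery has really finished before failing any disk.

```shell
#!/usr/bin/env bash
# Dry-run recap of the walkthrough: prints the mdadm commands in order
# instead of executing them. Device names are the ones from this answer.
MD=/dev/md0      # the RAID0 array
OLD=/dev/sdb     # disk to be replaced
TMP=/dev/sdd     # temporary parity disk
NEW=/dev/sde     # replacement disk

plan() {
    echo "mdadm $MD --grow --level=4"      # RAID0 -> RAID4 (degraded, no parity yet)
    echo "mdadm $MD -a $TMP"               # add the parity disk
    echo "mdadm --wait $MD"                # assumption: wait for parity to be built
    echo "mdadm $MD -f $OLD"               # mark the old disk faulty
    echo "mdadm $MD -r $OLD"               # remove the old disk
    echo "mdadm $MD -a $NEW"               # add the replacement
    echo "mdadm --wait $MD"                # assumption: wait for data to be rebuilt
    echo "mdadm $MD -f $TMP"               # mark the parity disk faulty
    echo "mdadm $MD -r $TMP"               # remove the parity disk
    echo "mdadm $MD --grow --level=0 --backup-file=backup"   # RAID4 -> RAID0
}

plan
```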