RAID 1, being a mirror, depends on all disks in the mirror being exact copies of each other. Take one random hard drive and another random hard drive, and they most likely hold different data, which violates that assumption. This is why initialization is needed: it simply copies the contents of the first drive to the others. Note that under some conditions you can get away with not initializing the drives. Factory-new devices usually already have zeros all over the place, so you can skip the initial sync. The mdadm option --assume-clean does this, but the man page warns you:
--assume-clean
Tell mdadm that the array pre-existed and is known to be clean. It can be useful when trying to recover from a major failure as you can be sure that no data will be affected unless you actually write to the array. It can also be used when creating a RAID1 or RAID10 if you want to avoid the initial resync, however this practice -- while normally safe -- is not recommended. Use this only if you really know what you are doing.
If you skip initialization, the drives disagree, and a block is read from a region where they differ, there is no knowing which drive's contents you will get. You should be pretty safe with a filesystem (but see the notes below), because you will most probably write to a block before you read anything from it, and after that you're in the clear.
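To make the read-before-write hazard concrete, here is a minimal Python sketch of a two-way mirror built from drives that were never synchronized. It is a toy model, not how md is implemented: the class ToyMirror, its methods, and the explicit choice of which member answers a read are all assumptions made for illustration.

```python
class ToyMirror:
    """Toy two-way mirror; each member is a plain list of block contents."""

    def __init__(self, member_a, member_b):
        if len(member_a) != len(member_b):
            raise ValueError("mirror members must be the same size")
        self.members = [list(member_a), list(member_b)]

    def write(self, block, data):
        # A mirror write goes to every member, so the block becomes consistent.
        for member in self.members:
            member[block] = data

    def read(self, block, member=0):
        # A real implementation picks the member itself; here the caller
        # chooses, to show that the answer can depend on that choice.
        return self.members[member][block]

    def is_consistent(self, block):
        return self.members[0][block] == self.members[1][block]


# Two "random" drives with leftover data, assembled without initialization.
mirror = ToyMirror(["old-a0", "old-a1"], ["old-b0", "old-b1"])

# Read before write: the result depends on which member answers.
before = (mirror.read(0, member=0), mirror.read(0, member=1))

# Write, then read: both members now agree on block 0.
mirror.write(0, "new")
after = (mirror.read(0, member=0), mirror.read(0, member=1))

print(before, after, mirror.is_consistent(1))
```

Block 1 was never written, so it stays inconsistent; that is exactly the kind of block the notes below are about.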
Note that at least Linux's mdadm will initialize the array in the background. You can happily create a filesystem on top of it from the first second. Performance will suffer until the initialization finishes, but that is all.
But:
a) When running mkfs, some utilities check whether there is something on that drive already. This only touches a few well-known regions of the drive, but it reads before you have written anything, which puts you in danger.
b) If you do a periodic resync of your array, the RAID device knows nothing of your FS. It simply reads every block from every device and compares them. And if you are not using a copy-on-write FS (e.g. ZFS or BTRFS) and never fill your FS, it's perfectly plausible for a block to stay uninitialized from the FS perspective for years, so the comparison can keep finding differences there.
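On Linux md, a periodic check of this kind can be requested through sysfs. The sketch below assumes the md sysfs files sync_action and mismatch_cnt under /sys/block/<array>/md/; the function names and the md0 array name are hypothetical, and writing to sync_action needs root. Treat it as an illustration, not as a replacement for whatever check script your distribution ships.

```python
from pathlib import Path


def md_sysfs_dir(array_name, sysfs_root="/sys/block"):
    # e.g. /sys/block/md0/md for an array named md0 (the name is an assumption).
    return Path(sysfs_root) / array_name / "md"


def request_check(md_dir):
    # Writing "check" asks md to read all members and compare them
    # without rewriting anything.
    (Path(md_dir) / "sync_action").write_text("check\n")


def read_mismatch_count(md_dir):
    # mismatch_cnt holds the counter md keeps for inconsistencies it found.
    return int((Path(md_dir) / "mismatch_cnt").read_text().strip())


# Usage on a real system (as root, once the check has finished):
#   md_dir = md_sysfs_dir("md0")
#   request_check(md_dir)
#   print(read_mismatch_count(md_dir))
```

A non-zero count after a check on an array created with --assume-clean does not necessarily mean data is damaged; it may just be the never-written blocks described above.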
Why resync RAID1 devices?
For the same reason you resync RAID5 devices or any other level (except RAID0). A resync reads all data and compares the copies (or verifies parity in RAID 5 or 6). If a bit was flipped somehow (because the drive's memory suffered a spontaneous flip, because your cellphone and those of your 5 neighbours happened to interfere over this particular region of the platter, whatever), it will detect the inconsistency but won't be able to help you, since a two-way mirror cannot tell which copy is the correct one. If, OTOH, one of the hard drives simply reports "I cannot read that block", which is more probable with a failing drive, you have detected the failure early and reduced the time you run in degraded mode (counting from the drive failure, not from when you notice it). RAID won't help you if one drive fails and the other fails a month later, and you never noticed the first failure during that month.
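The two outcomes described above can be sketched with a toy scrub over two mirror members. This is only a model: the ReadError class, the use of None to stand for an unreadable block, and the report strings are assumptions for illustration and do not reflect how md reports anything.

```python
class ReadError(Exception):
    """Raised by a toy drive that cannot read a block."""


def read_block(drive, block):
    data = drive[block]
    if data is None:  # None models "I cannot read that block"
        raise ReadError(block)
    return data


def scrub(drive_a, drive_b):
    """Compare two mirror members block by block and classify each block."""
    report = {}
    for block in range(len(drive_a)):
        copies, failed = [], []
        for name, drive in (("a", drive_a), ("b", drive_b)):
            try:
                copies.append(read_block(drive, block))
            except ReadError:
                failed.append(name)
        if failed:
            # The drive told us it is failing: we know which one to replace.
            report[block] = "read error on " + ",".join(failed)
        elif copies[0] != copies[1]:
            # Both reads succeeded but disagree: no way to pick the right copy.
            report[block] = "mismatch"
        else:
            report[block] = "ok"
    return report


result = scrub(["x", "y", "z"], ["x", "Y", None])
print(result)
```

Block 1 is the silent bit-flip case (detected, not fixable from the mirror alone), while block 2 is the early warning that a drive is on its way out.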
RAID10
Now, for RAID10 all of the above holds. After all, RAID10 is just a clever way of saying 'I'm putting my two RAID1 devices in a RAID0 pair'.
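As a rough illustration of that "RAID0 over RAID1 pairs" picture, the sketch below maps a logical chunk number to a mirror pair and an offset within that pair. It models plain nested striping over mirrors only; the function name locate_chunk is hypothetical, and Linux md's raid10 personality has its own layouts that this does not try to reproduce.

```python
def locate_chunk(logical_chunk, mirror_pairs):
    """Map a logical chunk to (mirror pair index, chunk offset in that pair)."""
    if mirror_pairs < 1 or logical_chunk < 0:
        raise ValueError("need at least one pair and a non-negative chunk")
    # RAID0 part: chunks are dealt round-robin across the mirror pairs.
    # RAID1 part: whatever lands on a pair is stored on both of its drives.
    return logical_chunk % mirror_pairs, logical_chunk // mirror_pairs


layout = [locate_chunk(i, mirror_pairs=2) for i in range(4)]
print(layout)
```

Every chunk still lives on a mirror pair, which is why the initialization and resync arguments for RAID1 carry over unchanged.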
Caveat:
This is all undefined behaviour. While I've checked on Linux using mdadm, other software RAID implementations may behave differently. Other versions of the Linux kernel and/or the mdadm tools than the ones I'm using may also behave differently.