Why is RAID1 resyncing for so long?

2

I have created a RAID1 array from two identical empty partitions.

Now it reports:

$ tail -f /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc1[1] sdb1[0]
      3906885440 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  1.6% (66113792/3906885440) finish=3821.3min speed=16750K/sec

So it plans to sync for almost 3 days. Is this normal? What is it actually doing? Calculating the proton mass with quantum chromodynamics? Doesn't it see that the drives are identical?

UPDATE

It was suggested below that, while syncing, RAID1 just blindly copies one partition onto the other. If this is true, then why isn't it possible to tell it: relax, everything is already identical, just mark the data as already copied?

Like a quick format.

Is it possible?

Dims

Posted 2014-10-03T22:40:56.013

Reputation: 8 464

BTW, are you sure you have your partitions properly aligned? If those are 4k drives and you didn't align things, you could be impacting performance a lot. – Zoredache – 2014-10-03T23:13:13.673

I have no idea what aligning is. The drives are identical and were formatted identically. – Dims – 2014-10-03T23:15:47.917

The drives being identical is irrelevant. Newer drives align to 4k blocks; in the past they aligned to 512-byte blocks. If your partition is aligned improperly, then every read and write operation actually involves reading 2 blocks off the hard drive when it should have read one. It makes a huge difference during something like an initial sync of a large RAID1, since a misaligned drive would be doing twice as much work. See: http://en.wikipedia.org/wiki/Partition_alignment and http://superuser.com/questions/tagged/advanced-format – Zoredache – 2014-10-03T23:57:46.383

BTW just to see, it might be useful if you posted the output of fdisk -u -l /dev/sdc /dev/sdb. – Zoredache – 2014-10-04T00:02:58.863
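To make the alignment point concrete, here is a minimal sketch of the check one could apply to the start sectors that fdisk prints. It assumes the start sectors are given in 512-byte units; the function name `is_4k_aligned` is made up for illustration.

```shell
# Report whether a partition start sector falls on a 4 KiB boundary.
# Assumption: the sector number is in 512-byte units, as fdisk -u prints it.
is_4k_aligned() {
    # $1 = start sector of the partition
    if [ $(( $1 * 512 % 4096 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

is_4k_aligned 2048   # common default start on current partitioning tools
is_4k_aligned 63     # old DOS-style default start
```

A start sector of 2048 passes this check, while the old default of 63 does not, which is the case the comment above is warning about.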

Answers

3

RAID operates below the file system level, so it has no idea that the partitions are "empty". It is simply copying everything from one partition to the other. /proc/mdstat counts 1 KiB blocks, so the 3906885440 blocks it reports for the array mean each partition is about 4 TB, and that is how much it is copying. 4 TB in roughly 3800 minutes comes out to about 17 MB/s, which agrees with the speed=16750K/sec in your output. That's not particularly speedy, but given that it's likely a stupid implementation that reads one, writes one, reads one, writes one, etc., it's not terribly out of line for software RAID. It may even be doing a read-after-write check.
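As a rough cross-check of the figures in the mdstat output from the question, the arithmetic can be sketched like this. It assumes mdstat counts 1 KiB blocks and reports speed in KiB per second; the function names are hypothetical helpers, not part of any tool.

```shell
# Estimate the remaining resync time in minutes from the mdstat progress line.
# Assumption: block counts and speed use the same 1 KiB unit.
resync_eta_minutes() {
    # $1 = blocks already synced, $2 = total blocks, $3 = speed in K/sec
    LC_ALL=C awk -v completed="$1" -v total="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", (total - completed) / speed / 60 }'
}

# Convert a count of 1 KiB blocks to decimal terabytes.
blocks_to_tb() {
    # $1 = number of 1 KiB blocks
    LC_ALL=C awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 1024 / 1e12 }'
}

resync_eta_minutes 66113792 3906885440 16750   # prints 3821.7
blocks_to_tb 3906885440                        # prints 4.00
```

The estimate lands within a fraction of a minute of the finish=3821.3min that mdstat itself printed, so the reported ETA is just remaining blocks divided by the current speed.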

Jamie Hanrahan

Posted 2014-10-03T22:40:56.013

Reputation: 19 777

It should have some means to compare the data, shouldn't it? Why doesn't it read and compare then? – Dims – 2014-10-03T22:52:31.077

MD is not stupid about how it does the initial sync. On new drives on a completely inactive system, the initial sync on my 3TB drives a few months ago ran at 60-70MB/s, which is about half of my drives' theoretical maximum of 140MB/s. – Zoredache – 2014-10-03T22:57:15.427

"Why doesn't it read and compare then?" That is basically what it is doing. It just takes a long time. It is likely your drives can only do ~150MB/s at the absolute best. But if you are doing anything else on your system, then it won't be able to perform the sync efficiently. – Zoredache – 2014-10-03T23:08:03.687

@Dims If you read and compare, you're doing one operation per block per drive. If you're simply copying, you're doing one operation per block per drive. So I'm not sure what it would accomplish to do a read-and-compare first. If you don't have to write, it takes as long. If you do have to write, it takes longer. – Jamie Hanrahan – 2014-10-03T23:09:21.883

@JamieHanrahan I was assuming reading is faster than writing. Also I was thinking RAID1 has some means to decide faster like CRCs or last modified dates... – Dims – 2014-10-03T23:11:37.673

@Dims Once again, you are thinking at the filesystem level; at the block level there is no concept of "last modified". – Scott Chamberlain – 2014-10-03T23:16:14.133

@ScottChamberlain so, I can't write to the drive until it syncs and can't reboot until it syncs. So, during syncing my system is highly vulnerable. Don't see any reason to use RAID then. – Dims – 2014-10-03T23:18:31.753

@Dims yes, RAID is most vulnerable during a rebuild, and software RAID takes longer than hardware RAID to rebuild. That is why it is recommended not to use drives with close manufacturing dates: they make it very likely that if your first drive fails, your second drive will also fail before you finish the rebuild process (which, as you now see, takes two to three days). – Scott Chamberlain – 2014-10-03T23:24:46.807

@ScottChamberlain currently I find this behavior irrational. I am not rebuilding anything; I took two empty partitions. – Dims – 2014-10-03T23:26:58.543

@ScottChamberlain btw, if you think it is copying, then from where to where? How did it select the source and destination? – Dims – 2014-10-03T23:27:56.607

When you set up your RAID1, you never said how you did it. You likely told it "use this existing partition and turn it into RAID1 using this other partition", not "use these two partitions and make a new empty RAID1 device". I bet if you had done it that way, it would be fast like you are expecting. Edit the steps and commands you used to make the RAID1 device into your original question. – Scott Chamberlain – 2014-10-03T23:34:31.340

@Dims Reading is marginally faster than writing, but only during the "seek" part of the operation, and there's darn little seeking when you're just copying great heaping bunches of sequentially-numbered blocks from one place to another. And you can't calculate a CRC without reading the block. Nor can you read the drive's CRC info without reading the block. So you're kind of stuck with one op per block per drive. – Jamie Hanrahan – 2014-10-03T23:53:01.643

"So, during syncing my system is highly vulnerable. Don't see any reason to use RAID then." Why do you think people are working on ZFS, BTRFS, Windows Storage Spaces and so on? Most of the common RAID levels are becoming far less useful given our current drive sizes and their error rate characteristics. – Zoredache – 2014-10-04T00:01:12.183

2

then why isn't it possible to tell it: relax, everything is already identical,

As you will notice in the man page (man mdadm):

--assume-clean Tell mdadm that the array pre-existed and is known to be clean. It can be useful when trying to recover from a major failure as you can be sure that no data will be affected unless you actually write to the array. It can also be used when creating a RAID1 or RAID10 if you want to avoid the initial resync, however this practice - while normally safe - is not recommended. Use this only if you really know what you are doing. When the devices that will be part of a new array were filled with zeros before creation the operator knows the array is actually clean. If that is the case, such as after running badblocks, this argument can be used to tell mdadm the facts the operator knows.
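For illustration only, a create command using that option might look like the sketch below. The device names are taken from the question; the helper `build_create_cmd` is hypothetical and only prints the command so that nothing touches the disks until you have reviewed it.

```shell
# Print (do not run) an mdadm create command that skips the initial resync.
# Assumption: the member partitions are already known to be identical,
# for example because both were filled with zeros beforehand.
build_create_cmd() {
    # $1 = md device to create, remaining arguments = member partitions
    md="$1"
    shift
    echo "mdadm --create $md --level=1 --raid-devices=$# --assume-clean $*"
}

build_create_cmd /dev/md0 /dev/sdb1 /dev/sdc1
```

With the names from the question this prints `mdadm --create /dev/md0 --level=1 --raid-devices=2 --assume-clean /dev/sdb1 /dev/sdc1`. As the man page warns, this is only sensible when you really do know the two partitions hold the same data.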

Zoredache

Posted 2014-10-03T22:40:56.013

Reputation: 18 453