Error when rebuilding a Linux RAID-1 array


I have a Linux box that acts as a home NAS, with 2 x 1 TB HDDs in Linux RAID-1. Recently one of the two drives failed, so I bought a new one (1 TB WD Blue) and put it in.

The rebuild starts but stops at 7.8% with an error saying that /dev/sdd (the good drive) has a bad block and the process cannot continue. I tried removing and re-adding the new drive, but the process always stops at the same point. The good news is that I can still access my data, which is mounted at /storage (XFS filesystem). Below is more information about the problem:

The good (source) disk:

sudo fdisk -l /dev/sdd

WARNING: GPT (GUID Partition Table) detected on '/dev/sdd'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1              63  1953525167   976762552+  da  Non-FS data

The new (destination) hard disk:

sudo fdisk -l /dev/sdc

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
81 heads, 63 sectors/track, 382818 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x5c5d0188

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  1953525167   976761560   da  Non-FS data

The RAID-1 array:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid1 sdc1[3] sdd1[2]
      976761382 blocks super 1.2 [2/1] [U_]
      [=>...................]  recovery =  7.7% (75738048/976761382) finish=601104.0min speed=24K/sec

dmesg (this message is repeated many times):

[35085.217154] ata10.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x0
[35085.217160] ata10.00: irq_stat 0x40000008
[35085.217163] ata10.00: failed command: READ FPDMA QUEUED
[35085.217170] ata10.00: cmd 60/08:08:37:52:43/00:00:6d:00:00/40 tag 1 ncq 4096 in
[35085.217170]          res 41/40:00:3c:52:43/00:00:6d:00:00/40 Emask 0x409 (media error) <F>
[35085.217173] ata10.00: status: { DRDY ERR }
[35085.217175] ata10.00: error: { UNC }
[35085.221619] ata10.00: configured for UDMA/133
[35085.221636] sd 9:0:0:0: [sdd] Unhandled sense code
[35085.221639] sd 9:0:0:0: [sdd]
[35085.221641] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[35085.221643] sd 9:0:0:0: [sdd]
[35085.221645] Sense Key : Medium Error [current] [descriptor]
[35085.221649] Descriptor sense data with sense descriptors (in hex):
[35085.221651]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[35085.221661]         6d 43 52 3c
[35085.221666] sd 9:0:0:0: [sdd]
[35085.221669] Add. Sense: Unrecovered read error - auto reallocate failed
[35085.221671] sd 9:0:0:0: [sdd] CDB:
[35085.221673] Read(10): 28 00 6d 43 52 37 00 00 08 00
[35085.221682] end_request: I/O error, dev sdd, sector 1833128508
[35085.221706] ata10: EH complete
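The hex dump in the log already says which sector failed. Assuming the standard SCSI layout for descriptor-format sense data and for the READ(10) command, the sketch below decodes both byte strings copied from the dmesg output. The helper names parse_descriptor_sense and parse_read10 are illustrative, not part of any tool mentioned in the thread.

```python
def parse_descriptor_sense(data: bytes) -> dict:
    """Decode the header and the Information descriptor (type 0x00) of
    descriptor-format SCSI sense data (response code 0x72 or 0x73)."""
    if (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    info = None
    pos, end = 8, 8 + data[7]  # byte 7 holds the additional sense length
    while pos + 2 <= end:
        dtype, dlen = data[pos], data[pos + 1]
        if dtype == 0x00 and dlen == 0x0A:
            # Descriptor bytes 4..11: 64-bit big-endian Information field (the failing LBA)
            info = int.from_bytes(data[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    return {"sense_key": data[1] & 0x0F, "asc": data[2], "ascq": data[3], "information": info}


def parse_read10(cdb: bytes) -> tuple[int, int]:
    """Return (starting LBA, transfer length in blocks) of a READ(10) CDB."""
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")


# Byte strings copied from the dmesg lines above
sense = parse_descriptor_sense(bytes.fromhex(
    "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 6d 43 52 3c"))
start, count = parse_read10(bytes.fromhex("28 00 6d 43 52 37 00 00 08 00"))

print(sense)         # sense_key 3 (Medium Error), asc 0x11 / ascq 0x04, information 1833128508
print(start, count)  # 1833128503 8
print("offset inside sdd1:", sense["information"] - 63)  # sdd1 started at sector 63 -> 1833128445
```

The decoded LBA is the same sector the kernel reports in the end_request line, and it falls inside the 8-sector read that was issued, so the error points at one specific unreadable sector on /dev/sdd rather than at the new disk.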

mdadm detail:

sudo mdadm --detail /dev/md3
        Version : 1.2
  Creation Time : Fri Apr 13 19:10:18 2012
     Raid Level : raid1
     Array Size : 976761382 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976761382 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Sep  4 08:57:46 2013
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 7% complete

           Name : hypervisor:3  (local to host hypervisor)
           UUID : b758f8f1:a6a6862e:83133e3a:3b9830ea
         Events : 1257158

    Number   Major   Minor   RaidDevice State
       2       8       49        0      active sync   /dev/sdd1
       3       8       33        1      spare rebuilding   /dev/sdc1

One thing I noticed is that the partition on the source disk (/dev/sdd) starts at sector 63, whereas the one on the new disk (/dev/sdc) starts at sector 2048. Does this have anything to do with the problem? Is there a way to tell mdadm to ignore this bad block and continue rebuilding the array?
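The two start sectors also differ in how they line up with the physical sectors reported by fdisk: /dev/sdd has 512-byte physical sectors, /dev/sdc has 4096-byte ones, and both use 512-byte logical sectors. A minimal sketch of that check, assuming those fdisk values (is_aligned is a made-up helper name):

```python
LOGICAL = 512  # fdisk: "Units = sectors of 1 * 512 = 512 bytes" on both disks


def is_aligned(start_sector: int, physical_sector_bytes: int) -> bool:
    """True if a partition starting at start_sector begins on a physical-sector boundary."""
    return (start_sector * LOGICAL) % physical_sector_bytes == 0


cases = [
    ("sdd1 (start 63, 512-byte physical)", 63, 512),
    ("sdc1 (start 2048, 4096-byte physical)", 2048, 4096),
    ("sdc1 if moved to start 63", 63, 4096),
]
for label, start, phys in cases:
    print(f"{label}: aligned={is_aligned(start, phys)}")
```

A start of 63 is fine on the old 512-byte disk but is not a multiple of 4096 bytes on the new one, while the 2048-sector (1 MiB) start is aligned on both kinds of disk.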

As a last resort, I was thinking of cloning the source drive (/dev/sdd) to the new drive (/dev/sdc) with ddrescue (from a live CD) and then using the clone as the source disk. Will this work?

I have repartitioned both /dev/sdd and /dev/sdc. Now they look like this:

sudo fdisk -l -u /dev/sdc

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
81 heads, 63 sectors/track, 382818 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0002c2de

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  1953525167   976761560   da  Non-FS data

sudo fdisk -l -u /dev/sdd

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
23 heads, 12 sectors/track, 7077989 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00069b7e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048  1953525167   976761560   da  Non-FS data

Is this OK?

I rebuilt the array again and then restored all data from backup. Everything looks OK, except that after a reboot /dev/md3 is renamed to /dev/md127.

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sdd1[0] sdc1[2]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid0 sdb5[0] sda5[1]
      7809024 blocks super 1.2 512k chunks

md2 : active raid0 sdb6[0] sda6[1]
      273512448 blocks super 1.2 512k chunks

md0 : active raid1 sdb1[0] sda1[2]
      15623096 blocks super 1.2 [2/2] [UU]

cat /etc/mdadm/mdadm.conf
ARRAY /dev/md/0 metadata=1.2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
ARRAY /dev/md/1 metadata=1.2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
ARRAY /dev/md/2 metadata=1.2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
ARRAY /dev/md/3 metadata=1.2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3

sudo mdadm --examine --scan --verbose
ARRAY /dev/md/0 level=raid1 metadata=1.2 num-devices=2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
ARRAY /dev/md/1 level=raid0 metadata=1.2 num-devices=2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
ARRAY /dev/md/2 level=raid0 metadata=1.2 num-devices=2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
ARRAY /dev/md/3 level=raid1 metadata=1.2 num-devices=2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3
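The ARRAY lines in mdadm.conf and the output of mdadm --examine --scan can be compared mechanically. The sketch below pastes both listings in as strings (in practice you would read /etc/mdadm/mdadm.conf and capture the scan output); parse_arrays is a hypothetical helper, not an mdadm feature.

```python
import re

# Copied from the two listings above
CONF = """\
ARRAY /dev/md/0 metadata=1.2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
ARRAY /dev/md/1 metadata=1.2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
ARRAY /dev/md/2 metadata=1.2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
ARRAY /dev/md/3 metadata=1.2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3
"""

SCAN = """\
ARRAY /dev/md/0 level=raid1 metadata=1.2 num-devices=2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
ARRAY /dev/md/1 level=raid0 metadata=1.2 num-devices=2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
ARRAY /dev/md/2 level=raid0 metadata=1.2 num-devices=2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
ARRAY /dev/md/3 level=raid1 metadata=1.2 num-devices=2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3
"""


def parse_arrays(text: str) -> dict[str, tuple[str, str]]:
    """Map each ARRAY device path to its (UUID, name) pair."""
    arrays = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "ARRAY":
            continue
        uuid = re.search(r"UUID=(\S+)", line)
        name = re.search(r"name=(\S+)", line)
        arrays[fields[1]] = (uuid.group(1) if uuid else "", name.group(1) if name else "")
    return arrays


conf, scan = parse_arrays(CONF), parse_arrays(SCAN)
for device in sorted(scan):
    status = "OK" if conf.get(device) == scan[device] else "MISMATCH"
    print(device, scan[device][0], status)
```

Every UUID and name matches, so this copy of mdadm.conf does describe the rebuilt array under the /dev/md/3 name; the renaming to md127 is not explained by a wrong UUID in this file.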

cat /etc/fstab
# /etc/fstab: static file system information.
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    nodev,noexec,nosuid 0       0
# / was on /dev/md0 during installation
UUID=2e4543d3-22aa-45e1-8adb-f95cfe57a697 /               ext4    noatime,errors=remount-ro,discard   0       1
#was /dev/md3 before
UUID=13689e0b-052f-48f7-bf1f-ad857364c0d6      /storage     ext4     defaults       0       2
# /vm was on /dev/md2 during installation
UUID=9fb85fbf-31f9-43ff-9a43-3ebef9d37ee8 /vm             ext4    noatime,errors=remount-ro,discard   0       2
# swap was on /dev/md1 during installation
UUID=1815549c-9047-464e-96a0-fe836fa80cfd none            swap    sw

Any suggestions?


Posted 2013-09-04T06:08:38.223


Your "good" drive is actually bad. – Michael Hampton – 2013-09-04T06:36:44.807



The good news is that I can still access my data, which is mounted at /storage

No, you can't; you have a problem reading the data in those dodgy blocks on /dev/sdd. You just don't notice it in ordinary operation, either because you don't happen to read those blocks or because your application tolerates read errors.
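The only way to find out whether every sector is still readable is to read all of them. Below is a minimal read-only sketch of such a scan; scan_unreadable is a hypothetical helper, and it assumes Linux, where seeking to the end of a block device returns its size. It never writes. Data already in the page cache is served from memory, so it is most meaningful on a device that has not just been read; the dedicated badblocks tool, which is read-only by default, is the usual choice for this job.

```python
import errno
import os


def scan_unreadable(path: str, chunk: int = 1 << 20, sector: int = 512) -> list[int]:
    """Read `path` from start to end without writing anything and return the
    numbers of the `sector`-sized blocks whose reads fail with EIO."""
    bad: list[int] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also works for block devices on Linux
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            try:
                os.pread(fd, n, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise
                # The chunk failed: retry one sector at a time to locate the bad ones.
                for pos in range(offset, offset + n, sector):
                    try:
                        os.pread(fd, min(sector, offset + n - pos), pos)
                    except OSError as sub_exc:
                        if sub_exc.errno != errno.EIO:
                            raise
                        bad.append(pos // sector)
            offset += n
    finally:
        os.close(fd)
    return bad
```

Run as root, something like scan_unreadable("/dev/sdd") would walk the whole 1 TB disk, which takes hours at normal read speeds and far longer if the drive keeps retrying bad sectors.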

I find messages like the ones /dev/sdd is logging extremely worrying. If it were my device, I'd back the data up as fast as possible, preferably twice, replace the other drive as well, and restore from whatever backup I'd managed to get.

In addition, as you point out, you're trying to mirror a 976762552-block partition with a 976761560-block one, and that won't work: the new partition needs to be at least as big as the old one. I'm slightly surprised that mdadm allowed the reconstruction to proceed, but you don't say which distro you're running, so it's hard to know how old your mdadm is; perhaps it's old enough not to check that sort of thing.
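Both block counts come straight from the fdisk listings in the question: fdisk's Blocks column is the partition size in 1 KiB units, with a trailing + when the sector count is odd. The sketch below recomputes them from the Start and End sectors (fdisk_blocks is an illustrative helper name) and shows that the shortfall is exactly the 1985 sectors lost by moving the start from 63 to 2048.

```python
SECTOR = 512  # logical sector size reported by fdisk for both disks


def partition_sectors(start: int, end: int) -> int:
    return end - start + 1  # fdisk Start and End are inclusive


def fdisk_blocks(start: int, end: int) -> str:
    """Reproduce fdisk's Blocks column: size in 1 KiB units, '+' if a half block is left over."""
    kib, rem = divmod(partition_sectors(start, end) * SECTOR, 1024)
    return f"{kib}{'+' if rem else ''}"


old_sectors = partition_sectors(63, 1953525167)    # original /dev/sdd1
new_sectors = partition_sectors(2048, 1953525167)  # new /dev/sdc1
print(fdisk_blocks(63, 1953525167), fdisk_blocks(2048, 1953525167))  # 976762552+ 976761560

short = old_sectors - new_sectors
print(f"new partition is {short} sectors ({short * SECTOR / 1024} KiB) smaller")  # 1985 sectors (992.5 KiB)
```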

Edit: Yes, you should enlarge the partition as you describe. I'm not an Ubuntu fan, so I can't comment on that version. If you do get this resync done, I'd replace the other drive immediately. If you have a decent backup, I'd stop wasting time on the resync, replace it now, recreate the array, and restore from backup.



I have Ubuntu 12.04 x64:

uname -a
Linux hypervisor 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I have already backed up all data. What I need is to get the rebuilding process completed. As I said, one of the two hard disks is brand new. When I partitioned this disk with fdisk, it automatically placed the first sector at 2048 (instead of 63). Should I manually delete the partition and use 63 instead of 2048? – None – 2013-09-04T06:28:04.157


You may try the procedure I have described here: Remake SW RAID1 from a new HDD and an old HDD with bad blocks. It uses hdparm to read and rewrite the bad sectors so that the disk remaps them, if possible.





The sdd drive has definitely failed and is out of internal reallocation space.

Anyway, you can try updating its firmware, if an update is available.

BTW, these are GPT disks; use parted or gdisk to list and manipulate partitions. fdisk doesn't support GPT and is generally a very buggy tool.
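Whether a disk carries an MBR, a GPT, or both can be checked from its first two sectors: an MBR ends sector 0 with the bytes 0x55 0xAA and has four 16-byte partition entries at offset 446 (type ID in byte 4 of each), a GPT disk normally has a protective MBR entry of type 0xEE, and the GPT header in sector 1 starts with the ASCII signature "EFI PART". On /dev/sdd, fdisk listed an ordinary type da partition yet warned about a GPT, which is what you would see if both signatures are present. The hypothetical inspect_label sketch below reads those bytes, assuming 512-byte logical sectors as on both disks here (run it as root against the device, e.g. /dev/sdd).

```python
SECTOR = 512  # logical sector size reported for both disks


def inspect_label(path: str) -> dict:
    """Report which partition-table signatures are present in the first two sectors."""
    with open(path, "rb") as f:
        head = f.read(2 * SECTOR)
    mbr, lba1 = head[:SECTOR], head[SECTOR:2 * SECTOR]
    has_mbr = len(mbr) == SECTOR and mbr[510:512] == b"\x55\xaa"
    # Four 16-byte primary partition entries start at offset 446; byte 4 is the type ID.
    types = [mbr[446 + 16 * i + 4] for i in range(4)] if has_mbr else []
    return {
        "mbr_signature": has_mbr,
        "mbr_partition_types": [f"{t:02x}" for t in types if t],
        "protective_mbr": 0xEE in types,
        "gpt_header": lba1[:8] == b"EFI PART",
    }
```

A result with a da entry, no protective 0xEE entry, and gpt_header set would suggest a leftover GPT header next to a real MBR; that reading is an interpretation, not something the thread confirms.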

