
Some system details

  • AMD Phenom II X6 1090T, 16 GB DDR3-1600, running Ubuntu 11.04 (Natty), kernel 2.6.38-8-server
  • The RAID in question consists of 5 SATA drives (4 Samsung, 1 Western Digital), 500 GB each
  • The drives are connected to an LSI SAS 9201-16i Host Bus Adapter card
  • The RAID is software-based, using mdadm. Two other arrays (/dev/md1, /dev/md2) exist with no issues.

So my RAID is toast. At this point I'm pretty much out of my depth, so I hope there's someone here who can point me in some good directions. As I mention below, I've been at this 16 hours or so (with a break to clear the mind!). I've been reading everything I can here and elsewhere. Most of the advice is the same, and not encouraging, but I'm hoping to catch the eye of someone considerably more knowledgeable than myself.

So... Yesterday I attempted to add an additional drive to my RAID 5 Array. To do this I powered down the box, inserted the new drive, and re-powered the machine. All good so far.

I then unmounted the array

% sudo umount /dev/md0

and proceeded to do a file system check.

% sudo e2fsck -f /dev/md0

All well and good.

I created a primary partition (/dev/sdh1) on the new drive and set its type to Linux raid autodetect. I wrote the table to disk and exited.
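
For completeness, the fdisk session looked roughly like this (reconstructed from memory, so treat it as a sketch; the exact prompts vary with the fdisk version):

% sudo fdisk /dev/sdh
    n          # new partition
    p          # primary
    1          # partition number 1
    <Enter>    # accept the default first cylinder
    <Enter>    # accept the default last cylinder
    t          # change the partition type
    fd         # Linux raid autodetect
    w          # write the table and exit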

I added the new drive to the array with

% sudo mdadm --add /dev/md0 /dev/sdh1

and followed it up with

% sudo mdadm --grow --raid-devices=5 --backup-file=/home/foundation/grow_md0.bak /dev/md0

(If you're suffused with hope at this point because of the backup file, don't be: it does not exist on my file system, even though I remember typing the command and it's there in my bash history.)
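
(Had that backup file actually been written, it could in principle have been handed back to mdadm when assembling the array, to repair an interrupted reshape, something along the lines of the sketch below. With no file on disk, that avenue is closed.)

% sudo mdadm --assemble /dev/md0 --backup-file=/home/foundation/grow_md0.bak /dev/sd[defgh]1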

Again, all appears to be well. I let this sit while it does its thing. Once it completed, without any errors, I ran e2fsck -f /dev/md0 again. Still nothing out of the ordinary. At this point I felt confident enough to resize it.

% sudo resize2fs /dev/md0

This completed without a peep. For the sake of completeness I shut down the box and waited for it to come back up.

During boot, the attempt to mount the partition failed. The assembly of the array worked, seemingly without a hitch, but mounting failed because no EXT4 file system could be found.

A portion of dmesg follows:

# [    9.237762] md: bind<sdh1>
# [    9.246063] md: bind<sdo>
# [    9.248308] md: bind<sdn>
# [    9.249661] bio: create slab <bio-1> at 1
# [    9.249668] md/raid0:md2: looking at sdn
# [    9.249669] md/raid0:md2:   comparing sdn(1953524992) with sdn(1953524992)
# [    9.249671] md/raid0:md2:   END
# [    9.249672] md/raid0:md2:   ==> UNIQUE
# [    9.249673] md/raid0:md2: 1 zones
# [    9.249674] md/raid0:md2: looking at sdo
# [    9.249675] md/raid0:md2:   comparing sdo(1953524992) with sdn(1953524992)
# [    9.249676] md/raid0:md2:   EQUAL
# [    9.249677] md/raid0:md2: FINAL 1 zones
# [    9.249679] md/raid0:md2: done.
# [    9.249680] md/raid0:md2: md_size is 3907049984 sectors.
# [    9.249681]         md2 configuration          
# [    9.249682] zone0=[sdn/sdo/]
# [    9.249683]         zone offset=0kb device offset=0kb size=1953524992kb
# [    9.249684]                                   
# [    9.249685] 
# [    9.249690] md2: detected capacity change from 0 to 2000409591808
# [    9.250162] sd 2:0:7:0: [sdk] Write Protect is off
# [    9.250164] sd 2:0:7:0: [sdk] Mode Sense: 73 00 00 08
# [    9.250331]  md2: unknown partition table
# [    9.252371] sd 2:0:7:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
# [    9.252642] sd 2:0:9:0: [sdm] Write Protect is off
# [    9.252644] sd 2:0:9:0: [sdm] Mode Sense: 73 00 00 08
# [    9.254798] sd 2:0:9:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
# [    9.256555]  sdg: sdg1
# [    9.261439] sd 2:0:8:0: [sdl] Write Protect is off
# [    9.261441] sd 2:0:8:0: [sdl] Mode Sense: 73 00 00 08
# [    9.263594] sd 2:0:8:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
# [    9.302372]  sdf: sdf1
# [    9.310770] md: bind<sdd1>
# [    9.317153]  sdj: sdj1
# [    9.327325]  sdi: sdi1
# [    9.327686] md: bind<sde1>
# [    9.372897] sd 2:0:3:0: [sdg] Attached SCSI disk
# [    9.391630]  sdm: sdm1
# [    9.397435]  sdk: sdk1
# [    9.400372]  sdl: sdl1
# [    9.424751] sd 2:0:6:0: [sdj] Attached SCSI disk
# [    9.439342] sd 2:0:5:0: [sdi] Attached SCSI disk
# [    9.450533] sd 2:0:2:0: [sdf] Attached SCSI disk
# [    9.464315] md: bind<sdg1>
# [    9.534946] md: bind<sdj1>
# [    9.541004] md: bind<sdf1>

  [    9.542537] md/raid:md0: device sdf1 operational as raid disk 2
  [    9.542538] md/raid:md0: device sdg1 operational as raid disk 3
  [    9.542540] md/raid:md0: device sde1 operational as raid disk 1
  [    9.542541] md/raid:md0: device sdd1 operational as raid disk 0
  [    9.542879] md/raid:md0: allocated 5334kB
  [    9.542918] md/raid:md0: raid level 5 active with 4 out of 5 devices, algorithm 2
  [    9.542923] RAID conf printout:
  [    9.542924]  --- level:5 rd:5 wd:4
  [    9.542925]  disk 0, o:1, dev:sdd1
  [    9.542926]  disk 1, o:1, dev:sde1
  [    9.542927]  disk 2, o:1, dev:sdf1
  [    9.542927]  disk 3, o:1, dev:sdg1
  [    9.542928]  disk 4, o:1, dev:sdh1
  [    9.542944] md0: detected capacity change from 0 to 2000415883264
  [    9.542959] RAID conf printout:
  [    9.542962]  --- level:5 rd:5 wd:4
  [    9.542963]  disk 0, o:1, dev:sdd1
  [    9.542964]  disk 1, o:1, dev:sde1
  [    9.542965]  disk 2, o:1, dev:sdf1
  [    9.542966]  disk 3, o:1, dev:sdg1
  [    9.542967]  disk 4, o:1, dev:sdh1
  [    9.542968] RAID conf printout:
  [    9.542969]  --- level:5 rd:5 wd:4
  [    9.542970]  disk 0, o:1, dev:sdd1
  [    9.542971]  disk 1, o:1, dev:sde1
  [    9.542972]  disk 2, o:1, dev:sdf1
  [    9.542972]  disk 3, o:1, dev:sdg1
  [    9.542973]  disk 4, o:1, dev:sdh1
  [    9.543005] md: recovery of RAID array md0
  [    9.543007] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
  [    9.543008] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
  [    9.543013] md: using 128k window, over a total of 488382784 blocks.
  [    9.543014] md: resuming recovery of md0 from checkpoint.

# [    9.549495] sd 2:0:9:0: [sdm] Attached SCSI disk
# [    9.555022] sd 2:0:8:0: [sdl] Attached SCSI disk
# [    9.555612] sd 2:0:7:0: [sdk] Attached SCSI disk
# [    9.561410] md: bind<sdi1>

  [    9.565538]  md0: unknown partition table

# [    9.639444] md: bind<sdm1>
# [    9.642729] md: bind<sdk1>
# [    9.650048] md: bind<sdl1>
# [    9.652342] md/raid:md1: device sdl1 operational as raid disk 3
# [    9.652343] md/raid:md1: device sdk1 operational as raid disk 2
# [    9.652345] md/raid:md1: device sdm1 operational as raid disk 4
# [    9.652346] md/raid:md1: device sdi1 operational as raid disk 0
# [    9.652347] md/raid:md1: device sdj1 operational as raid disk 1
# [    9.652627] md/raid:md1: allocated 5334kB
# [    9.652654] md/raid:md1: raid level 5 active with 5 out of 5 devices, algorithm 2
# [    9.652655] RAID conf printout:
# [    9.652656]  --- level:5 rd:5 wd:5
# [    9.652657]  disk 0, o:1, dev:sdi1
# [    9.652658]  disk 1, o:1, dev:sdj1
# [    9.652658]  disk 2, o:1, dev:sdk1
# [    9.652659]  disk 3, o:1, dev:sdl1
# [    9.652660]  disk 4, o:1, dev:sdm1
# [    9.652676] md1: detected capacity change from 0 to 3000614518784
# [    9.654507]  md1: unknown partition table
# [   11.093897] vesafb: framebuffer at 0xfd000000, mapped to 0xffffc90014200000, using 1536k, total 1536k
# [   11.093899] vesafb: mode is 1024x768x16, linelength=2048, pages=0
# [   11.093901] vesafb: scrolling: redraw
# [   11.093903] vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
# [   11.094010] Console: switching to colour frame buffer device 128x48
# [   11.206677] fb0: VESA VGA frame buffer device
# [   11.301061] EXT4-fs (sda1): re-mounted. Opts: user_xattr,errors=remount-ro
# [   11.428472] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: user_xattr
# [   11.896204] EXT4-fs (sdc6): mounted filesystem with ordered data mode. Opts: user_xattr
# [   12.262728] r8169 0000:01:00.0: eth0: link up
# [   12.263975] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
# [   13.528097] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: user_xattr
# [   13.681339] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: user_xattr
# [   14.310098] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: user_xattr
# [   14.357675] EXT4-fs (sdc5): mounted filesystem with ordered data mode. Opts: user_xattr
# [   16.933348] audit_printk_skb: 9 callbacks suppressed
# [   22.350011] eth0: no IPv6 routers present
# [   27.094760] ppdev: user-space parallel port driver
# [   27.168812] kvm: Nested Virtualization enabled
# [   27.168814] kvm: Nested Paging enabled
# [   30.383664] EXT4-fs (sda1): re-mounted. Opts: user_xattr,errors=remount-ro,commit=0
# [   30.385125] EXT4-fs (sdb1): re-mounted. Opts: user_xattr,commit=0
# [   32.105044] EXT4-fs (sdc6): re-mounted. Opts: user_xattr,commit=0
# [   33.078017] EXT4-fs (sdc1): re-mounted. Opts: user_xattr,commit=0
# [   33.079491] EXT4-fs (md2): re-mounted. Opts: user_xattr,commit=0
# [   33.082411] EXT4-fs (md1): re-mounted. Opts: user_xattr,commit=0
# [   35.369796] EXT4-fs (sdc5): re-mounted. Opts: user_xattr,commit=0
# [   35.674390] CE: hpet increased min_delta_ns to 20113 nsec
# [   35.676242] CE: hpet increased min_delta_ns to 30169 nsec
# [   35.677808] CE: hpet increased min_delta_ns to 45253 nsec
# [   35.679349] CE: hpet increased min_delta_ns to 67879 nsec
# [   35.680312] CE: hpet increased min_delta_ns to 101818 nsec
# [   35.680312] CE: hpet increased min_delta_ns to 152727 nsec
# [   35.680312] CE: hpet increased min_delta_ns to 229090 nsec
# [   35.680312] CE: hpet increased min_delta_ns to 343635 nsec
# [   35.681590] CE: hpet increased min_delta_ns to 515452 nsec
# [  436.595366] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: user_xattr
# [  607.364501] exe (14663): /proc/14663/oom_adj is deprecated, please use /proc/14663/oom_score_adj instead.

  [ 2016.476772] EXT4-fs (md0): VFS: Can't find ext4 filesystem
  [ 2246.923154] EXT4-fs (md0): VFS: Can't find ext4 filesystem
  [ 2293.383934] EXT4-fs (md0): VFS: Can't find ext4 filesystem
  [ 2337.292080] EXT4-fs (md0): VFS: Can't find ext4 filesystem
  [ 2364.812150] EXT4-fs (md0): VFS: Can't find ext4 filesystem
  [ 2392.624988] EXT4-fs (md0): VFS: Can't find ext4 filesystem

# [ 3098.003646] CE: hpet increased min_delta_ns to 773178 nsec

  [ 4208.380943] md: md0: recovery done.
  [ 4208.470356] RAID conf printout:
  [ 4208.470363]  --- level:5 rd:5 wd:5
  [ 4208.470369]  disk 0, o:1, dev:sdd1
  [ 4208.470374]  disk 1, o:1, dev:sde1
  [ 4208.470378]  disk 2, o:1, dev:sdf1
  [ 4208.470382]  disk 3, o:1, dev:sdg1
  [ 4208.470385]  disk 4, o:1, dev:sdh1
  [ 7982.600595] EXT4-fs (md0): VFS: Can't find ext4 filesystem

During startup it asked me what I wanted to do about it. I told it to move on and started dealing with it once the machine was back up. The first thing I did was check /proc/mdstat...

# Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
# md1 : active raid5 sdl1[3] sdk1[2] sdm1[4] sdi1[0] sdj1[1]
#       2930287616 blocks level 5, 128k chunk, algorithm 2 [5/5] [UUUUU]
#       
# md2 : active raid0 sdn[0] sdo[1]
#       1953524992 blocks 64k chunks

  md0 : active raid5 sdf1[2] sdg1[3] sde1[1] sdd1[0] sdh1[5]
        1953531136 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
#       
# unused devices: <none>

...and /etc/mdadm/mdadm.conf:

  ARRAY /dev/md0 level=raid5 num-devices=5 UUID=98941898:e5652fdb:c82496ec:0ebe2003
# ARRAY /dev/md1 level=raid5 num-devices=5 UUID=67d5a3ed:f2890ea4:004365b1:3a430a78
# ARRAY /dev/md2 level=raid0 num-devices=2 UUID=d1ea9162:cb637b4b:004365b1:3a430a78

Then I checked fdisk:

foundation@foundation:~$ sudo fdisk -l /dev/sd[defgh]

Disk /dev/sdd: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000821e5

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sde: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00004a72

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdf: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000443c2

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdg: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000e428

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdh: 500.1 GB, 500107862016 bytes
81 heads, 63 sectors/track, 191411 cylinders
Units = cylinders of 5103 * 512 = 2612736 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x8c4d0ecf

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1               1      191412   488385560   fd  Linux raid autodetect

Everything seemed to be in order, so I checked the details of the array and examined its constituents.

foundation@foundation:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri May 13 00:57:15 2011
     Raid Level : raid5
     Array Size : 1953531136 (1863.03 GiB 2000.42 GB)
  Used Dev Size : 488382784 (465.76 GiB 500.10 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Fri May 13 04:43:10 2011
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : foundation:0  (local to host foundation)
           UUID : a81ad850:3ce5e5a5:38de6ac7:9699b3dd
         Events : 32

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1
       2       8       81        2      active sync   /dev/sdf1
       3       8       97        3      active sync   /dev/sdg1
       5       8      113        4      active sync   /dev/sdh1



foundation@foundation:~$ sudo mdadm --examine /dev/sd[defgh]1

/dev/sdd1: (samsung)
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a81ad850:3ce5e5a5:38de6ac7:9699b3dd
           Name : foundation:0  (local to host foundation)
  Creation Time : Fri May 13 00:57:15 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 976765954 (465.76 GiB 500.10 GB)
     Array Size : 3907062272 (1863.03 GiB 2000.42 GB)
  Used Dev Size : 976765568 (465.76 GiB 500.10 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6e6422de:f39c618a:2cab1161:b36c8341

    Update Time : Fri May 13 15:53:06 2011
       Checksum : 679bf575 - correct
         Events : 32

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing)


/dev/sde1: (samsung)
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a81ad850:3ce5e5a5:38de6ac7:9699b3dd
           Name : foundation:0  (local to host foundation)
  Creation Time : Fri May 13 00:57:15 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 976765954 (465.76 GiB 500.10 GB)
     Array Size : 3907062272 (1863.03 GiB 2000.42 GB)
  Used Dev Size : 976765568 (465.76 GiB 500.10 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : bd02892c:a346ec88:7ffcf757:c18eee12

    Update Time : Fri May 13 15:53:06 2011
       Checksum : 7cdeb0d5 - correct
         Events : 32

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AAAAA ('A' == active, '.' == missing)


/dev/sdf1: (samsung)
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a81ad850:3ce5e5a5:38de6ac7:9699b3dd
           Name : foundation:0  (local to host foundation)
  Creation Time : Fri May 13 00:57:15 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 976765954 (465.76 GiB 500.10 GB)
     Array Size : 3907062272 (1863.03 GiB 2000.42 GB)
  Used Dev Size : 976765568 (465.76 GiB 500.10 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : acd3d576:54c09121:0636980e:0a490f59

    Update Time : Fri May 13 15:53:06 2011
       Checksum : 5c91ef46 - correct
         Events : 32

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAAAA ('A' == active, '.' == missing)


/dev/sdg1: (samsung)
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a81ad850:3ce5e5a5:38de6ac7:9699b3dd
           Name : foundation:0  (local to host foundation)
  Creation Time : Fri May 13 00:57:15 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 976765954 (465.76 GiB 500.10 GB)
     Array Size : 3907062272 (1863.03 GiB 2000.42 GB)
  Used Dev Size : 976765568 (465.76 GiB 500.10 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 5f923d06:993ac9f3:a41ffcde:73876130

    Update Time : Fri May 13 15:53:06 2011
       Checksum : 65e75047 - correct
         Events : 32

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : AAAAA ('A' == active, '.' == missing)


/dev/sdh1:  (western digital)
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a81ad850:3ce5e5a5:38de6ac7:9699b3dd
           Name : foundation:0  (local to host foundation)
  Creation Time : Fri May 13 00:57:15 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 976769072 (465.76 GiB 500.11 GB)
     Array Size : 3907062272 (1863.03 GiB 2000.42 GB)
  Used Dev Size : 976765568 (465.76 GiB 500.10 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 622c546d:41fe9683:42ecf909:cebcf6a4

    Update Time : Fri May 13 15:53:06 2011
       Checksum : fc5ebc1a - correct
         Events : 32

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 4
   Array State : AAAAA ('A' == active, '.' == missing)

I tried to mount it myself:

foundation@foundation:~$ sudo mount -t ext4 -o defaults,rw /dev/md0 mnt
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

No go. So at this point I started trying to do some of the things a variety of posts here and elsewhere suggested. The first thing was just doing an e2fsck.

foundation@foundation:~$ sudo e2fsck -f /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

As the above advice echoes what I had been reading, I gave it a try. First I ran mke2fs -n (with -n it only reports what it would do and writes nothing) to find the backup superblock locations:

foundation@foundation:~$ sudo mke2fs -n /dev/md0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=64 blocks
122101760 inodes, 488382784 blocks
24419139 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
14905 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000, 214990848

foundation@foundation:~$ sudo e2fsck -fb 32768 /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
e2fsck: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

I repeated that for each reported backup superblock. No luck. I also tried mounting while pointing at a backup superblock, as some sites suggested. The sb= mount option counts in 1 KiB units, so with a 4 KiB block size the backup at block 32768 becomes:

    sb = (4096 / 1024) * 32768 = 131072

foundation@foundation:~$ sudo mount -t ext4 -o sb=131072,ro /dev/md0 mnt     
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
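
Incidentally, the repeat-over-every-backup-superblock step above is straightforward to script; a rough sketch, using the block locations mke2fs -n reported (read-only thanks to -n, so it can't make things worse):

for sb in 32768 98304 163840 229376 294912 819200 884736 1605632 2654208 \
          4096000 7962624 11239424 20480000 23887872 71663616 78675968 \
          102400000 214990848; do
    echo "=== backup superblock at block $sb ==="
    sudo e2fsck -fn -b "$sb" -B 4096 /dev/md0
done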

At this point I came across a couple of posts mentioning TestDisk, from Christophe Grenier of CGSecurity, and gave it a go. It seemed promising at first, but it too could not find anything useful; it couldn't even see any files. (PhotoRec did, though, at which point I did a 'recover', only to have it segfault about a fifth of the way through.)

Frustrated, I walked away after setting TestDisk to do a deep search, and got some sleep. This morning I got this:

TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org

Disk /dev/md0 - 2000 GB / 1863 GiB - CHS 488382784 2 4
        Partition                Start              End        Size in sectors
 1 D Linux                    98212   0  1    73360627   1  4    586099328
 2 D Linux                    98990   0  1    73361405   1  4    586099328
 3 D Linux                    99006   0  1    73361421   1  4    586099328
 4 D Linux                    99057   0  1    73361472   1  4    586099328
 5 D Linux                    99120   0  1    73361535   1  4    586099328
 6 D Linux                182535942   0  1   426669713   1  4   1953070176
 7 D Linux                182536009   0  1   426669780   1  4   1953070176
 8 D Linux                182536470   0  1   426670241   1  4   1953070176
 9 D Linux                182538637   0  1   426672408   1  4   1953070176
10 D Linux                204799120   0  1   326894735   1  4    976764928

Structure: Ok.  Use Up/Down Arrow keys to select partition.
Use Left/Right Arrow keys to CHANGE partition characteristics:
*=Primary bootable  P=Primary  L=Logical  E=Extended  D=Deleted
Keys A: add partition, L: load backup, T: change type, P: list files,
     Enter: to continue

The first few times I used TestDisk I did not let it complete a full search, so I had never seen this more complete list before.

The numbers to the left of the partition listings are mine for the purposes of identification. At the bottom of the application was a kind of descriptive message that changed when I focused on a particular partition. The lines are listed below with numbers matching the partitions.

1, 2, 3, 4, 5
EXT3 Large file Sparse superblock Recover, 300 GB / 279 GiB

6, 7, 8, 9
EXT4 Large file Sparse superblock Recover, 999 GB / 931 GiB

10
EXT3 Large file Sparse superblock Backup superblock, 500 GB / 465 GiB

Even armed with this more detailed partition list, I couldn't make any headway: TestDisk could not list any files from the partitions it found.

That's where I'm at now. I have no other avenues left that I can find. The data on here is important, as it's my fiancée's photography drive. I know a RAID is not a backup, as is painfully plain now, but I didn't think I could afford a backup solution that could encompass a terabyte or two. That's what I'll be looking into over the next few days. In the meantime, I hope you'll forgive the skyscraper of text and can help me find a solution.

Oh, one more thing... the layout of that last partition list in TestDisk seems awfully suspicious to me: a set of five candidate partitions early on, followed by another set of five, matching the number of devices in the array. Perhaps it's a clue.

2 Answers


I'm thinking you jumped the gun and resized the fs before it was finished. From the dmesg output you can see that it had to finish the reshape after you rebooted:

[    9.542918] md/raid:md0: raid level 5 active with 4 out of 5 devices, algorithm 2 
[    9.543005] md: recovery of RAID array md0
[    9.543007] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[    9.543008] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[    9.543013] md: using 128k window, over a total of 488382784 blocks.
[    9.543014] md: resuming recovery of md0 from checkpoint.
[ 4208.380943] md: md0: recovery done.
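
Before running resize2fs it's worth confirming that any reshape/recovery has actually completed; a quick check looks something like this (a sketch):

% cat /proc/mdstat
% sudo mdadm --detail /dev/md0 | grep -E 'State|Rebuild|Reshape'
% sudo mdadm --wait /dev/md0    # blocks until any resync/recovery/reshape finishes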

Edit: the data is most likely gone.

Mark Wagner
  • So strange. I would have sworn `/proc/mdstat` reported that it was finished before I performed the resize. Bother. Foolish brain and its knack for confabulation! – lose_the_grimm May 14 '11 at 03:55

You might want to take a look at ddrescue now to see if anything at all can be recovered...

http://www.gnu.org/software/ddrescue/ddrescue.html
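
Something along these lines (a sketch; the image path is just an example and needs roughly 2 TB of free space somewhere else) will copy the array to an image file while logging unreadable areas, so any further recovery attempts can be made against the copy rather than the array itself:

% sudo ddrescue -r3 /dev/md0 /some/other/volume/md0.img /some/other/volume/md0.log

ddrescue records its progress in the log file, so an interrupted copy can be resumed by re-running the same command.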

Brad
  • I'll give it a go and see what it can salvage. =) – lose_the_grimm May 18 '11 at 14:12
  • You might be able to get photorec to work better on the output of ddrescue -- it's a long shot, but photorec might have been segfaulting due to problems that may not be there if you work on the output of ddrescue instead... good luck! – Brad May 18 '11 at 14:43
  • Sorry for coming back to this so late. I did end up using photorec on the output of ddrescue and it worked well enough to get much of the contents back. Thanks for the suggestion Brad! – lose_the_grimm Oct 16 '12 at 14:08