
Had a power failure and now my mdadm array is having problems.

sudo mdadm -D /dev/md0

[hodge@hodge-fs ~]$ sudo mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sun Apr 25 01:39:25 2010
     Raid Level : raid5
     Array Size : 8790815232 (8383.57 GiB 9001.79 GB)
  Used Dev Size : 1465135872 (1397.26 GiB 1500.30 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Aug  7 19:10:28 2010
          State : clean, degraded, recovering
 Active Devices : 6
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

 Rebuild Status : 10% complete

           UUID : 44a8f730:b9bea6ea:3a28392c:12b22235 (local to host hodge-fs)
         Events : 0.1307608

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       97        1      active sync   /dev/sdg1
       2       8      113        2      active sync   /dev/sdh1
       3       8       65        3      active sync   /dev/sde1
       4       8       49        4      active sync   /dev/sdd1
       7       8       33        5      spare rebuilding   /dev/sdc1
       6       8       16        6      active sync   /dev/sdb

sudo mount -a

[hodge@hodge-fs ~]$ sudo mount -a
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

sudo fsck.ext4 /dev/md0

[hodge@hodge-fs ~]$ sudo fsck.ext4 /dev/md0
e2fsck 1.41.12 (17-May-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
/dev/md0: recovering journal
fsck.ext4: unable to set superblock flags on /dev/md0

sudo dumpe2fs /dev/md0 | grep -i superblock

[hodge@hodge-fs ~]$ sudo dumpe2fs /dev/md0 | grep -i superblock
dumpe2fs 1.41.12 (17-May-2010)
  Primary superblock at 0, Group descriptors at 1-524
  Backup superblock at 32768, Group descriptors at 32769-33292
  Backup superblock at 98304, Group descriptors at 98305-98828
  Backup superblock at 163840, Group descriptors at 163841-164364
  Backup superblock at 229376, Group descriptors at 229377-229900
  Backup superblock at 294912, Group descriptors at 294913-295436
  Backup superblock at 819200, Group descriptors at 819201-819724
  Backup superblock at 884736, Group descriptors at 884737-885260
  Backup superblock at 1605632, Group descriptors at 1605633-1606156
  Backup superblock at 2654208, Group descriptors at 2654209-2654732
  Backup superblock at 4096000, Group descriptors at 4096001-4096524
  Backup superblock at 7962624, Group descriptors at 7962625-7963148
  Backup superblock at 11239424, Group descriptors at 11239425-11239948
  Backup superblock at 20480000, Group descriptors at 20480001-20480524
  Backup superblock at 23887872, Group descriptors at 23887873-23888396
  Backup superblock at 71663616, Group descriptors at 71663617-71664140
  Backup superblock at 78675968, Group descriptors at 78675969-78676492
  Backup superblock at 102400000, Group descriptors at 102400001-102400524
  Backup superblock at 214990848, Group descriptors at 214990849-214991372
  Backup superblock at 512000000, Group descriptors at 512000001-512000524
  Backup superblock at 550731776, Group descriptors at 550731777-550732300
  Backup superblock at 644972544, Group descriptors at 644972545-644973068
  Backup superblock at 1934917632, Group descriptors at 1934917633-1934918156

sudo e2fsck -b 32768 /dev/md0

[hodge@hodge-fs ~]$ sudo e2fsck -b 32768 /dev/md0
e2fsck 1.41.12 (17-May-2010)
/dev/md0: recovering journal
e2fsck: unable to set superblock flags on /dev/md0

sudo dmesg | tail

[hodge@hodge-fs ~]$ sudo dmesg | tail
EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (59837!=29115)
EXT4-fs (md0): group descriptors corrupted!
EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (59837!=29115)
EXT4-fs (md0): group descriptors corrupted!

Please Help!!!

2 Answers


From your description and the errors, it looks to me as though there is some serious data corruption. Remember, RAID protects against one very specific issue: limited disk failure. It does not protect against a power outage; that's why you use a UPS and keep backups in addition to RAID.

The one thing that looks odd to me is the inclusion of /dev/sdb instead of /dev/sdb1 in the list of RAID devices. Is that correct, or did the last character get cut off perhaps?
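One way to check is to look at the partition table and the RAID superblock directly; a minimal sketch, assuming the device names from the mdadm output above:

    # Does /dev/sdb have a partition table at all?
    sudo fdisk -l /dev/sdb

    # Does the RAID superblock live on the whole disk or on a partition?
    sudo mdadm --examine /dev/sdb
    sudo mdadm --examine /dev/sdb1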

I would try the remaining backup superblocks, just in case.
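To walk through the remaining ones systematically, a loop like the following could try each backup superblock from the dumpe2fs listing in turn; a sketch, assuming a 4 KiB filesystem block size (dumpe2fs -h /dev/md0 would confirm that before you rely on it):

    # Try each remaining backup superblock; -b is the backup superblock
    # location, -B the filesystem block size those locations are listed in.
    for sb in 98304 163840 229376 294912 819200 884736 1605632; do
        echo "=== trying backup superblock $sb ==="
        sudo e2fsck -b "$sb" -B 4096 /dev/md0 && break
    done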

Other than that, you might look for disk recovery software. Ideally you'll be able to take a backup image of the current state of the disks; that will reduce the chance that further changes will damage the data irreparably.
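For the imaging step, GNU ddrescue is a common choice because it retries bad sectors and keeps a map of what it has already recovered; a minimal sketch, assuming a spare disk mounted at /mnt/backup with room for a full member image (the paths are only examples):

    # Image one array member; repeat for each disk in the array.
    # The third argument is ddrescue's map file, which lets you resume
    # an interrupted copy without re-reading the good sectors.
    sudo ddrescue -d /dev/sdf /mnt/backup/sdf.img /mnt/backup/sdf.map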

Slartibartfast
  • Yes, I agree :( - I tried all the superblocks but none worked. I tried using TestDisk - it can see the partition but most of the data on it is bad. I can see my folders but can't open them. :( – Matthew Hodgkins Aug 08 '10 at 00:39
  • Yeah, sdb instead of sdb1 looks odd, you're right. @MatthewHodgkins, how did that come about? – poige Apr 15 '12 at 04:42

Your RAID setup had several flaws:

  1. RAID-5 with more than 3-4 disks is rather fragile: once one disk gets kicked out, your data are already in trouble.
  2. Not using a write-intent bitmap is dangerous and only makes item #1 worse (a sketch of enabling one follows below).
  3. The spare could more reasonably have been used as an active member of a RAID-6 or RAID-10 array instead.

(I could also list the small chunk size and not using LVM-2 as disadvantages, but of course they do not strongly affect the overall status.)
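On point 2: a write-intent bitmap can be added to an existing array without rebuilding it; a minimal sketch, to be run only once the array is healthy again, not on a degraded array:

    # Add an internal write-intent bitmap so that after a crash or
    # power loss only the dirty regions need to be resynced:
    sudo mdadm --grow --bitmap=internal /dev/md0

    # Verify it took effect:
    cat /proc/mdstat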

Now: never do anything to the array (fsck and so on) until it is fully repaired. And I would strongly recommend that you not try to recover the data yourself; find a specialist instead (if you value the data, of course).
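If you nevertheless want to look around without risking further writes, read-only assembly plus a journal-skipping mount keeps everything non-destructive; a hedged sketch, assuming the member devices shown in the question's mdadm output (note /dev/sdb is a whole disk there):

    # Stop the array, then reassemble it read-only:
    sudo mdadm --stop /dev/md0
    sudo mdadm --assemble --readonly /dev/md0 /dev/sd{f,g,h,e,d,c}1 /dev/sdb

    # Mount read-only, skipping the ext4 journal replay that was
    # failing above, purely to inspect the contents:
    sudo mount -o ro,noload /dev/md0 /mnt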

poige