
I have a Sun T5220 server with the onboard LSI controller and two disks that were in a RAID 1 mirror. The data is not important right now, but one disk failed and we are trying to learn the recovery procedure properly in case we ever have to do this for real.

The initial situation looked like this:

# raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
         Sub                     Size                    Level
                 Disk
----------------------------------------------------------------
c1t0d0                  136.6G  N/A     DEGRADED OFF    RAID1
                 0.1.0   136.6G          GOOD
                 N/A     136.6G          FAILED

Green light on the 0.0.0 disk; running find / lights up the 0.1.0 disk. So I know I have a bad drive and which one it is. The server still boots, obviously.

First, we tried putting in a new disk. This disk came from an unknown source; format would not see it, cfgadm -al would not see it, and so raidctl -l would not see it either. I figure it's bad. We tried another disk from a spare server:

# raidctl -c c1t1d0 c1t0d0  (where t1 is my good disk - 0.1.0)
Disk has occupied space.

The alternate syntax options don't change anything either:

# raidctl -C "0.1.0 0.0.0" -r 1 1
Disk has occupied space.

# raidctl -C "0.1.0 0.0.0" 1
Disk has occupied space.

Ok. Maybe this is because the disk from the spare server had a RAID 1 on it already. Aha, I can see another volume in raidctl:

# raidctl -l
Controller: 1
         Volume:c1t1d0  (this is my server's root mirror)
         Volume:c1t132d0  (this is the foreign root mirror)
         Disk: 0.0.0
         Disk: 0.1.0
         ...

No problem. I don't care about the data, I'll just delete the foreign mirror.

# raidctl -d c1t132d0
(warning about data deletion but it works)

At this point, the binaries in /usr/bin freak out. By that I mean ls -l /usr/bin/which shows 1.4k, but cat /usr/bin/which gives me nothing but a newline. Great, did I just blow away the binaries? (The ones already in memory still work.) I bounce the box and it all comes back fine. WTF. Anyway, back to recreating my mirror.

# raidctl -l
Controller: 1
         Volume:c1t1d0  (this is my server's root mirror)
         Disk: 0.0.0
         Disk: 0.1.0
         ...

The man page says that deleting a mirror splits it. Ok, I'll delete the root mirror.

# raidctl -d c1t0d0
Array in use.  (this might not be the exact error)

I googled this and found that, of course, you can't do this (even with -f) while booted off the mirror. Ok: I booted cdrom -s and deleted the volume.
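From the CD, the sequence was roughly this (volume name as in the attempt above; -f may or may not actually be required, and controller/target numbers can enumerate differently when booted off the CD):

ok boot cdrom -s
# raidctl -l
# raidctl -d -f c1t0d0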

Now I have one disk that has a type of "LSI-Logical-Volume" on c1t1d0 (where my data is) and a brand new "Hitachi 146GB" on c1t0d0 (what I'm trying to mirror to):

(booted off the CD)
# raidctl -c c1t1d0 c1t0d0 (man says it's source destination for mirroring)
Illegal Array Layout.

# raidctl -C "0.1.0 0.0.0" -r 1 1  (alt syntax per man)
Illegal Array Layout.

# raidctl -C "0.1.0 0.0.0" 1  (assumes raid1, no help)
Illegal Array Layout.

Same size disks, same manufacturer, but I did delete the volume instead of throwing in a blank disk and waiting for it to resync, so maybe that was the critical error. I tried changing the type in format for my good disk to a plain 146GB disk, but that resets the partition table, which I'm pretty sure would wipe the data (bad if this were production).

Am I boned? Anyone have experience with breaking and resyncing a mirror? There's nothing on Google about "Illegal Array Layout", so here's my contribution to the search gods.


1 Answer


As it turns out, I could not find a real answer to this, but I did find a workaround and some good information. First off, this server was on Solaris 10 10/08 (U6). I booted off a 10/09 (U8) boot CD and found that there's a bug in raidctl on U8: it fails with a "Corrupt label - bad geometry" error. Even when I wiped the disks completely I was unable to re-create a mirror using the U8 boot CD, but on U7 (and presumably U6) the exact same command worked. So just a bit of a version warning there.
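If you're not sure which update a given environment is (handy when picking a boot CD), the first line of /etc/release tells you; on a U8 system it looks something like:

# head -1 /etc/release
                       Solaris 10 10/09 s10s_u8wos_08a SPARC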

The gist of the workaround went something like this (substitute your own disks, paths, etc.). A consolidated command sketch follows the list.

  • My mirror had been split, but I could still see the data from the boot CD. I needed a lot of space to hold the ufsdump output, so I imported a large zfs pool. This could mean different things for you; maybe you just mount a large partition. Let's call it /mnt/space.
  • Copy or dump your existing partition table to a file. Remember, you're booted off the CD at this point, reading from the disk that still has your data (c1t1d0 here).
    • format (1, p, p) or prtvtoc /dev/rdsk/c1t1d0s2 > /mnt/space/partitions.txt
  • Back up each partition:
    • ufsdump 0f - /dev/rdsk/c1t1d0s0 > /mnt/space/root_c1t1d0s0.dmp
    • ufsdump 0f - /dev/rdsk/c1t1d0s4 > /mnt/space/var_c1t1d0s4.dmp
    • continue for each partition
  • Recreate the RAID (from a U7 or older CD; U8 has the bug and fails).
    • raidctl -c c1t0d0 c1t1d0  (WARNING: this wipes both drives)
    • Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? yes
  • Label the new RAID volume with format. You should not get weird or failed labeling errors in format at this point.
  • Look up your volume with raidctl -l (these instructions assume it shows up as c1t1d0).
  • Restore your partition layout.
    • cat /mnt/space/partitions.txt | fmthard -s - /dev/rdsk/c1t1d0s2
  • At this point I actually switched to DiskSuite but the restore steps are similar.
  • newfs each of the partitions.
    • newfs /dev/rdsk/c1t1d0s0 (through s7, skip s2 obviously)
  • Mount and restore each partition:
    • mkdir /tmp/s0
    • mount /dev/dsk/c1t1d0s0 /tmp/s0
    • cd /tmp/s0
    • cat /mnt/space/root_c1t1d0s0.dmp | ufsrestore xf - (answer yes to root dir permission)
    • umount /tmp/s0
    • repeat for each slice
  • Copy the boot block. The path to this file depends on your hardware:
    • installboot /usr/platform/SUNW,SPARC-Enterprise-T5220/lib/fs/ufs/bootblk /dev/rdsk/c1t1d0s0
  • Unmount everything, export zpools if need be, and reboot out of the CD environment down to the ok prompt.
  • Edit your disk alias from the OpenBoot prompt:
    • probe-scsi-all
    • show-disks (select your boot disk)
    • nvalias disk Ctrl-Y  (Ctrl-Y pastes the device path selected by show-disks)
    • boot disk
  • At this point you should be back to a hardware mirror or perhaps you switched to DiskSuite.
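For reference, here is the same sequence condensed into one place. Device names, slices, and the platform directory are taken from the steps above; treat it as a sketch to adapt, not a script to paste:

# prtvtoc /dev/rdsk/c1t1d0s2 > /mnt/space/partitions.txt
# ufsdump 0f - /dev/rdsk/c1t1d0s0 > /mnt/space/root_c1t1d0s0.dmp    (repeat per slice)
# raidctl -c c1t0d0 c1t1d0                                          (wipes both disks)
# format                                                            (label the new volume)
# fmthard -s /mnt/space/partitions.txt /dev/rdsk/c1t1d0s2
# newfs /dev/rdsk/c1t1d0s0                                          (repeat per slice, skip s2)
# mkdir /tmp/s0 && mount /dev/dsk/c1t1d0s0 /tmp/s0 && cd /tmp/s0
# ufsrestore xf - < /mnt/space/root_c1t1d0s0.dmp                    (repeat per slice)
# installboot /usr/platform/SUNW,SPARC-Enterprise-T5220/lib/fs/ufs/bootblk /dev/rdsk/c1t1d0s0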

If you want to switch to DiskSuite in the middle of all this:

  • Back up using ufsdump as above.
  • Delete your hardware RAID definition.
  • Restore the partition table to the first disk and newfs its partitions.
  • ufsrestore onto the first disk and do a sanity boot.
  • Then start the regular DiskSuite setup (a command sketch follows).

If you try to do the DiskSuite setup from the CD all in one shot, it won't take because the meta service isn't running. You'll get this error:

metadb: network/rpc/meta:default: failed to enable/disable SVM service

Running the meta commands won't hurt; they just won't stick. That is, when you reboot off your hard disk, metastat will say "no meta databases found".
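For completeness, a minimal sketch of the DiskSuite/SVM mirroring once you're booted off the restored disk. The metadevice names (d0/d10/d20) and the slice holding the state database replicas are examples only, not from the original procedure:

# metadb -a -f -c 3 c1t1d0s7        (state database replicas on a small spare slice)
# metainit -f d10 1 1 c1t1d0s0      (submirror on the slice you booted from; -f because it's mounted)
# metainit d20 1 1 c1t0d0s0         (submirror on the second disk)
# metainit d0 -m d10                (one-way mirror for root)
# metaroot d0                       (updates /etc/vfstab and /etc/system)
# lockfs -fa
# init 6
# metattach d0 d20                  (after the reboot, attach the second half and let it sync)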
