
After upgrading a RHEL 5 server today, I rebooted into the new kernel (current: 2.6.18-371.el5PAE, previous: 2.6.18-348.18.1.el5PAE).

During the boot sequence I saw a message indicating that Logical Volume Management was starting, then almost immediately I saw the following and was offered a rescue shell:

Found duplicate PV BPF...ayV: using /dev/sdc1 not /dev/md3.

Note: /dev/sdc1 and /dev/sdb1 are members of the raid1 array /dev/md3.

From this, I assumed that LVM thinks /dev/sdc1 and /dev/md3 are PVs with the same UUID, and that it chose to ignore /dev/md3 and use /dev/sdc1.

I powered down and unplugged the drive for sdc and restarted. Unexpectedly, the system booted without me noticing any problem. Of course, md3 was degraded.

I powered down, plugged the drive back in, rebooted, and the system started again without my noticing any problem. Of course, md3 was still degraded, but something unexpected happened.

The filesystem within the troubled logical volume was mounted.

I executed pvdisplay and saw the same error as above. And, of course, when I tried to add sdc1 back into md3, mdadm wouldn't let me because the device was in use by LVM.

I unmounted the filesystem and ran e2fsck on the LV's device path. No problems were found (but there should have been problems).
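
(For reference, the check amounted to something like the following; the VG/LV names are placeholders for my actual ones.)

    # Unmount the filesystem on the affected LV, then force a check
    umount /dev/myvg/mylv
    e2fsck -f /dev/myvg/mylv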

There are really four related questions (sorry). Assuming the answer to #3 is "yes" or "sort of", the answer to #4 is what I need. I asked the first two because I assume I need to understand their answers to make sense of any answer to the last two.

  1. Why is the filesystem OK if the logical volume was originally built on a PV of /dev/md3 rather than /dev/sdc1?

  2. Shouldn't /dev/sdc1 be different from /dev/md3, which would prevent the logical volume from being consistent with respect to the physical volumes within it? This might be answered by question 1.

  3. Can I fix my problem by removing the pv information from /dev/sdc1 and adding /dev/sdc1 back into /dev/md3?

  4. If the answer to #3 is yes, then how do I go about it without trashing the logical volume and its filesystem?

Some history:

I've never executed "pvcreate /dev/sdc1" so I have no idea why this should be happening. It is true, however, that /dev/sdc has been troubling me lately in that smartd (smartmontools) will tell me it can't read the SMART data or can't even see the device. I fix the problem either by (a) rebooting, (b) rebooting, hitting a BIOS hang, powering down, reseating the SATA cable, and powering on, or (c) following sequence (b) but replacing the SATA cable instead of just reseating it.

Jeff Holt

2 Answers

  1. I'm not sure you asked the question you think you asked, but /dev/md3 is the same as /dev/sdb1 and /dev/sdc1 since it's a mirror set.

  2. No, it shouldn't.

  3. No, that will create data loss for you.

  4. N/A

You can probably get rid of this error message by modifying your /etc/lvm/lvm.conf file to change the filter to reject sdb* and sdc* devices, regenerate your initrd, then reboot.
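
A minimal sketch of that change, assuming the mirror members really are the only things on sdb and sdc you want hidden from LVM, and that this is a stock RHEL 5 initrd:

    # in the devices { } section of /etc/lvm/lvm.conf:
    # reject the raw mirror members so only /dev/md3 is scanned
    filter = [ "r|^/dev/sdb|", "r|^/dev/sdc|", "a/.*/" ]

    # rebuild the initrd for the running kernel so the early-boot
    # LVM scan honours the new filter
    mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)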

John
  • I understand that /dev/sdb1 and /dev/sdc1 must be equal when the raid1 array is active and not degraded (as long as md is bug-free). But diff /dev/md3 /dev/sdb1 exits 2 with a message saying it found a binary difference. So, when you said "is the same", what did you mean by the word "same"? Because I'm certainly not agreeing if you mean "each byte equal at the same relative offset". And that was the whole point of my question of trying to check a filesystem with /dev/md3 versus checking it with /dev/sdb1 (or whatever is ok in the raid1 array). – Jeff Holt Oct 03 '13 at 20:03
  • Oh, I see it now. In q1, the question should have ended "...instead of /dev/sdc1"? But my reply question still stands. – Jeff Holt Oct 03 '13 at 20:11
  • The md3 will have differences insofar as it is the actual mirror device, whereas /dev/sdb1 and /dev/sdc1 are simply one side of the mirror. Ultimately, though, you want to fsck against /dev/md3, not against either of the constituent devices. – John Oct 04 '13 at 11:38
  • Yes, which is why I was questioning the validity of allowing the lvm system to consider using /dev/sdc1 instead of /dev/md3. It doesn't make sense that it should be allowed to do that. And it doesn't make sense that e2fsck /dev/sdc1 should produce the same results as e2fsck /dev/md3. – Jeff Holt Oct 04 '13 at 15:19

The fundamental problem is that the array was created with an MD superblock at the end of each member (metadata 0.90 or 1.0), which means signatures written at the start of the array, such as the LVM PV label, sit at the same offsets on the raw member devices and are still recognisable there. The only thing that prevents the PV superblock from being parsed on a bare member is that the MD subsystem usually grabs the devices first. Sometimes upper layers take care to yield when another superblock is also detectable, but that behaviour can be fragile.
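
You can confirm this from a running system; a minimal check, using the device names from your question, might look like this:

    # show the MD superblock format on a member (0.90/1.0 live at the end of the device)
    mdadm --examine /dev/sdc1 | grep -i version

    # show what LVM sees on the raw member versus the assembled array
    pvs -o pv_name,pv_uuid /dev/sdc1 /dev/md3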

There are two ways to avoid this.

  • Create the array with --metadata=1.2, which has been the default since 2010. The PV superblock will be shifted by 512k and won't be recognisable on the unassembled member devices.
  • Use LVM's MD integration: specify --type=raidXX to lvcreate or lvconvert. LVM doesn't expose unassembled devices. (Both options are sketched below.)
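
Roughly what those two options look like at creation time; the VG/LV names and sizes here are examples only, and the LVM raid1 segment type needs a newer LVM than RHEL 5 shipped:

    # option 1: classic md RAID1, but with v1.2 metadata near the start of each member
    mdadm --create /dev/md3 --metadata=1.2 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

    # option 2: let LVM manage the mirroring itself (no separate md device)
    pvcreate /dev/sdb1 /dev/sdc1
    vgcreate myvg /dev/sdb1 /dev/sdc1
    lvcreate --type raid1 --mirrors 1 -L 10G -n mylv myvg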

Normally these precautions are taken at creation time, but in your case (raid1 with metadata at the end, containing a PV), you can convert to LVM-integrated MD without too much trouble.

Once you've made sure the array is synced and the filesystem is mostly sane, you can disassemble it, remove the RAID superblocks from both members (read the wipefs manpage carefully; you don't want to wipe the PV superblocks by mistake), remove the PV superblock from one member only, extend your VG onto that member, and lvconvert your logical volumes to --type=raid1 --mirrors=1. Finally, re-run grub-install onto both disks.
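
Very roughly, and only as a sketch: the VG/LV names below are placeholders, I've used mdadm --zero-superblock rather than wipefs so that only the MD metadata is removed, and every device name should be double-checked against your own layout before running any of it:

    # deactivate the VG (assumes nothing on it is in use), then stop the array
    vgchange -an myvg
    mdadm --stop /dev/md3

    # remove only the MD superblocks from both former members
    mdadm --zero-superblock /dev/sdb1
    mdadm --zero-superblock /dev/sdc1

    # keep the PV label on sdb1; wipe the duplicate label on sdc1 and re-add it to the VG
    pvremove -ff /dev/sdc1
    pvcreate /dev/sdc1
    vgextend myvg /dev/sdc1

    # reactivate and convert each LV to an LVM-managed raid1 mirror across the two PVs
    vgchange -ay myvg
    lvconvert --type raid1 --mirrors 1 myvg/mylv    # repeat for each LV

    # reinstall the boot loader on both disks
    grub-install /dev/sdb
    grub-install /dev/sdc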

Gabriel