I've been fighting this issue for some time now.

I have a Logical Volume spanning 3 disks: 1.5TB, 2TB and 3TB. The 1.5TB drive is failing, with lots of I/O errors and bad sectors. I started pvmove to move the existing extents from the failing drive to the 3TB drive (there's enough space left). I moved 99% of the extents, but the last percent seems to be impossible to read: reading fails and pvmove exits.

Here's the current state:

root@server:~# pvdisplay 
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
--- Physical volume ---
PV Name               /dev/sda # old, working drive
VG Name               lvm_group1
PV Size               1.82 TiB / not usable 1.09 MiB
Allocatable           yes (but full)
PE Size               4.00 MiB
Total PE              476932
Free PE               0
Allocated PE          476932
PV UUID               FEoDYU-Lhjf-FdI1-Ei5p-koue-PIma-TGvs9A

--- Physical volume ---
PV Name               /dev/sdd1  # old failing drive
VG Name               lvm_group1
PV Size               1.36 TiB / not usable 2.40 MiB
Allocatable           NO
PE Size               4.00 MiB
Total PE              357699
Free PE               357600
Allocated PE          99
PV UUID               hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK

--- Physical volume ---
PV Name               /dev/sdf # new drive
VG Name               lvm_group1
PV Size               2.73 TiB / not usable 4.46 MiB
Allocatable           yes 
PE Size               4.00 MiB
Total PE              715396
Free PE               357746
Allocated PE          357650
PV UUID               qs4BVK-PAPv-I1DG-x5wJ-dRNq-vhBE-wQeJL6

Here's what pvmove is saying:

root@server:~# pvmove /dev/sdd1:335950-336500 /dev/sdf --verbose
Finding volume group "lvm_group1"
Archiving volume group "lvm_group1" metadata (seqno 93).
Creating logical volume pvmove0
Moving 50 extents of logical volume lvm_group1/cryptex
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/cryptex
Updating volume group metadata
Found volume group "lvm_group1"
Found volume group "lvm_group1"
Creating lvm_group1-pvmove0
Loading lvm_group1-pvmove0 table (253:2)
Loading lvm_group1-cryptex table (253:0)
Suspending lvm_group1-cryptex (253:0) with device flush
Suspending lvm_group1-pvmove0 (253:2) with device flush
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/pvmove0
Resuming lvm_group1-pvmove0 (253:2)
Found volume group "lvm_group1"
Loading lvm_group1-pvmove0 table (253:2)
Suppressed lvm_group1-pvmove0 identical table reload.
Resuming lvm_group1-cryptex (253:0)
Creating volume group backup "/etc/lvm/backup/lvm_group1" (seqno 94).
Checking progress before waiting every 15 seconds
/dev/sdd1: Moved: 4.0%
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
No physical volume label read from /dev/sdd1
Physical volume /dev/sdd1 not found
ABORTING: Can't reread PV /dev/sdd1
ABORTING: Can't reread VG for /dev/sdd1

There are only 99 extents still left on the failing drive. I'm OK with losing this data. I just want to pull this drive and throw it away without losing data on the other drives.
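
For scale, the amount of data still stuck on the failing PV can be worked out from the pvdisplay figures above (PE Size 4.00 MiB, Allocated PE 99). This is only a quick calculation; the variable names are illustrative.

```shell
#!/bin/sh
# Values copied from the pvdisplay output for /dev/sdd1 above.
pe_size_mib=4      # "PE Size 4.00 MiB"
allocated_pe=99    # "Allocated PE 99"

# Each extent is pe_size_mib MiB, so the data at risk is their product.
lost_mib=$((pe_size_mib * allocated_pe))
echo "Data still on the failing PV: ${lost_mib} MiB"
```

So at most a few hundred MiB out of roughly 1.36 TiB would be lost.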

So I tried pvremove:

root@server:~# pvremove /dev/sdd1
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
No physical volume label read from /dev/sdd1
Physical Volume /dev/sdd1 not found

And then vgreduce:

root@server:~# vgreduce lvm_group1  --removemissing
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
WARNING: Partial LV cryptex needs to be repaired or removed. 
WARNING: Partial LV pvmove0 needs to be repaired or removed. 
There are still partial LVs in VG lvm_group1.
To remove them unconditionally use: vgreduce --removemissing --force.
Proceeding to remove empty missing PVs.

pvdisplay is still showing the failing drive...

Any ideas?


2 Answers


In the end I solved this problem by manually editing the metadata backup file /etc/lvm/backup/lvm_group1.

Here are the steps in case anyone else hits this problem:

  1. I physically removed the dead drive from the server
  2. I executed vgreduce lvm_group1 --removemissing --force
  3. I removed the dead drive from the config
  4. I added another stripe on a "good" drive in place of the extents that were unreadable on the dead drive.
  5. I executed vgcfgrestore -f edited_config_file.cfg lvm_group1
  6. Reboot
  7. Voila! Drive is visible and can be mounted.
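
The command-line part of the steps above can be sketched roughly as follows. This is not the exact sequence that was run: the `run` helper, the `DRY_RUN` switch and the up-front `cp` are assumptions added for illustration (the copy is there because metadata-changing commands normally refresh the backup file). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of steps 2-5. DRY_RUN=1 (default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
VG="lvm_group1"
EDITED_CFG="edited_config_file.cfg"   # file name used in step 5

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover() {
    # Assumption: keep a copy of the metadata backup before anything rewrites it.
    run cp "/etc/lvm/backup/$VG" "$EDITED_CFG"

    # Step 2: drop the missing PV; --force also removes partial LVs from the VG.
    run vgreduce "$VG" --removemissing --force

    # Steps 3-4 are a manual edit of $EDITED_CFG (my reading of the steps):
    #   - delete the dead PV's entry from the physical_volumes section
    #   - in the LV's segments, point any stripes = [...] entry that referenced
    #     the dead PV at a free extent range on a healthy PV

    # Step 5: write the edited metadata back to the volume group.
    run vgcfgrestore -f "$EDITED_CFG" "$VG"
}

plan=$(recover)
echo "$plan"
```

Running it as-is prints the three commands for review; setting `DRY_RUN=0` would execute them, which should only be done once the edited file has been checked by hand.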

It just took me 4 days of learning the ins and outs of LVM to solve this...

So far it looks good. No errors. Happy camping.

  • Do you think this could work in this case: I have 2 HDs in the same LVM group, but the logical volume is only on the 1st HD; the second I use for snapshots. If the second fails, will there be any problems because of a missing group member? – Aquarius Power Feb 20 '15 at 22:48
  • @AquariusPower it should work in your case as well. The entire difficulty is in editing the config file. If you can do it right then it will work for you. – Sniku Feb 22 '15 at 21:06
  • Ohh... reading it again, it seems you broke the 1st LVM law that says: "if there is one missing PV, your LV is lost", COOL! :D (Of course you did lose some data, but the point was restoring most of it.) Also, I believe that if you use that contiguous flag somewhere, no files may split between PVs. That config file doesn't seem too difficult to edit manually; I wonder if there is some "lvm config file fixer" out there that could do the same thing you did? :) – Aquarius Power Feb 22 '15 at 21:59
  • I think something like this could be added to the title: ... and recovering partial data from an incomplete LV (with a missing PV) – Aquarius Power Feb 22 '15 at 22:01
  • @AquariusPower I don't think there is any automated config fixer. There isn't even much information on doing this manually. I lost a couple of files, but the rest of the data was preserved. – Sniku Feb 23 '15 at 22:19
  • I think, just for fun, I may try to do it someday: some script that makes the calculations and provides partial auto-recovery :) – Aquarius Power Feb 23 '15 at 22:58
  • If you still have the old lvm_group1 configuration, and if you don't mind, it would be cool to paste it here (or at least a `diff`), so we have more clues when trying it :) – Aquarius Power Feb 23 '15 at 22:59
  • Oh, btw, I just found something that may be interesting/related: http://bisqwit.iki.fi/source/lvm2defrag.html, from here http://unix.stackexchange.com/a/45918/30352; it makes calculations based on the current LVM layout – Aquarius Power Feb 24 '15 at 02:57
  • holy SHit you have no idea how happy i am to find this answer – markasoftware Apr 23 '20 at 20:11

If you are OK with stopping the LVM temporarily (and closing the underlying LUKS containers, if used), an alternative solution is to copy as much as possible of the PVs (or the underlying LUKS containers) to the good disk with GNU ddrescue, and to remove the old disk before restarting the LVM.

While I like Sniku's LVM solution, ddrescue may be able to recover more data than pvmove.

(The reason for stopping the LVM is that LVM has multipath support and would balance write operations between the pairs of PVs with identical UUIDs as soon as LVM discovers them. Furthermore, one should stop LVM and LUKS to ensure that all data that has recently been written is visible on the underlying devices. A restart of the system and not supplying the LUKS passwords is the easiest way to make sure of it.)
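
A rough sketch of what the ddrescue approach described above might look like. The destination `/dev/sdX1` and the mapfile name `sdd1.map` are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is an assumption added so that nothing destructive happens by default; the script only prints the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
VG="lvm_group1"
SRC="/dev/sdd1"    # failing PV from the question
DST="/dev/sdX1"    # hypothetical: spare partition on the good disk, at least as large as SRC
MAP="sdd1.map"     # hypothetical mapfile; lets ddrescue resume and track bad areas

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rescue() {
    # Deactivate the VG first (unmount filesystems and close LUKS mappings before this).
    run vgchange -an "$VG"

    # Pass 1: copy everything that reads cleanly, skipping the slow scraping phase.
    run ddrescue -f -n "$SRC" "$DST" "$MAP"

    # Pass 2: go back to the bad areas with direct disc access and up to 3 retry passes.
    run ddrescue -f -d -r3 "$SRC" "$DST" "$MAP"
}

plan=$(rescue)
echo "$plan"
```

The failing disk should then be removed before the LVM is started again, so that only one device carries the PV's UUID.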