I've encountered the same problem. I added a new disk into a multi-device array that was DATA:single, METADATA:raid1, SYSTEM:raid1. The new disk failed about 2 minutes later, leaving me with this:
tassadar@sunfyre:~$ sudo btrfs fi usage /mnt/store
Overall:
    Device size:                   7.28TiB
    Device allocated:              7.14TiB
    Device unallocated:          140.98GiB
    Device missing:                  0.00B
    Used:                          7.14TiB
    Free (estimated):            141.99GiB    (min: 71.50GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB    (used: 96.00KiB)

Data,single: Size:7.06TiB, Used:7.05TiB
   /dev/sdc1      3.53TiB
   /dev/sdd1      3.53TiB
   missing        2.00GiB

Metadata,RAID1: Size:43.00GiB, Used:41.81GiB
   /dev/sdc1     43.00GiB
   /dev/sdd1     43.00GiB

System,RAID1: Size:32.00MiB, Used:880.00KiB
   /dev/sdc1     32.00MiB
   /dev/sdd1     32.00MiB

Unallocated:
   /dev/sdc1     70.99GiB
   /dev/sdd1     69.99GiB
   missing        3.71TiB
The filesystem was only mountable with -o ro,degraded, which is useless when you need to remove the missing device, because btrfs device delete requires a writable mount. I couldn't find any other way out, and the data on those disks wasn't very important, so I started hacking around in the kernel.
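To make the dead end concrete, the failing sequence looks roughly like this (a sketch, not verbatim output):

# a degraded read-write mount is refused, because with the device gone
# the single-profile data chunks can't tolerate any missing devices:
sudo mount -o degraded /dev/sdd1 /mnt/store
# a read-only degraded mount succeeds...
sudo mount -o ro,degraded /dev/sdd1 /mnt/store
# ...but device removal needs a writable fs and fails with EROFS:
sudo btrfs device delete missing /mnt/store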
Workaround
Everything below is obviously very unsafe, and blindly copy-pasting it might not be the best idea.
These are the changes I made to a vanilla 4.7.4 kernel tree, mostly by the ancient craft of "commenting out stuff I don't really understand":
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 864cf3b..bd10a1d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3588,6 +3588,8 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 	int num_tolerated_disk_barrier_failures =
 		(int)fs_info->fs_devices->num_devices;
+	return num_tolerated_disk_barrier_failures;
+
 	for (i = 0; i < ARRAY_SIZE(types); i++) {
 		struct btrfs_space_info *tmp;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 589f128..cbcb7b2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2817,7 +2817,8 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
 		}
 
 		if (map->stripes[i].dev) {
-			ret = btrfs_update_device(trans, map->stripes[i].dev);
+//			ret = btrfs_update_device(trans, map->stripes[i].dev);
+			ret = 0;
 			if (ret) {
 				mutex_unlock(&fs_devices->device_list_mutex);
 				btrfs_abort_transaction(trans, root, ret);
@@ -2878,13 +2879,15 @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
 	 */
 	ASSERT(mutex_is_locked(&root->fs_info->delete_unused_bgs_mutex));
 
-	ret = btrfs_can_relocate(extent_root, chunk_offset);
+//	ret = btrfs_can_relocate(extent_root, chunk_offset);
+	ret = 0;
 	if (ret)
 		return -ENOSPC;
 
 	/* step one, relocate all the extents inside this chunk */
 	btrfs_scrub_pause(root);
-	ret = btrfs_relocate_block_group(extent_root, chunk_offset);
+//	ret = btrfs_relocate_block_group(extent_root, chunk_offset);
+	ret = 0;
 	btrfs_scrub_continue(root);
 	if (ret)
 		return ret;
Basically, it performs the whole "move extents to another device" part of device removal without actually moving any extents; it just deletes the old ones on the missing drive. It also allows the filesystem to be mounted read-write while degraded. Using this "patch" on (other) healthy btrfs filesystems is unsafe.
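Applying it is the usual kernel rebuild routine. Roughly, assuming the diff above is saved as btrfs-hack.diff (a name I made up here) and your distro boots the newest installed kernel:

wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.7.4.tar.xz
tar xf linux-4.7.4.tar.xz && cd linux-4.7.4
patch -p1 < ../btrfs-hack.diff          # apply the changes above
cp "/boot/config-$(uname -r)" .config   # start from the running kernel's config
make olddefconfig
make -j"$(nproc)"
sudo make modules_install install
sudo reboot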
Device delete "works" now:
tassadar@sunfyre:~$ sudo mount -o degraded /dev/sdd1 /mnt/store
tassadar@sunfyre:~$ sudo btrfs device delete missing /mnt/store
ERROR: error removing device 'missing': No such file or directory
tassadar@sunfyre:~$ sudo btrfs fi usage /mnt/store
Overall:
    Device size:                   7.28TiB
    Device allocated:              7.14TiB
    Device unallocated:          140.98GiB
    Device missing:                  0.00B
    Used:                          7.14TiB
    Free (estimated):            141.99GiB    (min: 71.50GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB    (used: 96.00KiB)

Data,single: Size:7.06TiB, Used:7.05TiB
   /dev/sdc1      3.53TiB
   /dev/sdd1      3.53TiB

Metadata,RAID1: Size:43.00GiB, Used:41.81GiB
   /dev/sdc1     43.00GiB
   /dev/sdd1     43.00GiB

System,RAID1: Size:32.00MiB, Used:880.00KiB
   /dev/sdc1     32.00MiB
   /dev/sdd1     32.00MiB

Unallocated:
   /dev/sdc1     70.99GiB
   /dev/sdd1     69.99GiB
   missing          0.00B
tassadar@sunfyre:~$ sudo umount /mnt/store
tassadar@sunfyre:~$ sudo mount /dev/sdd1 /mnt/store
tassadar@sunfyre:~$ sudo btrfs fi usage /mnt/store
Overall:
    Device size:                   7.28TiB
    Device allocated:              7.14TiB
    Device unallocated:          140.98GiB
    Device missing:                  0.00B
    Used:                          7.14TiB
    Free (estimated):            141.99GiB    (min: 71.50GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB    (used: 0.00B)

Data,single: Size:7.06TiB, Used:7.05TiB
   /dev/sdc1      3.53TiB
   /dev/sdd1      3.53TiB

Metadata,RAID1: Size:43.00GiB, Used:41.81GiB
   /dev/sdc1     43.00GiB
   /dev/sdd1     43.00GiB

System,RAID1: Size:32.00MiB, Used:880.00KiB
   /dev/sdc1     32.00MiB
   /dev/sdd1     32.00MiB

Unallocated:
   /dev/sdc1     70.99GiB
   /dev/sdd1     69.99GiB
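As a quick sanity check (output omitted), btrfs fi show should no longer list a missing device either:

sudo btrfs fi show /mnt/store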
Make sure to revert to the original, unpatched kernel as soon as possible.
Result
My fs seems to be okay now. I might have lost a small amount of data that was on the failed disk, but that's to be expected, since the data profile was "single". I'm currently running btrfs scrub to see whether anything is horribly broken; I will edit this post once it finishes.
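(The scrub itself is just the standard invocation, along these lines:)

sudo btrfs scrub start /mnt/store
sudo btrfs scrub status /mnt/store   # re-run until it reports the scrub finished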
EDIT: The scrub finished without any problems, but the fs is still corrupted: when I started deleting some files from it, the kernel hit files that had been on the missing drive and threw an error. So I patched the kernel once more (this time on top of a clean 4.7.4 tree, without the previous changes):
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a..f10b3b6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6853,8 +6853,10 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 		ret = update_block_group(trans, root, bytenr, num_bytes, 0);
 		if (ret) {
-			btrfs_abort_transaction(trans, extent_root, ret);
-			goto out;
+			btrfs_err(info, "update_block_group has failed (%d)", ret);
+			ret = 0;
+			//btrfs_abort_transaction(trans, extent_root, ret);
+			//goto out;
 		}
 	}
 	btrfs_release_path(path);
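With this second kernel booted, the fs mounts normally and the affected files can finally be deleted; the failed update is merely logged instead of aborting the transaction (which would force the fs read-only). A sketch, with a made-up placeholder path:

sudo mount /dev/sdd1 /mnt/store
sudo rm -rf /mnt/store/some/broken/dir   # placeholder: whichever files throw errors
dmesg | tail                             # expect "update_block_group has failed" messages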
So yeah, this is definitely not a good solution, since the FS quite obviously isn't okay. But it's usable again, I didn't really lose anything that mattered, and this wasn't high-priority storage, so I'm quite content.
I didn't find a solution to this. Instead I copied all the data to a different disk and reformatted. – phiresky – 2016-01-08