6

Setup: 12x 1TB drives in a RAID6 (mdadm); cryptsetup (dm-crypt) running on top of the mdadm array; LVM running on the crypted device; ext4 on the LVM.

Background: I added a new drive to the RAID (growing it from 11 to 12 drives) and 'bubbled' the change up through the layers (mdadm, etc.) until finally resizing the ext4 filesystem. This machine is used as a centralized repository for photography and as a backup server (for both Windows and Mac machines), so bringing it down to add the drive and wait for the resizing wasn't really an option; instead I started the resize operation, online, several days ago. htop reports the resize2fs operation as having been running for 81h now. dmesg and syslog are both clear, and the drives are still accessible. The resize command reported that it had started an online resize of the filesystem, so the process IS running, and it is burning through 100% of one of my cores.
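
For reference, "bubbling up" through a stack like this usually means something along these lines; the device, mapping, and volume names below are illustrative, not necessarily my exact ones:

# grow the md array from 11 to 12 active devices
mdadm --add /dev/md0 /dev/sdl1
mdadm --grow /dev/md0 --raid-devices=12

# enlarge the dm-crypt mapping to cover the grown array
cryptsetup resize md0_crypt

# grow the LVM physical volume, then the logical volume
pvresize /dev/mapper/md0_crypt
lvextend -l +100%FREE /dev/vg_store/lv_store

# finally, grow the ext4 filesystem while it stays mounted
resize2fs /dev/vg_store/lv_store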

Question: Is it normal for the operation to take this long or has something gone horribly wrong? Where would I start looking for signs of trouble?

Adam
  • Keep waiting. It'll be done sometime around the next 2-300 years. – Tom O'Connor Oct 09 '12 at 21:41
  • For what it's worth, this is the command I used to start resize2fs and the output I see in the console:
    resize2fs /dev/store
    resize2fs 1.42 (29-Nov-2011)
    Filesystem at /dev/store is mounted on /mnt/store; on-line resizing required
    old_desc_blocks = 524, new_desc_blocks = 583
    Performing an on-line resize of /dev/store to 2441697280 (4k) blocks.
    – Adam Oct 10 '12 at 21:01
  • Just to say, I'm very surprised that *growing* the partition took so long for you. Shrinking I could understand, since that can involve shuffling data around... – mwfearnley Apr 20 '20 at 09:46

2 Answers

11

If you ran resize2fs with the -p option, it would print out regular progress reports. However, since you didn't, there doesn't seem to be any way to get that information while it's running.
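
For the record, that invocation would have looked something like this (device path illustrative):

resize2fs -p /dev/store    # -p asks resize2fs to print periodic progress reports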

This related question seems to indicate that it is fairly normal for resize2fs to run for a very long time.

Michael Hampton
  • Oh, how I wish I'd used the -p option. It's at 121h of compute time and counting. I read that answer, but I'm finding it hard to believe... 121h and counting is the better part of a week. Oh well, as long as it's not destroying my data, I suppose. I will update this post when (if) it finishes. – Adam Oct 10 '12 at 20:55
  • 4
    It took about 125 hours but it finally completed... – Adam Oct 11 '12 at 03:28
0

edit/added: as others have stated in the comments below, don't try this!

I don't know if resize2fs supports it, but you could try sending it a USR1 signal:

killall -USR1 resize2fs

What I'm wondering: how long did your mdadm grow/reshape take? From my point of view, that should have taken significantly longer than the ext4 resize.

Edit: Could it be that your RAID is still being reshaped in the background (md still working heavily) and that the resize process is being blocked or slowed down by this?
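
You could check whether md is still busy with something like this (array name illustrative):

cat /proc/mdstat            # shows any ongoing reshape/resync and its progress
mdadm --detail /dev/md0     # reports the array state, including reshape status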

cljk
  • 9
    One could send it a USR1 signal and find out that it doesn't know what to do with the signal and aborts, leaving the filesystem in an unusable state. I wouldn't recommend this without being 100% certain it won't cause data loss. – Michael Hampton Oct 09 '12 at 19:45
  • the mdadm grow took about 20 hours. My RAID is not syncing (according to /proc/mdstat) and, as of Monday (2 days ago), I killed anything that could be writing to the array. resize2fs is still taking 100% of a single core, 121h in. – Adam Oct 10 '12 at 20:59
  • 2
    I have just tried to create a test partition, then grow it online and in the middle of the process, send USR1 signal to resize2fs - it said "User defined signal 1" and exited. FWIW it seems to exit gracefully with the volume partially resized but I wouldn't rely on this if there is no backup. – bbonev Mar 01 '16 at 10:48
  • 1
    DON'T DO THIS. My resize2fs closed when it got this signal. – isaaclw Dec 09 '19 at 13:23