28

I'm a longtime user and first-time question submitter. I've spent a full day searching this and many other sites for a solution, but I now have to ask for help to resolve my situation.

History: Our HP ProLiant server running CentOS 5.9 was powered off yesterday without a proper shutdown. Since then, the /home partition has been in a state where we cannot fsck it, mount it, or umount it. umount reports that it is not mounted, yet mount and fsck report that it is busy or already mounted. This originally prevented the server from booting. We eventually removed the disk/partition from /etc/fstab so that bootup would not fail.

# mount -t ext3 /dev/cciss/c0d0p1 /home
mount: /dev/cciss/c0d0p1 already mounted or /home busy

# fsck /dev/cciss/c0d0p1 
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
fsck.ext3: Device or resource busy while trying to open /dev/cciss/c0d0p1
Filesystem mounted or opened exclusively by another program?

As the output below shows, the disk is not mounted in any way.

df output:

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d1p3    198381228  24920704 163220696  14% /
/dev/cciss/c0d1p2    267818128    191652 253802544   1% /logs
/dev/cciss/c0d1p1       194442     33575    150828  19% /boot
tmpfs                 49495044         0  49495044   0% /dev/shm

mount output:

# mount
/dev/cciss/c0d1p3 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/cciss/c0d1p2 on /logs type ext3 (rw)
/dev/cciss/c0d1p1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

/etc/fstab

# cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/logs             /logs                   ext3    defaults        1 2
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SW-cciss/c0d1p5   swap                    swap    defaults        0 0

/etc/mtab

# cat /etc/mtab 
/dev/cciss/c0d1p3 / ext3 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/cciss/c0d1p2 /logs ext3 rw 0 0
/dev/cciss/c0d1p1 /boot ext3 rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0

/proc/mounts

# cat /proc/mounts 
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0
/dev /dev tmpfs rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/cciss/c0d1p2 /logs ext3 rw,data=ordered 0 0
/dev/cciss/c0d1p1 /boot ext3 rw,data=ordered 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
/etc/auto.misc /misc autofs rw,fd=7,pgrp=9694,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,fd=13,pgrp=9694,timeout=300,minproto=5,maxproto=5,indirect 0 0

lsof

# lsof /dev/cciss/c0d0p1 
#

fuser

# fuser /dev/cciss/c0d0p1  
#

fdisk -l

# fdisk -l /dev/cciss/c0d0

Disk /dev/cciss/c0d0: 1800.2 GB, 1800280694784 bytes
255 heads, 63 sectors/track, 218871 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1   *           1      218871  1758081276   83  Linux

Per other recommendations on the web, we used the ILO3 Remote Terminal to boot from a CentOS LiveCD. When we did this, we were able to mount, unmount, and fsck the partition without any errors or problems (i.e., the disk itself is fine).

We also used "debugfs" to perform an inode clear for the Journal Inode <8>. fsck then re-built the journal without error. Again, we were able to mount/unmount the disk without any problems when booted into the LiveCD.

When we switched back to the normal boot partition, we're back in the same place, unable to mount or fsck due to the OS believing that the partition is busy.

I'm looking to understand what else within Linux could be indicating that this disk is in use. What other utilities can be used to find this and clear it?

Any help is greatly appreciated.


Additional information, as requested:

lsof and fuser on /home, along with showing /home contents and directory permissions.

# lsof /home
# fuser /home
# ls -la /home
total 16
drwxr-xr-x  2 root root 4096 Mar 15  2013 .
drwxr-xr-x 27 root root 4096 Nov 19 08:31 ..
# ls -l / | grep home
drwxr-xr-x   2 root root  4096 Mar 15  2013 home
#

The mount -o remount fails, because this partition has not been mounted since the most recent boot. (It had been a working partition ever since the server was installed, and only showed this problem after the hard reboot yesterday.)

# mount -o remount -t ext3 /dev/cciss/c0d0p1 /home
mount: /home not mounted already, or bad option

I could re-add this partition to /etc/fstab, and reboot if needed.


2013/11/19 11:12am CST

dmsetup output:

# dmsetup table                
mpath0: 0 3516173232 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 104:0 1000 
mpath0p1: 0 3516162552 linear 253:0 63

# dmsetup info
Name:              mpath0
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 0
Number of targets: 1
UUID: mpath-3600508b1001cb6e6453d25c4052abca5

Name:              mpath0p1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: part1-mpath-3600508b1001cb6e6453d25c4052abca5

lsof -n

# lsof -n | grep /home
#

Final Solution:

# multipath -ll
mpath0 (3600508b1001cb6e6453d25c4052abca5) dm-0 HP,LOGICAL VOLUME
[size=1.6T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ #:#:#:# cciss!c0d0 104:0  [active][ready]

# multipath -F

# multipath -ll
#

# mount -t ext3 /dev/cciss/c0d0p1 /home
# cat /proc/mounts | grep home
/dev/cciss/c0d0p1 /home ext3 rw,data=ordered 0 0
TripSixes
  • 5
    Excellent example of a good first time question. – TheCleaner Nov 19 '13 at 15:43
  • 2
    Good question! It's a *really* long shot, but have you considered trying `mount -o remount /home`? Also, I would check to make sure /home is actually empty when the file system is not mounted (that *should* not be a problem, but who knows?) and look for any applicable messages in the system logs, including `dmesg`. – user Nov 19 '13 at 15:48
  • 2
    You do an lsof of the disk. Did you try doing an lsof/fuser of /home as well, in case anything is running that affects the mount point? – Jenny D Nov 19 '13 at 15:48
  • 1
    By the way, are you by any chance exporting that directory, e.g. via nfs? If the nfs-server is started before mounting the directory, it could block you mounting it. – Jenny D Nov 19 '13 at 15:50
  • Added information showing output from the recommended commands. Also, to Jenny D, this partition is not involved in any sort of NFS, it's a local partition only. – TripSixes Nov 19 '13 at 16:02
  • 1
    What is the output of `lsof -n | grep /home` ? AFAIK lsof /home looks for a process with /home open, but doesn't report usage of subdirectories. – Zoredache Nov 19 '13 at 16:30
  • @Zoredache That command also results in nothing found, no output from the command. – TripSixes Nov 19 '13 at 17:09
  • @Zoredache That's true; still, since all home directories should be on the disk that's not possible to mount, there shouldn't *be* any subdirectories in it. But then again, "shouldn't" doesn't necessarily mean "isn't", so your way is better. – Jenny D Nov 20 '13 at 12:52

5 Answers

16

It's probably in use by device-mapper.

Check your device-mapper table using `dmsetup table`. If it's in there, clear the mapping with `dmsetup remove <name>`.

If not, look for errors in dmesg as well.


# dmsetup table
mpath0: 0 3516173232 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 104:0 1000 
mpath0p1: 0 3516162552 linear 253:0 63

Ahah! multipath has claimed the disk. You can see this by running `multipath -ll`.

Run `multipath -F` to flush all unused maps; `multipath -ll` should then output nothing.

Or just use `/dev/mapper/mpath0p1` instead of `/dev/cciss/c0d0p1`.
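
To make the reading of that table concrete, here is a minimal sketch of how one might search `dmsetup table` output for a block device by its major:minor pair (104:0 for `cciss/c0d0` in the output above). The function name `dm_maps_using` is hypothetical, and the sample text is simply the table from this question; on a live system you would pipe the real `dmsetup table` output into it instead.

```shell
#!/bin/sh
# Hypothetical helper: read `dmsetup table` output on stdin and print the
# names of the maps whose table line mentions the given major:minor pair.
dm_maps_using() {
    awk -v dev="$1" '{
        name = $1
        sub(/:$/, "", name)              # "mpath0:" -> "mpath0"
        for (i = 2; i <= NF; i++)
            if ($i == dev) { print name; break }
    }'
}

# Sample input copied from the question; use `dmsetup table` on a real host.
sample='mpath0: 0 3516173232 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 104:0 1000
mpath0p1: 0 3516162552 linear 253:0 63'

printf '%s\n' "$sample" | dm_maps_using 104:0   # maps built directly on the disk
printf '%s\n' "$sample" | dm_maps_using 253:0   # maps stacked on mpath0 (253:0)
```

With the sample above, the first call names `mpath0` and the second names `mpath0p1`, which matches the chain disk → multipath map → partition map that was keeping the partition busy.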

MikeyB
  • I'm not sure what this output is telling me. Does this show the output you were expecting? [_My carriage return does not appear to work in this comment box.._] `# dmsetup table mpath0: 0 3516173232 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 104:0 1000 mpath0p1: 0 3516162552 linear 253:0 63` – TripSixes Nov 19 '13 at 17:03
1

As for the troubleshooting process: when trying lsof or fuser, don't check only the relevant partition; check the whole disk as well. That would quickly have pointed you to the correct solution:


Bad:

fuser /dev/cciss/c0d0p1

Good:

fuser /dev/cciss/c0d0

Bad:

lsof /dev/cciss/c0d0p1

Good:

lsof /dev/ | grep c0d0
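
A related check does not depend on a process holding the device open: a kernel-level claim such as device-mapper's typically shows up under the whole disk's `holders` directory in sysfs. The following is a rough sketch under assumptions: the helper names `parent_disk` and `sysfs_name` are made up for illustration, the suffix rules only cover `sda1`-style and `c0d0p1`-style partition names, and `/sys/block/<disk>/holders` may not exist on every kernel.

```shell
#!/bin/sh
# Hypothetical helper: strip the partition suffix from a partition path.
#   /dev/cciss/c0d0p1 -> /dev/cciss/c0d0,  /dev/sda1 -> /dev/sda
# Assumes the argument really is a partition, not a whole disk.
parent_disk() {
    case "$1" in
        *[0-9]p[0-9]*) printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//' ;;
        *)             printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//' ;;
    esac
}

# Hypothetical helper: sysfs spells "cciss/c0d0" as "cciss!c0d0".
sysfs_name() {
    printf '%s\n' "${1#/dev/}" | tr '/' '!'
}

disk=$(parent_disk /dev/cciss/c0d0p1)
echo "whole disk: $disk"

holders="/sys/block/$(sysfs_name "$disk")/holders"
# List whatever the kernel reports as stacked on the disk (e.g. dm-0).
if [ -d "$holders" ]; then
    ls "$holders"
else
    echo "no holders directory at $holders"
fi
```

An entry such as `dm-0` in that directory would point at device-mapper, which is the same `dm-0` that `multipath -ll` reports for `mpath0` in the question.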
MadHatter
zaTricky
1

I just encountered this after cloning an existing SAN LUN to a new server. My solution was:

  • Enter maintenance mode
  • `mount -o remount,rw /dev/sda1` (where sda1 is whatever device you're having the issue with)
  • Delete or move `/etc/blkid/blkid.tab`

The server booted afterward.

theillien
0

In my experience, this can be caused by Docker's containerd service. The device can be freed with `systemctl restart containerd`, but I am not sure why or how containerd opens it.

gholk
  • This was in fact a good hint; but in my case it was one of docker containers, which held the descriptor implicitly. So check what's running with `docker ps` and then inspect volumes with `docker inspect -f '{{ .Mounts }}' ...`. – Alex Offshore Aug 19 '21 at 14:42
-2

Speaking from my own experience: check your fstab as well, to make sure you're trying to mount the device as its logical volume and not by the alias you assigned or are using from /etc/multipath/bindings.

Rat