44

There is a particular directory (/var/www) where running ls (with or without options) hangs and never completes. There are only about 10-15 files and directories in /var/www, mostly text files. Here is some investigative info:

[me@server www]$ df .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_dev-lv_root
                       50G   19G   29G  40% /

[me@server www]$ df -i .
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/vg_dev-lv_root
                        3.2M    435K    2.8M   14% /

find works fine. Also, I can type cd /var/www/ and press TAB before pressing Enter, and tab completion successfully lists all the files and directories in there:

[me@server www]$ cd /var/www/
cgi-bin/         create_vhost.sh  html/            manual/          phpMyAdmin/      scripts/         usage/
conf/            error/           icons/           mediawiki/       rackspace        sqlbuddy/        vhosts/
[me@server www]$ cd /var/www/

I have had to kill my terminal sessions several times because of the ls hanging:

[me@server ~]$ ps aux | grep ls
gdm       6215  0.0  0.0 488152  2488 ?        S<sl Jan18   0:00 /usr/bin/pulseaudio --start --log-target=syslog
root     23269  0.0  0.0 117724  1088 ?        D    18:24   0:00 ls -Fh --color=always -l
root     23477  0.0  0.0 117724  1088 ?        D    18:34   0:00 ls -Fh --color=always -l
root     23579  0.0  0.0 115592   820 ?        D    18:36   0:00 ls -Fh --color=always
root     23634  0.0  0.0 115592   816 ?        D    18:38   0:00 ls -Fh --color=always
root     23740  0.0  0.0 117724  1088 ?        D    18:40   0:00 ls -Fh --color=always -l
me       23770  0.0  0.0 103156   816 pts/6    S+   18:41   0:00 grep ls

kill doesn't seem to have any effect on the processes, even with sudo.

What else should I do to investigate this problem? It just randomly started happening today.

UPDATE

dmesg shows a long list of messages, mostly related to an external USB HDD that I've mounted so many times that its max mount count has been reached, but I think that is an unrelated problem. Near the bottom of dmesg I'm seeing this:

INFO: task ls:23579 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls            D ffff88041fc230c0     0 23579  23505 0x00000080
 ffff8801688a1bb8 0000000000000086 0000000000000000 ffffffff8119d279
 ffff880406d0ea20 ffff88007e2c2268 ffff880071fe80c8 00000003ae82967a
 ffff880407169ad8 ffff8801688a1fd8 0000000000010518 ffff880407169ad8
Call Trace:
 [<ffffffff8119d279>] ? __find_get_block+0xa9/0x200
 [<ffffffff814c97ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814c964b>] mutex_lock+0x2b/0x50
 [<ffffffff8117a4d3>] do_lookup+0xd3/0x220
 [<ffffffff8117b145>] __link_path_walk+0x6f5/0x1040
 [<ffffffff8117a47d>] ? do_lookup+0x7d/0x220
 [<ffffffff8117bd1a>] path_walk+0x6a/0xe0
 [<ffffffff8117beeb>] do_path_lookup+0x5b/0xa0
 [<ffffffff8117cb57>] user_path_at+0x57/0xa0
 [<ffffffff81178986>] ? generic_readlink+0x76/0xc0
 [<ffffffff8117cb62>] ? user_path_at+0x62/0xa0
 [<ffffffff81171d3c>] vfs_fstatat+0x3c/0x80
 [<ffffffff81258ae5>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff81171eab>] vfs_stat+0x1b/0x20
 [<ffffffff81171ed4>] sys_newstat+0x24/0x50
 [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b

Also, strace ls /var/www/ prints a lot of output, and I don't know which parts are useful. Here are the last handful of lines:

ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=68, ws_col=145, ws_xpixel=0, ws_ypixel=0}) = 0
stat("/var/www/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/var/www/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fcntl(3, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
getdents(3, /* 16 entries */, 32768)    = 488
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3093b18000
write(1, "cgi-bin  conf  create_vhost.sh\te"..., 125cgi-bin  conf  create_vhost.sh      error  html  icons  manual  mediawiki  phpMyAdmin  rackspace  scripts  sqlbuddy  usage   vhosts
) = 125
close(1)                                = 0
munmap(0x7f3093b18000, 4096)            = 0
close(2)                                = 0
exit_group(0)                           = ?
Jake Wilson

8 Answers

29

Run strace ls /var/www/ and see what it hangs on. It's certainly hung on I/O: that's what the D state in your ps output means (and since kill doesn't help, it's one of the uninterruptible I/O syscalls). Most hangs like this involve an NFS server that has gone away, but based on your df output that isn't the case here. A quick check of dmesg for anything related to filesystems or disks might be worthwhile, just in case.
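To see at a glance which processes are stuck in the D state and where in the kernel they are waiting, something like the following may help. It is only a sketch: `filter_dstate` is a hypothetical helper name, and it assumes input in the column order PID, STAT, WCHAN, command.

```shell
#!/bin/sh
# filter_dstate: hypothetical helper, not an existing tool.
# Reads a ps listing on stdin whose second column is the process state,
# keeps the header line, and passes through only rows whose state
# starts with D (uninterruptible sleep).
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

With procps `ps` it could be used as `ps -eo pid,stat,wchan:32,args | filter_dstate`. The WCHAN column names the kernel function the process is sleeping in, which is a cheaper first look than reading the full dmesg call trace.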

womble
  • NFS still might be the case. If `ls` is aliased to something that tries to dereference symlinks to find what they're pointing at, it could be hanging if the symlink points to a dead NFS mount. – phemmer Mar 08 '12 at 00:20
  • Gah, didn't notice it was a `df .` and not a full `df`. It could definitely be an NFS problem then. – womble Mar 08 '12 at 06:40
  • There are no NFS mounts here. It's all the local single disk. It's a very simple linux server. One physical drive. – Jake Wilson Mar 08 '12 at 15:46
  • `strace ls /var/www/` prints out a bunch of stuff. What do I look for? The last line is `exit_group(0) = ?`. – Jake Wilson Mar 08 '12 at 15:47
  • See updated question. – Jake Wilson Mar 08 '12 at 16:03
  • @Jakobud Try `strace -vf ls -l /var/www` to see if it stops at a specific file or dir. – ott-- Mar 08 '12 at 18:03
  • It doesn't appear to. It looks like it lists each file and then ends with the same 4 lines as I listed above (slightly different munmap hex number or address or whatever that is). – Jake Wilson Mar 08 '12 at 18:54
  • @womble any more ideas on this given the updated `dmesg` and `strace` information above? I clearly see some kind of errors in `dmesg` but I'm not sure what to do to fix the problem. – Jake Wilson Mar 08 '12 at 18:58
4

In the hope that this is helpful: I had the above symptoms caused by using Docker and Docker Compose with the AUFS driver on Ubuntu 14.04. ls <dir> was hanging, and strace ls <dir> showed it was hanging on the getdents call. Stopping all running containers allowed me to begin using the drive as expected.

Hamy
  • I have the same problem. Doing a `docker system prune -a` had no effect. I literally started from scratch and it still broke. I removed the affected folder from my host machine and created it again on the host machine and from the container by running: `mkdir ` and it still broke both times. Running an strace next. – Biketire Aug 26 '21 at 09:28
  • @Biketire were you ever able to figure out a solution to this? Am getting the exact issue with docker on mac. – Sarosh Khatana Feb 08 '22 at 05:22
  • @SaroshKhatana no. I think what I did was switching from the Docker app to docker-machine and that worked out okay. It's a hassle to setup though. – Biketire Feb 09 '22 at 15:43
  • I managed to solve my issue. I'm on a Mac, and apparently Docker's mounted volumes on Mac do not work well with underscores in folder names. My folder was functional_tests and that was breaking it. I changed it to functionaltests and it works fine now. – Sarosh Khatana Feb 10 '22 at 13:08
3

I had a problem with the same symptoms. It turned out that I had a symlink in that directory to an SMB mount over GVFS.

lrwxrwxrwx  1 alex alex        45 Sep 16  2011 foo -> /home/alex/.gvfs/bar on foo/data/

Normally ls would complete instantly whether or not the share was mounted. But in this case I had suspended and resumed the machine, and the mount was performing poorly in general. Remounting the share fixed the problem.
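To check a directory for symlinks like this one without touching their targets, a sketch along these lines may help. It assumes GNU find (for `-maxdepth` and `-printf`), and `list_symlinks` is a hypothetical helper name.

```shell
#!/bin/sh
# list_symlinks: hypothetical helper, assuming GNU find.
# Prints "link -> target" for each symlink directly inside the given
# directory (default: current directory).
list_symlinks() {
    # -type l matches the link itself; find does not follow links by
    # default, so a link into an unresponsive mount should not stall it.
    find "${1:-.}" -maxdepth 1 -type l -printf '%p -> %l\n'
}
```

For example, `list_symlinks /var/www` would show whether any entry points into a network or GVFS mount.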

z0r
2

I was experiencing the same problem.

Entering a directory is fine, listing it hangs, find works, tab complete hangs, and some folders beneath do work. Very head-scratchingly-weird.

Reading this thread on Server Fault did lead me on a logic path towards the solution.

The cause being related to NAS, and NAS mounts commonly being set to 'automount', made me realise that I had recently changed my fstab to 'automount' some USB drives if they were present but carry on as normal when they weren't.

I then proceeded as follows:

  1. Unmount the partition containing the delinquent directory.
  2. Edit fstab so that every automount entry is either commented out or no longer uses auto.
  3. Reload systemd if you have it: systemctl --system daemon-reload
  4. Run mount -a

Try entering the directory again and get that warm fuzzy feeling of having fixed the issue.
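Before editing, it can help to list which fstab lines are affected. The following is only a sketch: it assumes that "automount" refers to entries that literally contain that word (for example the systemd `x-systemd.automount` option), and `list_automount_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# list_automount_entries: hypothetical helper, not an existing tool.
# Prints, with line numbers, every non-comment line of an fstab-style
# file (default: /etc/fstab) that mentions "automount".
list_automount_entries() {
    # The first non-blank character must not be '#', which skips comments.
    grep -nE '^[[:space:]]*[^#[:space:]].*automount' "${1:-/etc/fstab}"
}
```

Running `list_automount_entries` with no argument reads /etc/fstab; an exit status of 1 with no output means no such entries were found.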

Aethalides
1

This happened to me. The cause ultimately was due to an sshfs mount point in the directory where the SSH server had become unreachable. strace did not give me any clue that ls was hanging on that entry (or maybe I don't understand how to read strace output).

I managed to identify the cause by:

  • Noticing that my usual ls alias hung, but running /bin/ls directly did not.
  • Iteratively bisecting the directory listing with different glob expressions to narrow down which entry was problematic. For example:
    $ ls -d [a-lA-L]* # OK
    $ ls -d [m-zM-Z]* # Hangs
    $ ls -d [m-sM-S]* # Hangs
    ...
    

Curiously, in my case, /bin/ls -F worked but /bin/ls --color did not. (I don't understand why, but that probably deserves its own question.)
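The manual bisection above could also be automated by probing each entry on its own. This is a minimal sketch, assuming GNU coreutils (`timeout` and `stat`); `find_slow_entries` is a hypothetical helper name. Note that a process in uninterruptible sleep (state D) ignores signals, so in that situation `timeout` itself may not return; the sketch is most likely to help with mounts where the blocked call can still be killed.

```shell
#!/bin/sh
# find_slow_entries: hypothetical helper, assuming GNU coreutils.
# Usage: find_slow_entries DIR [SECONDS]
# Reports every entry of DIR whose stat call does not finish in time.
find_slow_entries() {
    dir=$1
    secs=${2:-5}
    # The second pattern covers hidden entries. A pattern that matches
    # nothing stays literal; stat then fails at once and is not reported.
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # -L follows symlinks, so a link into a dead mount is probed too.
        timeout "$secs" stat -L -- "$entry" >/dev/null 2>&1
        # timeout exits with status 124 when the time limit was reached.
        if [ $? -eq 124 ]; then
            printf 'no answer within %ss: %s\n' "$secs" "$entry"
        fi
    done
}
```

For example, `find_slow_entries /var/www 3` prints nothing when every entry answers within three seconds.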

jamesdlin
  • sshfs has a separate question on SuperUser: [ssh - How to avoid sshfs freezing? - Super User](https://superuser.com/questions/443878/how-to-avoid-sshfs-freezing) refer to that for information. – user202729 May 12 '22 at 01:56
1

Womble's suggestions are excellent, and you should try those first, but if they don't fix it I have had this problem when a filesystem has become self-inconsistent (through flaky hardware, obscure kernel bugs, or even cosmic rays).

If you think it might be that, you can force a fsck on reboot by doing touch /forcefsck; reboot. Watch what it says at boot time, to see if the fsck picks up any inconsistencies.

Warning: this will fsck all the filesystems attached to the machine; do not do it if you also have a multi-petabyte disc array attached, it may take days. fscking filesystems can also lead to data loss; if you really do have inconsistencies in your file system, e2fsck will change it from one that looks right but doesn't quite work, to one that works right but may not contain everything you expect.

MadHatter
1

I had the same exact symptoms that you described. To fix the problem all I had to do was fix the DNS server addresses. We had moved the NAS to a new network, which required updating the DNS server addresses. The addresses were statically assigned, but in the QNAP web interface I updated it to automatically assign.

Nick
-2

Running strace ls /var/www/ will give you a hint of what is wrong. I had a similar issue with the / directory, and using strace I was able to determine that a NAS mount was causing it. Unmounting that NAS fixed the issue.
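To find candidate mounts to check, the mount table can be filtered for network and automounted filesystems. This is a sketch only: the list of filesystem types is an assumption and can be extended, and `list_network_mounts` is a hypothetical helper name.

```shell
#!/bin/sh
# list_network_mounts: hypothetical helper, not an existing tool.
# Reads a mount table in /proc/mounts format (default: /proc/mounts)
# and prints "mountpoint (fstype)" for each network or autofs mount.
list_network_mounts() {
    # Field 2 is the mount point, field 3 the filesystem type.
    awk '$3 ~ /^(nfs|nfs4|cifs|fuse\.sshfs|autofs)$/ { print $2 " (" $3 ")" }' \
        "${1:-/proc/mounts}"
}
```

Reading /proc/mounts does not touch the mounted filesystems themselves, so it should return even when one of them is unresponsive.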