
The system is CentOS 5 x86_64, fully up to date.

I've got a folder that can't be listed (`ls` just hangs, eating memory until it is killed). The directory itself is nearly 500 KB in size:

root@server [/home/user/public_html/domain.com/wp-content/uploads/2010/03]# stat .
  File: `.'
  Size: 458752          Blocks: 904        IO Block: 4096   directory
Device: 812h/2066d      Inode: 44499071    Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 3292/ user)   Gid: ( 3287/ user)
Access: 2012-06-29 17:31:47.000000000 -0400
Modify: 2012-10-23 14:41:58.000000000 -0400
Change: 2012-10-23 14:41:58.000000000 -0400

I can see the file names with `ls -1f`, but it just repeats the same 48 files ad infinitum, all of which have non-ASCII characters somewhere in the file name:

La-critic\363-al-servicio-la-privacidad-300x160.jpg

When I try to access the files (say to copy them or remove them) I get messages like the following:

lstat("/home/user/public_html/domain.com/wp-content/uploads/2010/03/Sebast\355an-Pi\361era-el-balc\363n-150x120.jpg", 0x7fff364c52c0) = -1 ENOENT (No such file or directory)

I modified the example code from the getdents(2) man page to call unlink on each file, and I get the same ENOENT error from the unlink call:

unlink("/home/user/public_html/domain.com/wp-content/uploads/2010/03/Marca-naci\363n-Madrid-150x120.jpg") = -1 ENOENT (No such file or directory)

I also straced a `touch`, grabbed the syscalls it makes, and replicated them, then tried to unlink the resulting file by name. That unlink succeeds, but the folder still contains an entry with the same name afterwards, and the listing program runs for an arbitrarily long time (the strace output reached 20 GB after 5 minutes, at which point I stopped the process).

I'm stumped on this one. I'd really prefer not to take this production machine (hundreds of customers) offline to fsck the filesystem, but I'm leaning toward that being the only option at this point. If anyone's had success removing files by other means (by inode number, say; I can get those with the getdents code), I'd love to hear it.

(Yes, I've tried `find . -inum <inode> -exec rm -fv {} \;` and it still fails with unlink returning ENOENT.)

For those interested, here's the diff between that man page's code and mine. I didn't bother with error checking on mallocs, etc., because I'm lazy and this is a one-off:

root@server [~]# diff -u listdir-orig.c listdir.c 
--- listdir-orig.c      2012-10-23 15:10:02.000000000 -0400
+++ listdir.c   2012-10-23 14:59:47.000000000 -0400
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
+#include <string.h>

 #define handle_error(msg) \
        do { perror(msg); exit(EXIT_FAILURE); } while (0)
@@ -17,7 +18,7 @@
    char           d_name[];
 };

-#define BUF_SIZE 1024
+#define BUF_SIZE 1024*1024*5

 int main(int argc, char *argv[])
 {
@@ -26,11 +27,16 @@
    struct linux_dirent *d;
    int bpos;
    char d_type;
+   int deleted;
+   int file_descriptor;

    fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        handle_error("open");

+   char* full_path;
+   char* fd_path;
+
    for ( ; ; ) {
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1)
@@ -55,7 +61,25 @@
           printf("%4d %10lld  %s\n", d->d_reclen,
                   (long long) d->d_off, (char *) d->d_name);
           bpos += d->d_reclen;
+          if ( d_type == DT_REG )
+          {
+              full_path = malloc(strlen((char *) d->d_name) + strlen(argv[1]) + 2); //One for the /, one for the \0
+              strcpy(full_path, argv[1]);
+              strcat(full_path, "/");
+              strcat(full_path, (char *) d->d_name);
+
+              //We're going to try to "touch" the file.
+              //file_descriptor = open(full_path, O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666);
+              //fd_path = malloc(32); //Lazy, only really needs 16
+              //sprintf(fd_path, "/proc/self/fd/%d", file_descriptor);
+              //utimes(fd_path, NULL);
+              //close(file_descriptor);
+              deleted = unlink(full_path);
+              if ( deleted == -1 ) printf("Error unlinking file\n");
+              break; //Break on first try
+          }
        }
+       break; //Break on first try
    }

    exit(EXIT_SUCCESS);
RedKrieg
  • If you are willing to remove the whole thing, what does "rm -fr 03" do when done from in the parent directory "/home/user/public_html/domain.com/wp-content/uploads/2010"? You may wish to strace that command, too. – Skaperen Oct 23 '12 at 19:23
  • Unfortunately that has the same result, it hangs infinitely. An strace shows that it's looping over the same getdents calls as an ls. – RedKrieg Oct 23 '12 at 19:28
  • Can you provide a `df -h` listing and the output of `mount`? – ewwhite Oct 23 '12 at 20:13

3 Answers


I am presuming you are doing this on an ACTIVE filesystem. If so, it is possible that a file got deleted while find was running, before find could process it. That would be OK.

To get a list of files, I would NOT use ls. ls tries to sort its output, and with a directory that size it would take quite a while just to read the list, let alone sort it.

What I do in such cases is (as the root user):

 find dirname -ls >outfile

If you want to delete something based on times:

 find dirname -type f -mtime +60 -print0 | xargs -0 rm -f

BTW, `-print0` and `-0` are GNU options that pass filenames to xargs NUL-delimited, so names containing "special" characters survive intact. The command above, of course, removes files modified MORE THAN 60 days ago.

mdpc
  • 2
    Of course, you could be right and the filesystem is in need of repair, in this case, it would be better to do a FULL fsck on the filesystem. – mdpc Oct 23 '12 at 19:58
  • Hi mdpc, I appreciate you taking a look. The C code above is performing essentially the same operation you've showed in the find here, including unlinking the file. It's entirely possible that the files don't, in fact, exist but regardless I need to find out why the parent directory thinks they do and fix that problem. find will not be able to solve the problem (I've tried `find -exec rm -f {} \;`, which should be more effective than using `xargs -0`. The problem is that getdents() returns data forever, so the find will never actually finish getting info. Even if it did, you can't unlink. – RedKrieg Oct 23 '12 at 22:02

Here's an easy way to determine if the filesystem needs repair, or at least understand how extensive the damage is...

Download the (free) R1Soft/Idera Hot Copy snapshot utility. This is an RPM and kernel module that provides copy-on-write snapshots of Linux filesystems without the need to have LVM, etc. in place.

Let's say your filesystems look like:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              12G  4.4G  7.0G  39% /
tmpfs                  14G     0   14G   0% /dev/shm
/dev/sda1             291M  166M  110M  61% /boot
/dev/sda3             9.9G  2.5G  7.0G  26% /usr
/dev/sdb1             400G  265G  135G  67% /home

You could use Hot Copy to snapshot /home without mounting the resulting filesystem, then run an fsck against the snapshot to see if there are any issues.

hcp --skip-mount /dev/sdb1
fsck -a /dev/hcp1

This can save you the time of the reboot and help you assess the severity before scheduling client downtime.

Long-term, I'd just use XFS as the filesystem... But that's another topic...

ewwhite
  • Hi ewwhite, thanks for this! I'm going to try this on our development servers tonight and if all goes well I'll give it a go. Seems like the only solution so far that has a chance of at least telling me that there is a problem. – RedKrieg Oct 23 '12 at 22:36
  • @RedKrieg Did it all work out? – ewwhite Nov 09 '12 at 15:32

Don't use `find ... -exec rm -fv {} \;`; use `find ... -delete` instead.

power