
A poorly tested program created a directory on an NFS share with an enormous number of files, which I need to remove.

ls -ald /home/foo
drwxrwxr-x 2 503 503 317582336 Jul 29 11:38 /home/foo

The directory is located on an NFS mount of about 600GB on a NetApp-type device. I actually have no idea how many files are in it, but a similar directory created after only 10 minutes has 121,000 files, so it's probably in the millions. The OS is Linux with a 2.6 kernel.

I'm trying to find a way to list or remove the directory and its contents. Running find /home/foo dies after about an hour, with no output other than "./"

Jenny D

5 Answers


(Answering my own question in case anyone finds it while searching for something similar.) There are possibly as many as 9 million files in the directory.

Unfortunately, I can't log in to the server directly; it's an appliance. The only access to the filesystems is via the NFS export.

rm -rf didn't seem to work; watching it with strace, it appeared to hang.

find wouldn't complete; it died with no error.

ls -1 never seemed to complete. (I realize now that it attempts to sort the results; ls -1f, which disables sorting, might have worked eventually.)

What did work was a simple Perl snippet. I assume C code doing the same would also work.

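    use strict; use warnings;
    # readdir returns one entry at a time: nothing is sorted or held in memory
    opendir( my $dh, '/home/foo' ) or die "opendir: $!";
    while ( defined( my $file = readdir $dh ) ) {   # defined(): a file named "0" must not end the loop
        print "$file\n";
    }
    closedir $dh;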
user50910
    A good solution using the swiss-army chainsaw - I mentally skipped past the "millions of files" part, and this is definitely the way to go. I would even go so far as to unlink() the files in Perl rather than piping the output to something else, for the sake of speed and sanity. – voretaq7 Aug 11 '10 at 20:15

This rather old thread came up for me on Google, so I'd like to share some statistics.

Here is a comparison of three different methods to remove files on an NFS server:

  1. plain rm: rm dir/*
  2. find: find dir/ -type f -exec rm {} \;
  3. rsync: tempdir=$( mktemp -d ); rsync -a --delete $tempdir/ dir/; rmdir $tempdir

To compare these methods, I created 10000 files before each test run with

for i in {1..10000} ; do touch $i ; done

The results on the plot show that rsync is much faster and find is the slowest of the three methods.

[Plot: performance of different methods to remove multiple files; rsync is faster]

The results hold when the number of files is doubled (I did not run find on 20000 files). Times are averaged over 3 runs for 10000 files and 2 runs for 20000 files.

         10000 files   20000 files
find        28.3            -
rm          12.9          23.9
rsync        6.94         12.2
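A minimal sketch of how the comparison above could be reproduced, assuming bash with GNU coreutils, findutils and rsync installed. The function names (make_files, delete_rm, delete_find, delete_rsync, time_method) are hypothetical, and none of the numbers in the table come from this script; timings will depend on the NFS server, mount options and network.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the rm / find / rsync deletion comparison on one directory.
# Assumes bash, GNU coreutils/findutils and rsync; all function names are hypothetical.

# Create N empty files named 1..N inside DIR (equivalent to the touch loop above).
make_files() {
    local dir=$1 n=$2 i
    mkdir -p -- "$dir"
    for ((i = 1; i <= n; i++)); do
        : > "$dir/$i"
    done
}

# Method 1: plain rm on a shell glob (one rm process with a long argument list).
delete_rm() {
    rm -- "$1"/*
}

# Method 2: find starting one rm process per file.
delete_find() {
    find "$1"/ -type f -exec rm {} \;
}

# Method 3: rsync an empty directory onto the target; --delete removes the extra files.
delete_rsync() {
    local empty
    empty=$(mktemp -d)
    rsync -a --delete "$empty"/ "$1"/
    rmdir -- "$empty"
}

# Populate DIR with N files, then time METHOD on it using bash's `time` keyword.
time_method() {
    local method=$1 dir=$2 n=$3
    local TIMEFORMAT="$method, $n files: %R s elapsed"
    make_files "$dir" "$n"
    time "$method" "$dir"
}
```

For example, `for m in delete_rm delete_find delete_rsync; do time_method "$m" /mnt/nfs/bench 10000; done`, where /mnt/nfs/bench is a hypothetical scratch directory on the NFS mount. Be aware that with -a, rsync may also copy the temporary directory's permissions onto the target directory itself.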

It would be interesting to see what else the performance of these methods depends on.

A related post on this site discusses deleting a large number of files on an ext3 filesystem.

Dmitri Chubarov
    You'd likely have much better results on the "find" command by either using the -delete option or using xargs to batch the call to rm. – Elliott Mar 27 '18 at 04:45
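The two alternatives suggested in this comment might look like the following sketch, assuming GNU findutils (for -delete, -print0 and xargs -0 -r); the function names are hypothetical. Both avoid starting one rm process per file: -delete unlinks each entry from inside find itself, and xargs packs as many names into each rm invocation as the system's argument-length limit allows.

```shell
#!/usr/bin/env bash
# Sketch: two batched alternatives to `find dir/ -type f -exec rm {} \;`.
# Assumes GNU findutils; function names are hypothetical.

# find unlinks each matching file itself; no rm processes are started.
delete_find_delete() {
    find "$1"/ -type f -delete
}

# find streams NUL-terminated names; xargs batches them into as few rm calls as possible.
# -print0 / -0 keep names with spaces or newlines intact; -r skips rm if there is no input.
delete_find_xargs() {
    find "$1"/ -type f -print0 | xargs -0 -r rm -f --
}
```

Either function is called with the directory as its only argument, e.g. `delete_find_delete /home/foo`.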

I would suggest that you NOT try to remove these files over NFS -- Log in to the file server directly and delete the files there. This will be substantially less abusive to the NFS server (and the client).

Beyond that, use find (as described by MattBianco) or use ls -1 | xargs rm -f (from within that directory) if find is having trouble completing (the latter should work OK over NFS, though again I would recommend doing it locally).
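A variant of the ls | xargs approach, sketched under the assumption of GNU ls and GNU xargs (for -d). With GNU ls, -f turns off sorting (and implies -a), so names are emitted as they are read from the directory instead of after the whole listing has been collected and sorted; "." and ".." are filtered out before rm sees them. The function name is hypothetical, and names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Sketch: unsorted ls piped to batched rm. Assumes GNU ls and GNU xargs.
# Function name is hypothetical. File names containing newlines are NOT handled.

delete_ls_unsorted() {
    (
        # Subshell so the cd does not change the caller's working directory.
        cd -- "$1" || exit 1
        # -f: no sorting; it also implies -a, so drop the exact lines "." and "..".
        # -d '\n': split on newlines only, so names with spaces survive.
        ls -f | grep -v -x -F -e . -e .. | xargs -d '\n' -r rm -f --
    )
}
```

Dotfiles are removed too, since -f implies -a; subdirectories are left in place (rm without -r reports an error for them).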

voretaq7
    +1. Might also want to start with `fsck` to be sure everything's working correctly. – Chris S Aug 11 '10 at 14:53
    An `fsck` on the NFS server would definitely not be out of line (especially if it's a Linux / EXT filesystem - I've seen Bad Things happen with file bombs on EXT2/3). If the system doesn't support a "check only" fsck that can be done with the filesystem live, though, taking the outage could be an issue... – voretaq7 Aug 11 '10 at 15:01

Maybe find /home/foo -mount -depth -type f -exec rm -f {} \; could be helpful.
-exec makes find execute a command (terminated by the semicolon: \;), with the braces {} replaced by the file's pathname.
This means one rm process for each file to remove.
-type f restricts this to regular files: if you have a directory structure under /home/foo, the directories will remain and only the files will be removed.
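If starting one rm process per file is the bottleneck, find also accepts + instead of \; as the -exec terminator, which appends as many pathnames as fit onto each rm command line. A sketch keeping the options from the command above; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the same find command, but batching pathnames into few rm calls.
# -mount: stay on the starting filesystem; -depth: process contents before the directory.
# -type f: only regular files are removed, so directories are left in place.
# Function name is hypothetical.

delete_files_batched() {
    find "$1" -mount -depth -type f -exec rm -f -- {} +
}
```

Called as `delete_files_batched /home/foo`, it behaves like the one-rm-per-file version but with far fewer processes.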

MattBianco

This seems a little obvious, but have you tried the following?

rm -rf /home/foo/

Failing that, is there a way you can use a regex to select a small enough subset to hand to | xargs rm?

If ls fails, you might try echo /home/foo/* | xargs rm, though that might just fail with 'line too long' or the like. Oh, and I second the recommendation to try to do this directly on the server instead of over NFS.
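Because echo and printf are bash builtins, expanding the glob into them does not run into the kernel's argument-length limit for exec; the shell still has to read and sort every name in memory first, though, so with millions of entries this may be slow or exhaust memory. A sketch assuming bash and an xargs with -0 support, using NUL separators so names with spaces survive the pipe; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: glob expanded into the printf builtin, piped NUL-separated to xargs.
# Assumes bash and an xargs that supports -0; the function name is hypothetical.

delete_glob_printf() {
    # The shell expands (and sorts) the glob; printf, being a builtin, is not
    # subject to the execve argument-length limit.
    # Dotfiles are not matched by * unless dotglob is set.
    # In an empty directory the literal "dir/*" is passed and rm -f ignores it.
    printf '%s\0' "$1"/* | xargs -0 rm -f --
}
```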

pjz