How to limit Unix find number of results to handle directories with many files

Is there a way to limit the number of results returned by the find command on a Unix system?

We are having performance issues due to an unusually large number of files in some directories.

I'm trying to do something like:

find /some/log -type f -name *.log -exec rm {} ; | limit 5000

lemotdit

Posted 2010-02-22T19:43:02.157

Reputation: 115

Should we assume it was \; not just ;? – Putnik – 2016-01-11T21:01:36.257

Answers

5

It sounds like you're looking for xargs, but don't know it yet.

find /some/log/dir -type f -name "*.log" | xargs rm

blahdiblah

Posted 2010-02-22T19:43:02.157

Reputation: 3 125

-exec rm {} + would do the same thing without the overhead. Though you could add head to the pipe chain: find [...] | head -5000 | xargs rm – quack quixote – 2010-02-22T21:57:13.743

Note that just find ... | xargs is dangerous, as it will do funny/weird/disastrous things if some file name contains funny characters.

Always use find ... -print0 | xargs -0 (GNU extension, I believe). – sleske – 2010-02-22T22:58:42.307

Instead of -exec rm ... or xargs rm you could use find's -delete flag. – Martin Hilton – 2010-02-22T23:19:02.833
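Putting these comments together: a null-safe version that also stops after 5000 files could look like the line below (a sketch assuming GNU findutils and coreutils, since -print0, xargs -0, and head -z are GNU extensions, the last one fairly recent):

find /some/log -type f -name '*.log' -print0 | head -z -n 5000 | xargs -0 rm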

23

You could try something like find [...] | head -[NUMBER]. Once head has output its however-many lines and exited, find receives a SIGPIPE the next time it writes to the (now closed) pipe, so it doesn't continue its search.
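For example, with the path from the question (a sketch; plain head counts newline-terminated lines, so this assumes file names without embedded newlines):

find /some/log -type f -name '*.log' | head -n 5000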

amphetamachine

Posted 2010-02-22T19:43:02.157

Reputation: 1 443

I should have added that I'm also using the -exec argument. The head works if the command is something like ls, but it does not work in my case, since I'm using rm and that seems to take all the files in one execution.

find /some/log -type f -name *.log -exec rm {} ; | head -5000 – lemotdit – 2010-02-22T21:11:25.440

Instead of using -exec rm, just pipe the results of find to head as suggested, and then pipe the result to xargs and rm. – Paul R – 2010-02-22T21:53:55.277
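In other words, something along these lines (a sketch of this suggestion; note that plain xargs is unsafe for file names containing spaces or other funny characters, as warned above):

find /some/log -type f -name '*.log' | head -n 5000 | xargs rm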

Good to know. I was assuming find would continue traversing the (potentially huge) file system when all you might want is a sample (my particular use case is to get one file from every directory). – Sridhar Sarnobat – 2016-08-16T18:14:57.890

0

Just |head didn't work for me:

root@static2 [/home/dir]# find . -uid 501 -exec ls -l {} \; | head 2>/dev/null
total 620
-rw-r--r--  1 root   root           55 Sep  8 15:22 08E7384AE2.txt
drwxr-xr-x  3 lamav statlus 4096 Apr 22  2015 1701A_new_email
drwxr-xr-x  3 lamav statlus 4096 Apr 22  2015 1701B_new_email
drwxr-xr-x  3 lamav statlus 4096 May 11  2015 1701C_new_email
drwxr-xr-x  2 lamav statlus 4096 Sep 24 18:58 20150924_test
drwxr-xr-x  3 lamav statlus 4096 Jun  4  2013 23141_welcome_newsletter
drwxr-xr-x  3 lamav statlus 4096 Oct 31  2012 23861_welcome_email
drwxr-xr-x  3 lamav statlus 4096 Sep 19  2013 24176_welco
drwxr-xr-x  3 lamav statlus 4096 Jan 11  2013 24290_convel
find: `ls' terminated by signal 13
find: `ls' terminated by signal 13
find: `ls' terminated by signal 13
find: `ls' terminated by signal 13
find: `ls' terminated by signal 13

(...etc...)

My (definitely not the best) solution:

find . -uid 501 -exec ls -l {} \; 2>/dev/null | head

The disadvantage is that find itself isn't terminated after the required number of lines: only the spawned ls processes write to the pipe, so find never receives the SIGPIPE and keeps running until ^C or until it has traversed everything. Therefore, ideas are welcome.
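One possible workaround (a sketch assuming GNU xargs for the -d option): let find itself do the printing, so that it is the process that receives the SIGPIPE, and run ls after the pipe rather than via -exec:

find . -uid 501 | head | xargs -d '\n' ls -ld

Here -d '\n' makes xargs split on newlines, so names with spaces survive (names with embedded newlines still break), and ls -ld lists matched directories themselves instead of their contents.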

Putnik

Posted 2010-02-22T19:43:02.157

Reputation: 692

0

find /some/log -type f -name *.log -exec rm {} ; | limit 5000

Well, the command as quoted will not work, of course (limit isn't even a valid command).

But if you run something similar to the find command above, you have probably hit a classic problem: find runs rm once for every single file, and all of those process startups are what is causing the performance problems.

You want to use xargs: it combines several files into one command line, so it invokes rm a limited number of times, each time with many files at once, which is much faster.
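A minimal sketch of that (the -print0/-0 pair is the null-safe GNU variant mentioned in the comments above; -n 1000 is an arbitrary cap on how many file names go into each rm invocation):

find /some/log -type f -name '*.log' -print0 | xargs -0 -n 1000 rm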

sleske

Posted 2010-02-22T19:43:02.157

Reputation: 19 887

"limit" is not a valid command, and your ; is not properly escaped. This will not work. – amphetamachine – 2010-02-24T06:08:59.480

@amphetamachine: I just quoted the question. But you're right, of course. – sleske – 2010-02-24T10:25:51.380

0

If you have a very large number of files in your directories, or when pipes may not apply (for instance, because xargs would be limited by the maximum number of arguments allowed by your system), one option is to use the exit status of one exec command as a filter for the next actions, something like:

rm -f /tmp/count ; find . -type f -exec bash -c 'echo "$(( $(cat /tmp/count 2>/dev/null || echo 0) + 1 ))" > /tmp/count' \; -exec bash -c 'test "$(cat /tmp/count)" -lt 5000' \; -exec echo "any command instead of echo of this file: {}" \;

The first -exec just increments the counter. The second -exec tests the count: if it is less than 5000, it exits with 0 and the next action is executed. The third -exec performs the intended action on the file, in this case a simple echo; we could also use -print, -delete, etc. (I would use -delete instead of -exec rm {} \;, for instance.)

This all relies on the fact that find executes its actions in sequence, each one running only if the previous one returned 0.

When using the above example, you'd want to make sure /tmp/count is not used by a concurrent process.

[Edits following comments from Scott] Thanks a lot, Scott, for your comments.

Based on them: the number was changed to 5,000 to match the initial thread.

Also: it is absolutely correct that the /tmp/count file will still be written 42,000 times (as many times as there are files being browsed), so find will still go through all 42,000 entries, but it will only execute the command of interest 5,000 times. So this command does not avoid traversing the whole tree; it is just presented as an alternative to the usual pipes. Using a memory-backed temporary directory (such as tmpfs) to host this /tmp/count file would seem appropriate.

And besides your comments, some additional edits: pipes would be simpler in most typical cases.

Below are some more reasons for which pipes would not apply that easily, though:

  • when file names have spaces in them: with -exec, one must not forget to surround the {} with quotes ("{}") to support this case,

  • when the intended command does not allow passing all the file names in a row, for instance, something like: -exec somespecificprogram -i "{}" -o "{}.myoutput" \;

So this example is essentially posted for those who have faced challenges with pipes and still do not want to go for a more elaborate programming option.
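For comparison, a sketch of a middle-ground bash loop (assuming bash and GNU find for -print0): it survives spaces in file names, runs an arbitrary command per file, and stops consuming after 5,000 files, although find itself only exits once the closed pipe triggers a SIGPIPE:

find . -type f -print0 |
while IFS= read -r -d '' file && (( count++ < 5000 )); do
    echo "any command instead of echo of this file: $file"
done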

wang

Posted 2010-02-22T19:43:02.157

Reputation: 1

I don’t entirely understand the question — I guess that the OP has 42 000 .log files that they want to delete, but they want to delete only 5 000 at a time — because handling all 42 000 at once slows down the system too much.  This solution will perform the action (e.g., deletion) on only the first N files (confusingly, you have written your answer with N = 10 instead of the OP’s 5 000), but it will update the count file 42 000 times. – Scott – 2019-03-30T23:54:26.890


First of all, welcome to Super User! We always appreciate contributions from new community members, but you apparently have two Super User accounts: this one and this one. Please take the time to utilize the following Help Center tutorial and ask the Super User staff to merge your accounts: I accidentally created two accounts; how do I merge them?

– Run5k – 2019-04-03T13:05:06.507