32

I have an ext3 filesystem mounted with default options. On it I have some ~100 GB files.

Removing any one of these files takes a long time (about 8 minutes) and causes a lot of I/O traffic, which increases the load on the server.

Is there any way to make the rm not as disruptive?

  • 4
    Basically no method from here worked, so we developed our own. Described it here: http://www.depesz.com/index.php/2010/04/04/how-to-remove-backups/ – depesz Apr 06 '10 at 15:15

11 Answers

18

Upgrade to ext4 or some other modern filesystem that uses extents. Since ext3 uses the indirect blocks scheme rather than extents, deleting large files inevitably entails lots of work.
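If reformatting is not an option, an in-place conversion on an unmounted filesystem goes roughly along these lines (the device name is a placeholder; check tune2fs(8) for the exact feature names in your e2fsprogs version, and note that files written before the conversion keep their indirect-block layout, so deleting them stays slow):

tune2fs -O extents,uninit_bg,dir_index /dev/sdXN
e2fsck -fD /dev/sdXN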

janneb
14

The most interesting answer was originally buried in a comment on the question. Here it is as a first-class answer to make it more visible:

Basically no method from here worked, so we developed our own. Described it in here: http://www.depesz.com/index.php/2010/04/04/how-to-remove-backups/ – depesz Apr 6 '10 at 15:15

That link is an incredibly thorough account of the search for, and eventual discovery of, a workable solution.

Note also:

The article says:

As you can see, I used -c2 -n7 options to ionice, which seem sane.

which is true, but user TafT says that if you want no disruption at all, then -c3 ('idle') is a better choice than -c2 ('best-effort'). He has used -c3 to run builds in the background and found that it works well without making the build wait forever. If your I/O usage really is at 100%, then -c3 will never let the delete complete, but he doesn't expect that is your situation, based on the test that was worked through.
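For reference, TafT's variant would look something like this (the path is just an example); -c3 puts the delete in the idle class, so it only gets disk time when nothing else wants it:

ionice -c3 rm /path/to/huge-file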

Matt McClure
6

You can give ionice a try. It won't make it faster but it might make it less disruptive.
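For example, wrapping the delete in the -c2 -n7 combination quoted further up the page (the path is a placeholder):

ionice -c2 -n7 rm /path/to/huge-file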

Dennis Williamson
4

In terms of efficiency, using one rm per file is not optimal, since each rm costs a fork and an exec.

Assuming you have a list.txt containing the files you want to remove, this would be more efficient, but it's still going to be slow:

xargs -i rm {} < list.txt

Another approach would be: nice -20 xargs -i rm {} < list.txt
(this will take less time but will affect your system greatly :)

or

I don't know how fast this would be but:

mv <file-name> /dev/null 

or

Create a special mount point with a fast filesystem (using a loop device?) and use that to store and delete your huge files.
(Maybe move the files there before you delete them, maybe that's faster, or maybe just unmount it when you want the files gone.)

or

cat /dev/null > /file/to/be/deleted (so it's zero-sized now), and if you want it to disappear, just rm -rf <file> afterwards

or even better

drop the cat and just do # > /file/to/be/emptied
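If you would rather spread that cost out than take it all at once, a rough sketch of the same truncation idea, done in steps (the path and the 1 GB step size are placeholders):

f=/file/to/be/deleted
step=$((1024 * 1024 * 1024))     # shrink by 1 GB per iteration
size=$(stat -c %s "$f")
while [ "$size" -gt "$step" ]; do
    size=$((size - step))
    truncate -s "$size" "$f"     # frees this chunk's blocks now
    sleep 1                      # give other I/O a turn
done
rm -f "$f"                       # very little left to free here

Each truncate only has to release one chunk's worth of blocks, and the sleep lets other processes get at the disk in between.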

  • well, i'm removing *1* file, so there is no overhead. –  Mar 31 '10 at 09:38
  • http://stackoverflow.com/questions/1795370/unix-fast-remove-directory-for-cleaning-up-daily-builds - check this too –  Mar 31 '10 at 09:50
1

I had problems getting a directory to delete at a reasonable pace; it turned out the delete was locking up the disk and creating a pileup of processes trying to access it. ionice didn't help: the delete just kept using 99% of the disk I/O and locked all the other processes out.

Here's the Python code that worked for me. It deletes 500 files at a time, then takes a 2-second break to let the other processes do their work, then continues. It works great.

import os, os.path
import time

# Walk the tree and delete files one by one, pausing for 2 seconds
# after roughly every 500 deletions so other processes get a turn
# at the disk. (Python 2 print syntax.)
for root, dirs, files in os.walk('/dir/to/delete/files'):
    file_num = 0
    for f in files:
        fullpath = os.path.join(root, f)
        os.remove(fullpath)           # delete this file
        if file_num % 500 == 1:       # roughly every 500 files...
            time.sleep(2)             # ...take a 2 second break
            print "Deleted %i files" % file_num
        file_num = file_num + 1
Nick Woodhams
  • 1
    Try it on 100 GB+ files on an ext3 filesystem. The problem is the size of a single file, not the number of files. –  Dec 23 '12 at 18:51
  • In your case it sounds like it wouldn't work. But I had a ton of small files. Thanks for the feedback. – Nick Woodhams Dec 23 '12 at 23:03
1

My two cents.

I've already had this issue. In a sequential script that has to run fast, the process removes a lot of files, so the rm drags the script's speed down towards the I/O wait/exec time.

To make things quicker, I added another process (a bash script) launched from cron. Like a garbage collector, it removes all files in a particular directory.

Then I updated the original script, replacing the rm with a mv to a "garbage folder" (renaming the file by adding a counter at the end of its name to avoid collisions).

This works for me; the script runs at least 3 times faster. But it only works well if the garbage folder and the original file are under the same mount point (same device), so as to avoid an actual file copy (a mv on the same device consumes far less I/O than an rm).
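A rough sketch of the two halves (the /same-mount/garbage path is a placeholder, $file stands for the file the script would otherwise rm, and a timestamp plays the role of the counter):

In the original script, instead of rm:

mv "$file" "/same-mount/garbage/$(basename "$file").$(date +%s%N)"

And the cron-driven collector, deleting at idle I/O priority:

ionice -c3 find /same-mount/garbage -type f -delete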

Hope that helps.

0

Also note that the answer by Dennis Williamson, who suggests ionice as a workaround for the load, will work only if your block device uses the CFQ I/O scheduler.
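You can check which scheduler a block device is using like this (sda is a placeholder; the active scheduler is shown in square brackets):

cat /sys/block/sda/queue/scheduler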

famzah
0

mv <file-name> /dev/null

/dev/null is a file, not a directory. You can't move a file into a file, or you risk overwriting it.

Create a special mount point with a fast filesystem (using a loop device ?) , use that to store and delete your Huge files. (maybe move the files there before you delete them, maybe it's faster or maybe just unmount it when you want files gone)

I don't think this is practical. It would use more I/O than necessary, which is exactly what the OP wants to avoid.

user9876
Felipe Alvarez
0

You could try creating a loop file system to store your backups on.

# dd if=/dev/zero of=/path/to/virtualfs bs=100M count=1024 # 100 MB * 1024 = 100 GB
# mke2fs /path/to/virtualfs
# mount -t ext2 /path/to/virtualfs /mnt/backups -o loop

Then, when you want to clear out the backups:

# umount /mnt/backups
# mke2fs /path/to/virtualfs
# mount -t ext2 /path/to/virtualfs /mnt/backups -o loop

Presto! The entire virtual file system is cleared out in a matter of moments.

amphetamachine
  • Doesn't solve the problem, as it would only work if I wanted to remove all backups on the given filesystem. –  Feb 22 '11 at 09:48
0

You can run multiple removals in parallel with xargs:

find . -type f | xargs -P 30 rm -rf 

where 30 is the number of rm processes to run in parallel (xargs spawns processes rather than threads). If you use zero, xargs runs as many processes at a time as it can.
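If any file names may contain spaces or newlines, a null-delimited variant of the same idea is safer (standard find/xargs options, not part of the original command):

find . -type f -print0 | xargs -0 -P 30 rm -f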

Scott Pack
-1

/dev/null is a file, not a directory. You can't move a file into a file, or you risk overwriting it.

Actually, it's a device, and all data written to it gets discarded, so mv <file> /dev/null makes sense.

From Wikipedia, the free encyclopedia
In Unix-like operating systems, /dev/null or the null device is a special file that discards all data written to it (but reports that the write operation succeeded), and provides no data to any process that reads from it (yielding EOF immediately).[1]

  • 1
    That is wrong and INCREDIBLY dangerous. /dev/null is a device, which is a special file-like object. If you're root, "mv /some/file /dev/null" will DELETE the special /dev/null device and move your file there! So the next time someone tries to use /dev/null they'll be using a real file instead of the device, and disaster ensues. (When Wikipedia says that it "discards all data written to it", that means that "cat /some/file > /dev/null" will read /some/file and discard the data you read, but that won't affect the original file). – user9876 Oct 23 '14 at 14:01