
I have a directory holding temporary files from integration tests, with the following structure:

TestTemp --- Test01 (~1.5TB) --- Subdirectory01 (~100GB) --- destination JSON files (1-100MB)
                             --- Subdirectory02          --- destination JSON files
                             --- Subdirectory03          --- destination JSON files
                             ...
                             --- Subdirectory15 (about 10-15 subdirectories per Test dir)

         --- Test02
         --- Test03
         ...
         --- Test15 (about 5-7 Test directories)

Total is around 10TB.

The filesystem is ext3, and I can't treat this directory as a separate drive. I was following this article, but it is more about files that are big but few in number.

I ran 6 tests for each option: find with -exec rm -rf, find -delete, and that strange Perl script (sketched below), first one directory at a time and then in parallel for two directories.
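For reference, those options boil down to commands roughly like the following (paths are placeholders; I'm assuming the Perl script is the unlink one-liner that usually gets quoted in these benchmarks):

    # Option 1: find + rm -rf on the immediate subdirectories
    find /TestTemp/Test01 -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +

    # Option 2: find -delete (processes depth-first: files first, then the emptied dirs)
    find /TestTemp/Test01 -mindepth 1 -delete

    # Option 3: the Perl unlink one-liner, run inside each subdirectory
    cd /TestTemp/Test01/Subdirectory01 && perl -e 'for(<*>){((stat)[9]<(unlink))}'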

The Perl one worked best (about 4 minutes), next was find -delete (4.10), and then the find/rm -rf option with 4.50.

Parallelisation didn't give the expected results: every option ran slower, and that was with only two directories at a time. I suspect that adding more directories would make the runs even longer.
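For what it's worth, a two-directory parallel run needs nothing beyond plain shell job control, so a sketch like this (placeholder paths again) is all it takes:

    # Delete two test directories concurrently using background jobs
    find /TestTemp/Test01 -mindepth 1 -delete &
    find /TestTemp/Test02 -mindepth 1 -delete &
    wait    # return only after both background deletions have finished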

I didn't try GNU parallel because I have no root access (the cleanup script is run by Jenkins), so I couldn't install it.
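If more parallelism ever does help, GNU xargs with -P (already present on stock GNU/Linux, no installation needed) can stand in for GNU parallel; a minimal sketch, assuming the per-test directories match /TestTemp/Test*:

    # Fan the Test* directories out over up to 4 concurrent rm processes
    printf '%s\0' /TestTemp/Test*/ | xargs -0 -n 1 -P 4 rm -rf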

What is the best option to remove numerous big files in numerous directories as fast as possible?

vladfau

1 Answer


It's not obvious that you can do much better on an ext3 file system. See the graphs from a long investigation linked in this serverfault thread: deletion took minutes no matter what they tried.

ext4 with extents, or XFS, is probably faster.

If you are deleting an entire tree and can dedicate a volume to it, you could script removing the logical volume and recreating the file system each time. If it comes to that, you might as well experiment with a different file system anyway.
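A minimal sketch of that approach, assuming the test tree sits on its own LVM logical volume (the volume group and LV names here are hypothetical, and this needs root, so it would have to run outside the unprivileged Jenkins job):

    # Recreate the volume instead of deleting files -- destroys everything on it
    umount /TestTemp
    lvremove -f vg_tests/lv_testtemp
    lvcreate -L 10T -n lv_testtemp vg_tests
    mkfs.xfs /dev/vg_tests/lv_testtemp    # or mkfs.ext4 for ext4 with extents
    mount /dev/vg_tests/lv_testtemp /TestTemp

mkfs.xfs on an empty volume typically takes seconds, which is the appeal compared to walking a 10TB tree file by file.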

John Mahowald