8

To delete files recursively on our IBM GPFS cluster, we use a simple Unix command such as:

rm /my/directories -fr

However, deletions take a very long time.

The problem is that our distributed (Spark-based) apps take about one hour to run, and it then takes about another hour to delete the temporary files they generate.

So the overall workload is very inefficient. Maybe it's because the rm command has to list every subdirectory.

Anyway, do you know of a way to efficiently delete an entire directory (and its subdirectories) on GPFS?

Maybe IBM provides a special command for that?

Klun

2 Answers

7

I don't think you can speed up this process, as "rm" triggers lots of metadata updates on a distributed file system, and they take quite some time to complete. What you can try is to "mv" the directory to some temp folder within the same file system (!!!) and do the actual "rm" in the background.
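A minimal sketch of that mv-then-rm idea, in case it helps. The function name deferred_rm and the ".trash.XXXXXX" directory name are made up for illustration; the only assumption is that the trash directory is created next to the target, so the mv stays on the same file system and is a rename rather than a copy.

```shell
#!/bin/sh
# Hypothetical helper: move a directory out of the way, then delete it
# in the background so the caller does not wait for the slow recursive rm.
deferred_rm() {
    target=$1
    # Create the trash directory beside the target so that mv stays on the
    # same file system (a rename, not a copy).
    parent=$(dirname -- "$target")
    trash=$(mktemp -d "$parent/.trash.XXXXXX") || return 1
    mv -- "$target" "$trash/" || { rmdir "$trash"; return 1; }
    # The actual delete runs in the background.
    rm -rf -- "$trash" &
}

# Usage (path taken from the question):
#   deferred_rm /my/directories
```

Note that this does not make the delete itself any cheaper. It only frees the original path immediately, so the next job can start while the old data is still being removed.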

BaronSamedi1958
0

You can use a GPFS policy, which is much faster than 'rm'.

Here is an example: I want to remove all files under /gpfs2/mysql/performance_schema/

The policy file (del.pol) is:

RULE 'my_del' DELETE DIRECTORIES_PLUS WHERE PATH_NAME LIKE '/gpfs2/mysql/performance_schema/%'

Then I can run the policy with:

mmapplypolicy /gpfs2/mysql -P del.pol
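If you want to reuse this for other directories, something like the following sketch could generate the policy file. The helper name write_delete_policy is hypothetical, and the rule text is simply the one above with the path substituted. I am also assuming you would want a dry run first with mmapplypolicy's -I test option, which evaluates the rules without deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: write a one-rule DELETE policy for a directory tree.
#   $1 = directory whose contents should be deleted
#   $2 = policy file to write
write_delete_policy() {
    dir=${1%/}    # strip one trailing slash, if present
    policy_file=$2
    # %s is replaced by the directory; %% produces the literal LIKE wildcard %.
    printf "RULE 'my_del' DELETE DIRECTORIES_PLUS WHERE PATH_NAME LIKE '%s/%%'\n" \
        "$dir" > "$policy_file"
}

# Usage (paths taken from the example above):
#   write_delete_policy /gpfs2/mysql/performance_schema del.pol
#   mmapplypolicy /gpfs2/mysql -P del.pol -I test   # dry run, nothing is deleted
#   mmapplypolicy /gpfs2/mysql -P del.pol           # actual run
```

Keep in mind that in a LIKE pattern '_' matches any single character, so a path containing underscores can match slightly more than the literal name. Checking the dry-run output before the real run is a cheap safeguard.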

You can refer to these two links for an explanation of policies and the DELETE rule:

https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adv_polextip.htm

https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adv_rule_syntaxdiagrams.htm

There is also an 'mmfind' tool under /usr/lpp/mmfs/samples/ilm. You first need to compile mmfindUtil_processOutputFile with: make -f mmfindUtil_processOutputFile.sampleMakefile

mmfind has the same syntax as 'find', but it uses the GPFS policy engine, so it runs much faster than find on a GPFS file system. For example, you can use: mmfind sub1/ | xargs rm -f to remove the files.

You may also follow me at @guanglei_li and you may get additional support at "https://www.ibm.com/mysupport/s/".

  • The command is called 'mmfind', not 'mmfile'. Also, you do not need to pass the results to xargs; you can use the '-xargs' or '-exec' parameter of 'mmfind' instead. – uli42 Apr 20 '21 at 08:01
  • Another problem with this approach: it will delete the files but keep most of the directories, because the directories are processed at some point during the run and not at the end (after all the files are gone). You will get "error deleting ...: directory is not empty". So you'll have to delete the directories afterwards with another recursive rm. – uli42 Apr 20 '21 at 09:57