I have a 200 GB flat file (one word per line), and I want to sort it, remove the duplicates, and produce one clean final TXT file.

I tried sort with --parallel, but it ran for 3 days and I got frustrated and killed the process, as I didn't see any changes to the chunk files it created in /tmp.

I need to see the progress somehow, to make sure it's not stuck and is actually working. What's the best way to do that? Are there any Linux tools or open-source projects dedicated to something like this?
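
A commonly suggested way to get visibility into a run like this is to meter the input with pv and watch the temp directory. This is only a sketch, assuming GNU coreutils sort and the pv tool are installed; words.txt, sorted-unique.txt, and /bigdisk/tmp are placeholder names, and pv can only show progress for the read phase, not for sort's internal merge passes:

    # pv shows bytes read, throughput, and an ETA while the input is read.
    # LC_ALL=C makes sort compare raw bytes, which is much faster than
    # locale-aware collation. -S caps the memory buffer; -T points the
    # temporary chunk files at a disk with enough free space.
    LC_ALL=C pv words.txt | sort -u -S 60% --parallel=8 -T /bigdisk/tmp -o sorted-unique.txt

    # In a second terminal, watch the temp directory: chunk files growing,
    # then shrinking as they are merged, shows sort is not stuck.
    watch -n 60 'du -sh /bigdisk/tmp; ls /bigdisk/tmp | wc -l'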

– GMX Rider
  • Maybe you could provide your command line; I think your question belongs more on the Unix & Linux site. Also, as a side note, you won't get a single progress bar for the whole job, because sort needs to finish before the duplicates can be removed, so split those two operations. – Kiwy Jun 01 '18 at 05:43
  • Exactly your question: https://unix.stackexchange.com/q/120096 – Kiwy Jun 01 '18 at 05:48
  • You should split your big file into multiple smaller, manageable files, sort them separately, and then merge-sort them back together (a sketch of this approach follows below). I don't know of any specific Linux tool for this job, but I suppose a combination of already existing tools can do it. – xmas79 Jun 01 '18 at 04:40
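
In that spirit, a minimal sketch of the split/sort/merge approach, assuming GNU coreutils; the paths, chunk size, and file names are hypothetical, and the temp disk should have roughly twice the input size free:

    #!/usr/bin/env bash
    set -euo pipefail
    export LC_ALL=C                  # byte-wise comparison: faster, locale-stable

    IN=words.txt                     # hypothetical input file
    TMP=/bigdisk/tmp                 # hypothetical temp dir with enough space
    OUT=sorted-unique.txt
    mkdir -p "$TMP"

    # 1. Split into ~1 GB chunks; -C never breaks a line across chunks,
    #    and -d -a 3 gives numeric suffixes chunk.000 ... chunk.999.
    split -C 1G -d -a 3 "$IN" "$TMP/chunk."

    # 2. Sort and dedupe each chunk on its own; every finished chunk is
    #    visible progress, which is exactly what the question asks for.
    for f in "$TMP"/chunk.*; do
        sort -u -S 2G -o "$f.sorted" "$f" && rm "$f"
        echo "sorted: $f"
    done

    # 3. Merge the pre-sorted chunks (-m merges without re-sorting) and
    #    drop duplicates that span chunk boundaries (-u).
    sort -m -u -T "$TMP" -o "$OUT" "$TMP"/chunk.*.sorted

Because each chunk is an independent step, counting the finished .sorted files gives a simple, reliable progress indicator.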

0 Answers