Win: Remove duplicate lines in huge txt file

1

I need to remove duplicate lines from a huge text file, about 150 MB in size. When I try PSPad, I get a memory error (even though I have 8 GB of RAM).

Do you have any ideas or advice on how to remove these duplicates?

user3620512

Posted 2014-08-02T15:26:31.323

Reputation: 11

This is not an answer, given the "Win:" in the title, but anyway: go to www.cygwin.com and download the basic install (you get a very basic "bash" in a terminal if you do not add any extra packages). Then open a terminal, cd to the directory that holds the file, and type sort THEFILE.txt | uniq -ui >UNIQUE.txt. Type uniq --help | less and sort --help | less for short info on how these two commands work. – Hannu – 2014-08-02T15:36:32.833

Answers

2

Download Gawk (a pattern scanning and processing language): Download -> Binaries -> Zip.

Copy "awk.exe" (gawk-3.1.6-1-bin\bin\awk.exe) to the directory that holds your file, then create a bat file containing this line:

awk "!x[$0]++" huge.txt>output.txt
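The expression !x[$0]++ prints a line only while its counter in the array x is still zero, so the first occurrence of each line is kept and the original order is preserved. For readers without awk, the same logic can be sketched in Python. This is only an illustration, not part of the answer above: the names dedupe_lines and dedupe_file are hypothetical, the file is read as raw bytes on the assumption that it uses a single-byte-compatible encoding such as ASCII or UTF-8, and, like the awk array, the set of distinct lines must fit in memory.

```python
def dedupe_lines(src, dst):
    """Copy lines from src to dst, keeping only the first occurrence of each.

    src and dst are binary file objects (hypothetical helper, not a library API).
    Returns the number of lines written.
    """
    seen = set()   # every distinct line seen so far (assumed to fit in RAM)
    kept = 0
    for line in src:                    # streams one line at a time
        key = line.rstrip(b"\r\n")      # ignore LF vs CRLF differences
        if key not in seen:
            seen.add(key)
            dst.write(key + b"\n")      # output is normalized to LF endings
            kept += 1
    return kept


def dedupe_file(in_path, out_path):
    """Convenience wrapper: de-duplicate in_path into out_path."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        return dedupe_lines(src, dst)
```

Calling dedupe_file("huge.txt", "output.txt") would mirror the bat file above. Because the input is streamed line by line, memory use grows with the number of distinct lines rather than with the file size.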

crazypotato

Posted 2014-08-02T15:26:31.323

Reputation: 678

1

You can download Notepad++ and use the TextFX plugin. Install TextFX by going to Plugins -> Plugin Manager -> Show Plugin Manager -> Available tab -> TextFX -> Install. After you have it installed, there will be a new menu called TextFX. Select the portion of your document that contains duplicates (or just select the whole document). Go to TextFX -> TextFX Tools, check +Sort outputs only UNIQUE..., and then choose either sort lines case sensitive or sort lines case insensitive.
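Unlike an order-preserving filter, a sort-then-unique approach reorders the lines. A rough Python sketch of that idea follows. The function name sorted_unique is hypothetical, and treating lines that differ only in case as duplicates when ignore_case is set is an assumption made for illustration, not a statement about how TextFX behaves.

```python
def sorted_unique(lines, ignore_case=False):
    """Return the distinct lines in sorted order (illustrative helper).

    With ignore_case=True, lines are sorted and compared by their lowercase
    form, and the first spelling encountered after sorting is the one kept.
    """
    key = str.lower if ignore_case else (lambda s: s)
    seen = set()
    result = []
    for line in sorted(lines, key=key):   # sorted() is stable
        k = key(line)
        if k not in seen:
            seen.add(k)
            result.append(line)
    return result
```

This version loads all lines into memory at once, so for a 150 MB file it needs noticeably more RAM than the file size itself.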

rakeshdas

Posted 2014-08-02T15:26:31.323

Reputation: 146