Windows: need a tool for editing large text files (50GB+) to combine .txt files and remove duplicate lines


I have Windows Server 2012, 32 GB RAM, an Intel i7 CPU, and a 1 TB SSHD.

I have .txt wordlist files, one entry per line; the files range from 2 GB to 50 GB.

What kind of tool or program can work at that size, to combine all the files into one .txt file (which could be 100 GB after merging) and then remove duplicate lines, case-sensitively, without crashing, freezing, or lagging? I know I asked a similar question before, but I didn't get a simple answer.

I don't understand much of the cmd commands people use, so if possible, tell me about a program that can really do this without problems, or a cmd method with an easy explanation for a beginner: what I need to do, step by step, and how to do it. In the end I need something that won't crash my PC or be very slow.

I have tried EmEditor so far; it can't handle a 10 GB file and becomes extremely slow. Please help me.

DeathRival

Posted 2017-09-20T12:18:42.750

Reputation: 113

You should consider doing this job with a programming language and not by hand. – IQV – 2017-09-20T12:22:31.747

Given your system specifications, it is unrealistic to open 50 GB text files in an editor. You can parse 50 GB text files with your own program provided you don't attempt to do it in one giant blob. – Ramhound – 2017-09-20T12:29:30.747

You may need to seriously consider downsizing some of the files (cough 50GB) to work with them, even if you recombine them later. This question had some suggestions for doing a similar task: https://stackoverflow.com/q/25249516/3395469

– Anaksunaman – 2017-09-20T13:27:13.913
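
For illustration, that kind of downsizing can be done with the coreutils split command (available on a Linux live system or in the Windows bash shell mentioned in the answers below); the line count and file names here are only placeholders, not taken from the thread:

  # cut big.txt into pieces of 50 million lines each,
  # named chunk_aa, chunk_ab, ...
  split -l 50000000 big.txt chunk_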

@DeathRival: see my edit below; I added a step-by-step instruction to resolve your problem. I haven't tested it with text files that large, so give it a try. – chloesoe – 2017-09-21T08:27:48.330

Answers


The best tool for managing huge .txt wordlists on Windows is Unified List Manager (ULM).


You can sort, merge, split, remove duplicates, and do many other useful things with it.

Joe6pack

Posted 2017-09-20T12:18:42.750

Reputation: 187


You already asked that here: "how to merge large txt files of 10GB+ into 1 txt file and remove duplicates lines from this 1 txt file fastest way?"

I would still recommend downloading a Linux distribution (Ubuntu, Mint, or whatever), burning it to a CD or creating a bootable USB drive, and then starting it without installing. Then you could do what I recommended here: https://superuser.com/a/1250792/715210

Or you could install the Windows 10 Linux Bash shell: https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/
I think the commands at https://superuser.com/a/1250792/715210 should work; they are really basic Linux commands.

Edit: I tested this with Windows 10 Pro (you didn't mention your exact OS). Step by step, to install the Windows Linux Bash shell and merge the files aa.txt and bb.txt into newfile.txt while eliminating duplicates (assuming your files are located in C:\temp):

  1. Press Win+i to open Settings
  2. Update & Security -> For Developers: choose "Developer mode"
    • Developer mode will be installed
  3. Win+R -> type "control panel" -> Enter
  4. Programs and Features -> on the left side, "Turn Windows features on or off"
    • Check "Windows Subsystem for Linux (Beta)"
  5. Reboot
  6. Press Win, then search for "bash" and open it
  7. Answer the prompts with "Y"; you will also be asked to define a username and password
    • Bash is installed now.
    • Your drive C: is now available under /mnt/c.
  8. Type cd /mnt/c/temp/ (or your path), then hit Enter
  9. Type cat aa.txt bb.txt | sort -u > newfile.txt
    • If that does not work, you could first merge the files into one file with cat aa.txt bb.txt > tempfile.txt and then run the sort separately: sort -u tempfile.txt > newfile.txt (see the sketch after this list for a variant tuned to very large files)
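
On 50-100 GB inputs, a few GNU sort options matter: LC_ALL=C forces plain byte-wise comparison, which is what you want for case-sensitive duplicate removal (and is faster than locale-aware sorting), -S sets the in-memory buffer size, and -T chooses where the temporary files go. sort performs an external merge sort, spilling sorted runs to disk, so it never needs the whole file in RAM; it just needs roughly as much free disk space as the input. A minimal sketch, assuming the files are in /mnt/c/temp (the 8 GB buffer and the sorttmp directory name are only example values):

  cd /mnt/c/temp
  mkdir -p sorttmp    # scratch directory for sort's temporary files
  # byte-wise, case-sensitive comparison; 8 GB sort buffer; temp files on C:
  LC_ALL=C sort -u -S 8G -T sorttmp aa.txt bb.txt > newfile.txt
  rm -r sorttmp       # remove the scratch directory when done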

chloesoe

Posted 2017-09-20T12:18:42.750

Reputation: 627