Simply put, my question is different because I need to merge the files into one first and then remove the duplicate lines from that merged file, which will be over 50 GB of text. I have several large .txt files of 10 GB+ each.
I want to merge them into one .txt file,
then remove all duplicate lines from that single combined .txt file, which will end up around 50 GB or 100 GB.
So what can handle a file that large and remove the duplicates from it smoothly?
I need the fastest way, because I tried both Notepad++ and EmEditor; they become extremely slow with these files, and merging or removing duplicates takes forever.
I have 12 GB of RAM.
Scripting is probably going to be the fastest, but do note that working with files this large means this is going to take a long time regardless. Therefore, finding the fastest method is really a matter of opinion: it will take more time to find the fastest way than to just get it done. – LPChip – 2017-09-15T21:08:51.757
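For reference, here is a minimal sketch of what the scripting route could look like, assuming GNU coreutils sort is available (for example through WSL, Cygwin, or Git Bash on Windows); the file names are placeholders. sort can read several input files at once, so the merge and the duplicate removal can happen in a single pass:

    # Sort all input files together and keep each distinct line once.
    # sort spills to temporary files on disk, so the 50+ GB of input
    # does not have to fit into 12 GB of RAM.
    sort -u file1.txt file2.txt file3.txt -o deduped.txt

Note that the result comes out sorted, so the original line order is not preserved.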
That didn't help me and I didn't understand any of it. My question is also different, since I'm dealing with larger files: 10 GB minimum, going up to 100 GB, and I already have over 300 GB of free space to work with. – DeathRival – 2017-09-15T21:09:39.753
OK, I have found a way, from "How Does One Remove Duplicate Text Lines From Files Larger Than 4GB". You can delete my question if you want. What I found is: http://www.pilotedit.com/index.html. Thanks to whoever posted it. – DeathRival – 2017-09-15T21:44:14.883
Ramhound already pointed you to a good answer, but let me add a few things. You can join several text files together using the copy command. Open a command prompt, use cd to move to the folder with your text files, and then type copy file1.txt + file2.txt combined_file.txt. That will join both files and will take about 3 seconds per GB if you are working on an SSD. It will be slower on a hard disk. – SpiderPig – 2017-09-15T21:47:18.080
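As an illustration of the copy approach described above, here is a small sketch for the Windows command prompt. The folder and file names are placeholders, and the /b (binary) switch is an added assumption, used to avoid the extra end-of-file character that copy can append in ASCII mode:

    REM Change to the folder that holds the input files (placeholder path)
    cd /d D:\bigfiles

    REM Concatenate the files into one; /b copies them byte-for-byte
    copy /b file1.txt + file2.txt + file3.txt combined_file.txt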
The sort -u command mentioned in the other thread is also very fast and can handle 0.1 GB per second. – SpiderPig – 2017-09-15T21:55:32.293
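To show how sort -u might be tuned for a 50-100 GB file on a machine with 12 GB of RAM, here is a sketch assuming GNU coreutils sort (for example via WSL, Cygwin, or Git Bash); the buffer size, temp directory, and thread count below are placeholder values, not recommendations from the thread:

    # Compare lines as raw bytes instead of applying locale rules;
    # this usually speeds up sorting considerably.
    export LC_ALL=C

    # -u            drop duplicate lines
    # -S 8G         use up to about 8 GB of RAM as the sort buffer (placeholder value)
    # -T <dir>      put temporary spill files on the drive with 300+ GB free (placeholder path)
    # --parallel=4  sort with 4 threads (placeholder value)
    sort -u -S 8G -T /mnt/d/tmp --parallel=4 combined_file.txt -o deduped.txt

Because sort spills to temporary files, the temp directory needs roughly as much free space as the input file.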