I want to remove duplicate lines of text, comparing only the first word on each line (everything up to the first space). The input is in this format:
apples blue
apples green
apples are sometimes red
pairs green
pairs black
potato brown
lemon ...
Anything after the first space on each line should be discarded, and then the duplicates removed.
I would end up with:
apples
pairs
potato
lemon
I was hoping this could be done in a Linux terminal, with something like:
command file_in.txt single_sout.txt
Thanks guys!
Will this work on large files, GBs in size? Thanks for the reply, also. – mark – 2014-09-19T23:40:46.617
For very large files it might be worth doing this in 2 steps, so
cut -d " " -f 1 file_in.txt > file_tmp.txt
and then
uniq file_tmp.txt > file_out.txt
That will help narrow down the issue if something fails. I don't know of any file size restrictions for either cut or uniq, so the only real way to find out would be to test it. Running the commands is non-destructive though, so giving it a shot won't hurt. – Adam – 2014-09-21T16:08:14.950
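For reference, the two steps from the comment above can also be run as a single pipeline, which avoids the temporary file. This is a minimal sketch: the sample data is recreated from the question, and file_out_awk.txt is a hypothetical file name used only for illustration. Note that uniq only collapses adjacent repeats, so the awk variant is included as an assumption-labelled alternative for input where the same first word can reappear later in the file.

```shell
# Recreate the sample input from the question (illustrative data only).
printf '%s\n' \
  'apples blue' \
  'apples green' \
  'apples are sometimes red' \
  'pairs green' \
  'pairs black' \
  'potato brown' \
  'lemon' > file_in.txt

# Keep only the first space-delimited field, then drop adjacent repeats.
# Lines with no space at all (like "lemon") are passed through unchanged by cut.
cut -d ' ' -f 1 file_in.txt | uniq > file_out.txt

# Alternative for non-adjacent duplicates: print the first field only the
# first time it is seen, preserving input order. Unlike cut -d ' ', awk
# splits on any run of spaces or tabs by default.
awk '!seen[$1]++ { print $1 }' file_in.txt > file_out_awk.txt

cat file_out.txt
```

Both cut and uniq read their input as a stream, so file size alone should not be a problem. The awk variant keeps every distinct first word in memory, which is usually fine but is worth keeping in mind if the number of distinct words is very large.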