Sorting a REALLY BIG delimited text file in UNIX / VMS

2

I need to sort a REALLY BIG delimited text file, say 250 MB (or a bunch of files, some larger and some smaller than 250 MB). It has 37 fields, and I need to sort it on 5 of them, for example the 1st, 4th, 5th, 6th, and 7th fields. Under Unix / VMS, is there a good way to do this FAST? I can write a COBOL program.
I am currently trying to sort the files with the command below, but it has already been running for a long time and shows no sign of finishing.

Thank you.

The command I used: time sort -t ',' -o sorted.txt +0 -1 +4 -5 +5 -6 +6 -7 +22 -23 *.DAT_gprscdr_ftpd

lamwaiman1988

Posted 2011-02-09T07:34:06.683

Reputation: 2 551

2 Ask this question at stackoverflow.com – Amir Rezaei – 2011-02-09T07:50:32.993

1

Look at this SO question: http://stackoverflow.com/questions/930044/why-unix-sort-command-could-sort-a-very-large-file - you might be able to use the script in there.

– KeesDijk – 2011-02-09T08:03:29.487

1 What's wrong with the shell script you provided? The Unix sort is fast. – S.Lott – 2011-02-09T11:20:17.210

Answers

3

Maybe this question really should be in another SE-site, but here is my take on this issue.

1) Isn't the basic sort you provided in your question fast enough? How fast should it be? My 2-year-old desktop sorts 270 MB of Apache access log files in 21 seconds.

2) If that is not fast enough, you can try first sorting each file individually and then merging them with "sort -m".

3) If not fast enough and you have more than one CPU/core, parallelize (sp?) the process with GNU Parallel

4) If that is still not fast enough and you have more machines available, parallelize the sorting process across multiple machines with GNU Parallel.
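
Steps 2 and 3 could look roughly like the sketch below. It is only an illustration, not a tuned solution: the function name sort_and_merge and the temporary file names are made up, the -k keys are my translation of the +0 -1 +4 -5 ... positions from the question (the old +N -M notation counts from zero, so +4 -5 means field 5), and forcing LC_ALL=C is an assumption that plain byte-order comparison is acceptable for your data. Plain shell background jobs stand in for GNU Parallel here, so every file gets its own sort process at once; with many files you would want to limit that.

```shell
#!/bin/sh
# Sort keys in -k notation, assumed equivalent to the question's
# "+0 -1 +4 -5 +5 -6 +6 -7 +22 -23" (fields 1, 5, 6, 7 and 23).
KEYS="-k1,1 -k5,5 -k6,6 -k7,7 -k23,23"

# sort_and_merge OUTPUT INPUT... (hypothetical helper name)
# Sorts each input file on its own, then merges the sorted pieces.
sort_and_merge() {
    out=$1; shift
    tmpdir=$(mktemp -d) || return 1
    i=0
    for f in "$@"; do
        i=$((i + 1))
        # One background sort per file; LC_ALL=C compares raw bytes,
        # which is often faster than locale-aware collation.
        # $KEYS is left unquoted on purpose so it splits into options.
        LC_ALL=C sort -t ',' $KEYS -o "$tmpdir/part.$i" "$f" &
    done
    wait    # block until every per-file sort has finished
    # -m only merges already-sorted inputs; it must use the same
    # keys and locale as the per-file sorts above.
    LC_ALL=C sort -m -t ',' $KEYS -o "$out" "$tmpdir"/part.*
    rm -rf "$tmpdir"
}
```

With the files from the question this would be called as: sort_and_merge sorted.txt *.DAT_gprscdr_ftpd. Note that the sketch does not check whether an individual background sort failed, so treat it as a starting point only.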

Maglob

Posted 2011-02-09T07:34:06.683

Reputation: 146

0

You can load the data into a MySQL database (with the LOAD DATA INFILE command) and then sort or query it however you want.

jet

Posted 2011-02-09T07:34:06.683

Reputation: 2 675