I frequently run into the situation where partially ordered data needs to be sorted: the first column is already sorted, but the later ones are not. For example, with two columns:
1 5
1 3
2 10
2 -1
2 3
3 11
3 -200
3 20
The desired output is that produced by
sort -k 1,1g -k 2,2g
which works, but sort emits nothing until it has read all of its input. When the input is several gigabytes of text, that can take a while, and in the meantime nothing downstream of the sort in a pipeline can run. It is also inefficient in memory use, since the entire dataset must be held at once, even though the desired sort only needs a tiny fraction of it (one run of equal first-column values) at any time.
With a script it wouldn't be difficult to break this up into chunks and then sort each chunk. Does the sort command have an option somewhere to notify it that the data in that column is already ordered? I don't see it in sort 8.4, but perhaps I just missed it?
If the sort encounters an out-of-order value in the column it has been told is already ordered, it should exit, since that indicates an error in upstream processing.
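For reference, the chunking script I have in mind could look roughly like the sketch below. It is written in Python rather than as a sort option, and the names `sort_presorted` and `main` are just illustrative. It assumes whitespace-separated columns, no blank lines, and numeric values that Python's `float()` can parse, which may not match sort's `g` parsing in every edge case. Ties in column 2 keep their input order.

```python
import sys
from itertools import groupby


def sort_presorted(lines):
    """Yield lines ordered by column 2 within each run of equal column-1 values.

    Assumes column 1 is already in non-decreasing order and raises
    ValueError as soon as a value breaks that order.
    """
    previous_key = None
    # groupby batches consecutive lines that share the same first column,
    # so only one such run is held in memory at a time.
    for key, group in groupby(lines, key=lambda line: float(line.split()[0])):
        if previous_key is not None and key < previous_key:
            raise ValueError(f"column 1 out of order: {key} after {previous_key}")
        previous_key = key
        # Each run is sorted and emitted before the next run is read.
        yield from sorted(group, key=lambda line: float(line.split()[1]))


def main():
    """Filter stdin to stdout; exit with an error on out-of-order input."""
    try:
        for line in sort_presorted(sys.stdin):
            sys.stdout.write(line)
    except ValueError as err:
        sys.exit(f"error: {err}")
```

Calling `main()` at the bottom of the file would turn this into a pipeline filter. Output for a run of equal first-column values appears as soon as the next run begins, so downstream commands can start working long before the input ends.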
Are you sure you need g? Wouldn't n be enough? – choroba – 2018-02-21T22:56:30.043