Unexpected result from the sort command in Linux bash

0

I have a file foo.txt with this content:

chr1    15
chr11   5
chr11   8
chr1    7
chr2    23
chr1    35

I tried to sort it by the first column, breaking ties numerically on the second column, with the following command in a Linux shell:

sort -k 1,1 -k 2,2n foo.txt

But the result is strange:

chr1    7
chr1    15
chr11   5
chr11   8
chr1    35
chr2    23

What I expected was this:

chr1    7
chr1    15
chr1    35
chr11   5
chr11   8
chr2    23

EDIT: I checked the characters in the file with od -fc foo.txt, as suggested in the comments; there were no strange characters. Here is the result:

0000000   3.5274972e-09   8.7240555e-33   3.5274972e-09    8.716562e-33
          c   h   r   1  \t   1   5  \n   c   h   r   1   1  \t   5  \n
0000020   3.5274972e-09   8.8610065e-33   3.5274972e-09   2.5496164e+21
          c   h   r   1   1  \t   8  \n   c   h   r   1  \t   7  \n   c
0000040   2.1479764e-33   2.5493397e+21   2.1359394e-33     9.37439e-40
          h   r   2  \t   2   3  \n   c   h   r   1  \t   3   5  \n
0000057

I am using sort (GNU coreutils) 8.21. Any ideas?

Ali

Posted 2014-10-22T20:53:59.570

Reputation: 391

Hmm, I get your expected output, GNU sort version 8.21 – glenn jackman – 2014-10-22T20:58:36.090

Are there any strange characters in the file? Try od -c foo.txt – glenn jackman – 2014-10-22T21:00:14.980

Output is also as expected with Cygwin sort (GNU coreutils) 8.15 – DavidPostill – 2014-10-22T21:03:34.603

@glennjackman I have updated the question according to your comment – Ali – 2014-10-22T21:10:31.407

@DavidPostill The problem got worse! I guess I am missing some input argument... – Ali – 2014-10-22T21:11:51.880

You may be using a different locale. What do you get from: env | grep -E '^(LANG|LC)' -- also try LC_COLLATE=C sort -k 1,1 -k 2,2n foo.txt – glenn jackman – 2014-10-22T21:32:02.597

@glennjackman LANG=en_US.UTF-8 LANGUAGE= (empty) – Ali – 2014-10-22T21:33:26.207

@glennjackman Oh! I once ran LC_COLLATE=C as a separate command and then ran sort, and it did not change the result. But using either LC_COLLATE=C or LC_ALL=C as a prefix to the sort command worked like a charm. Are you willing to post this as an answer? I will be happy to mark it as accepted. – Ali – 2014-10-22T21:41:27.503

@glennjackman I also have LANG=en_US.UTF-8 and LANGUAGE undefined but I get the correct output? Why is Ali getting the wrong output? – DavidPostill – 2014-10-22T21:56:14.843

Answers

1

It appears that your locale's collation (sorting) rules were the issue. You can set the collation locale in your environment with export, and then any command that uses it (including sort) will obey it:

export LC_COLLATE=C
sort -k 1,1 -k 2,2n foo.txt
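
The export matters here. A plain LC_COLLATE=C on its own line only creates a shell variable, which child processes such as sort do not see unless the variable was already exported; this is one likely reason the separate-command attempt mentioned in the comments had no effect. A minimal sketch of the difference, assuming a POSIX-style shell (the unset is there only to start from a known state, and the names before and after are just for illustration):

```shell
# Start from a known state: make sure LC_COLLATE is not already exported.
unset LC_COLLATE

# Plain assignment: a shell variable only, not passed to child processes.
LC_COLLATE=C
before=$(sh -c 'printf %s "${LC_COLLATE-unset}"')

# After export, child processes (sort included) inherit the value.
export LC_COLLATE
after=$(sh -c 'printf %s "${LC_COLLATE-unset}"')

echo "child sees before export: $before"
echo "child sees after export:  $after"
```

Note that if LC_ALL is set in the environment, it takes precedence over LC_COLLATE, so exporting LC_COLLATE alone would still have no effect in that case.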

Or you can specify that value just for the duration of the sort itself:

LC_COLLATE=C sort -k 1,1 -k 2,2n foo.txt       # or
env LC_COLLATE=C sort -k 1,1 -k 2,2n foo.txt
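
To check this on your own machine, the following sketch recreates the file from the question (tab-separated, as the od output shows) and sorts it with byte-order collation for that one command. It uses LC_ALL=C rather than LC_COLLATE=C because LC_ALL overrides every other locale variable, so the result does not depend on what else is set; the variable name sorted is only for illustration.

```shell
# Recreate foo.txt from the question; columns are separated by a tab.
printf 'chr1\t15\nchr11\t5\nchr11\t8\nchr1\t7\nchr2\t23\nchr1\t35\n' > foo.txt

# The locale setting applies to this sort invocation only;
# the rest of the shell session keeps its current locale.
sorted=$(LC_ALL=C sort -k 1,1 -k 2,2n foo.txt)

printf '%s\n' "$sorted"
```

With the C locale this should print the expected order from the question: the three chr1 lines (7, 15, 35), then the chr11 lines (5, 8), then chr2.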

glenn jackman

Posted 2014-10-22T20:53:59.570

Reputation: 18 546