I have a file foo.txt with this content:
chr1 15
chr11 5
chr11 8
chr1 7
chr2 23
chr1 35
I tried to sort it by the first column and then, to break ties, numerically by the second column, using the following command in a Linux shell:
sort -k 1,1 -k 2,2n foo.txt
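The command can be read key by key. Below is a minimal annotated sketch. So that it runs as-is, it recreates foo.txt with printf and assumes the columns are tab-separated, as the od dump further down shows.

```shell
#!/bin/sh
# Recreate foo.txt exactly: tab between the columns, newline after each line.
printf 'chr1\t15\nchr11\t5\nchr11\t8\nchr1\t7\nchr2\t23\nchr1\t35\n' > foo.txt

# -k 1,1   primary key: field 1 only (starts and stops at field 1).
#          It has no type letter, so it is compared as text using the
#          collation rules of the current locale (LC_COLLATE).
# -k 2,2n  tie-breaker: field 2 only, compared numerically (n),
#          so 7 sorts before 15 instead of after it.
sort -k 1,1 -k 2,2n foo.txt
```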
But the result is strange:
chr1 7
chr1 15
chr11 5
chr11 8
chr1 35
chr2 23
What I expected was this:
chr1 7
chr1 15
chr1 35
chr11 5
chr11 8
chr2 23
EDIT
As suggested in the comments, I checked the characters in the file with od -fc foo.txt. There are no strange characters. Here is the result:
0000000 3.5274972e-09 8.7240555e-33 3.5274972e-09 8.716562e-33
c h r 1 \t 1 5 \n c h r 1 1 \t 5 \n
0000020 3.5274972e-09 8.8610065e-33 3.5274972e-09 2.5496164e+21
c h r 1 1 \t 8 \n c h r 1 \t 7 \n c
0000040 2.1479764e-33 2.5493397e+21 2.1359394e-33 9.37439e-40
h r 2 \t 2 3 \n c h r 1 \t 3 5 \n
0000057
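The od offsets are octal, so the final offset 0000057 means the file is 5×8+7 = 47 bytes: six tab-separated lines, each ending in \n, with nothing else in between. The -f flag only adds a 4-byte float reading of the same bytes, which is noise here; od -c alone shows the characters. Below is a sketch for checking that a recreated copy matches the dump byte for byte. It assumes a POSIX shell with the standard od, wc and tr utilities.

```shell
#!/bin/sh
# Recreate the file from the characters shown in the dump.
printf 'chr1\t15\nchr11\t5\nchr11\t8\nchr1\t7\nchr2\t23\nchr1\t35\n' > foo.txt

# Character view only; the last offset od prints is octal (0000057 = 47).
od -c foo.txt

# Decimal byte count; 47 matches the dump in the question.
wc -c < foo.txt

# Number of tab characters; one per line is expected, so 6.
tr -cd '\t' < foo.txt | wc -c
```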
I am using sort (GNU coreutils) 8.21
Any ideas?
Hmm, I get your expected output with GNU sort version 8.21 – glenn jackman – 2014-10-22T20:58:36.090
Are there any strange characters in the file? Try
od -c foo.txt
– glenn jackman – 2014-10-22T21:00:14.980
Output is also as expected with Cygwin sort (GNU coreutils) 8.15 – DavidPostill – 2014-10-22T21:03:34.603
@glennjackman I have updated the question according to your comment – Ali – 2014-10-22T21:10:31.407
@DavidPostill The problem got worse! I guess I am missing some input argument... – Ali – 2014-10-22T21:11:51.880
You may be using a different locale. What do you get from:
env | grep -E '^(LANG|LC)'
Also try LC_COLLATE=C sort -k 1,1 -k 2,2n foo.txt
– glenn jackman – 2014-10-22T21:32:02.597
@glennjackman LANG=en_US.UTF-8 LANGUAGE= (empty) – Ali – 2014-10-22T21:33:26.207
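The env | grep check lists only the variables that happen to be set. The locale utility reports the effective value of each category after the standard precedence (LC_ALL, then LC_COLLATE, then LANG) is applied, which is the value sort actually uses. Below is a minimal sketch for putting the current-locale order next to the C (plain byte order) result. It assumes a POSIX shell with the standard locale utility installed.

```shell
#!/bin/sh
# Effective collation for commands run from this shell.
locale | grep '^LC_COLLATE='

# Recreate the sample file (tab-separated, as in the od dump).
printf 'chr1\t15\nchr11\t5\nchr11\t8\nchr1\t7\nchr2\t23\nchr1\t35\n' > foo.txt

echo '--- current locale ---'
sort -k 1,1 -k 2,2n foo.txt

# LC_ALL=C as a prefix forces byte-by-byte comparison for this one command.
echo '--- C locale ---'
LC_ALL=C sort -k 1,1 -k 2,2n foo.txt
```

In the C locale "chr1" sorts before "chr11" because it is a byte-wise prefix, and "chr11" sorts before "chr2" because the byte '1' is smaller than '2'. This gives the order expected in the question.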
@glennjackman Oh! Earlier I had run LC_COLLATE=C as a separate command and then run sort, and that did not change the result. But using either LC_COLLATE=C or LC_ALL=C as a prefix to the sort command worked like a charm. Would you be willing to post this as an answer? I will be happy to accept it. – Ali – 2014-10-22T21:41:27.503
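The difference between the separate command and the prefix matches standard shell behaviour. A bare assignment on its own line creates a shell variable that is not exported, unless it was already exported earlier. Child processes such as sort therefore never see it. A prefix assignment places the variable in the environment of that one command only. Below is a minimal sketch in POSIX sh. It uses sh -c as a stand-in for sort so it can print what a child process actually receives.

```shell
#!/bin/sh
# Start without LC_COLLATE in the environment so the demo is predictable.
unset LC_COLLATE

# 1) Assignment on its own line: a shell variable only, not exported.
LC_COLLATE=C
sh -c 'echo "child sees: ${LC_COLLATE-unset}"'                 # child sees: unset

# 2) Prefix assignment: in the environment of this one command only.
LC_COLLATE=C sh -c 'echo "child sees: ${LC_COLLATE-unset}"'    # child sees: C

# 3) export: every later command started from this shell inherits it.
export LC_COLLATE
sh -c 'echo "child sees: ${LC_COLLATE-unset}"'                 # child sees: C
```

Running export LC_COLLATE=C once in the session would have had the same effect on sort as the prefix form, as long as LC_ALL is not also set.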
@glennjackman I also have LANG=en_US.UTF-8 and LANGUAGE undefined, but I get the correct output. Why is Ali getting the wrong output? – DavidPostill – 2014-10-22T21:56:14.843