I found that sort gives a different result in an ASCII locale than in a UTF-8 locale.
Source file test:
1-
11-
1-a
11-a
Sort using ASCII:
$ LANG=en_US.ascii sort test
1-
1-a
11-
11-a
And using UTF-8:
$ LANG=en_US.utf8 sort test
1-
11-
11-a
1-a
I find this counter-intuitive, and it's not dictionary order.
Isn't the character '-' (002d) always less than [0-9] (0030-0039)?
What's the general rule in UTF-8 collation?
And how can I bypass it so that - sorts before [0-9] while all other characters keep their UTF-8 ordering, on Linux? (So that it affects the result of ls --sort, sort, etc.)
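For comparison, here is a minimal sketch of forcing plain byte order, in which '-' (0x2d) does sort before the digits. It assumes a POSIX sort and relies on the C locale, selected here through LC_ALL so that it overrides LANG and any other LC_* setting. Note that this switches the whole collation to byte order rather than changing only '-', so it is not the selective tweak asked for above.

```shell
# Feed the four sample lines from the question to sort.
# LC_ALL=C overrides LANG and all LC_* categories for this one command,
# so lines are compared byte by byte instead of by locale collation rules.
result=$(printf '%s\n' '1-' '11-' '1-a' '11-a' | LC_ALL=C sort)
printf '%s\n' "$result"
```

The output matches the ASCII listing above (1-, 1-a, 11-, 11-a). Setting only the collation category (LC_COLLATE=C) leaves the character encoding and messages in UTF-8, provided LC_ALL is not set in the environment.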
@grawity I see this on gmail when I open zip files. I see this in Win7 with images: 11, 12, 13, ..., 19, 1. – Wolfpack'08 – 2014-07-23T17:23:03.187
Where precisely are you seeing this? With sort 8.5 from GNU coreutils, "1-" always comes before "11-", with any locale. – user1686 – 2011-01-01T15:17:59.330
It's my mistake. I had truncated the strings. I changed the example; please try again. – Xiè Jìléi – 2011-01-01T18:39:12.703