Case-sensitive sorting with GNU sort

36

4

The sort utility in Ubuntu 10.04 (Lucid) always sorts case-insensitively, as if --ignore-case had been specified.

These two commands give the same result:

echo -e "c\nb\nB\na" | sort
echo -e "c\nb\nB\na" | sort --ignore-case

But sometimes I want a case-sensitive sort in which the uppercase letters come first, followed by the lowercase letters. Is that possible?

Xiè Jìléi

Posted 2010-08-20T04:28:41.163

Reputation: 14 766

Answers

32

Override the collation order.

echo -e "c\nb\nB\na" | LC_COLLATE=C sort
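One caveat, sketched here assuming a POSIX shell: if LC_ALL is set in the environment it takes precedence over LC_COLLATE, so LC_COLLATE=C alone would have no effect. Setting LC_ALL=C for the single command covers both cases. The function name sort_bytewise is only an illustrative helper, and printf stands in for echo -e for portability.

```shell
# Hypothetical helper: sort stdin in plain byte order (uppercase ASCII first).
# LC_ALL overrides LC_COLLATE, so it is set for this one command only.
sort_bytewise() {
    LC_ALL=C sort
}

result=$(printf 'c\nb\nB\na\n' | sort_bytewise)
printf '%s\n' "$result"
```

This prints B, a, b, c, one per line.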

Ignacio Vazquez-Abrams

Posted 2010-08-20T04:28:41.163

Reputation: 100 516

Regarding "foreign characters", the C.UTF-8 locale (LC_COLLATE=C.UTF-8) will sort case-sensitively, while treating non-ASCII UTF-8 characters "normally". Unfortunately, it's not available upstream in glibc and only patched in by Debian, Ubuntu and derivatives. – aplaice – 2019-08-09T09:54:43.917

6 This works, but by definition only if no foreign characters are in play; they will sort after the 7-bit ASCII letters; try echo $'B\nÄ\nb\na' | LC_COLLATE=C sort. Shouldn't the fact that GNU sort with a non-C locale always performs case-insensitive sorting be considered a bug? – mklement0 – 2014-05-28T17:38:17.523
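The point about non-ASCII characters can be checked with a small sketch, assuming UTF-8 input. The variable name mixed is illustrative only, and LC_ALL=C is used rather than LC_COLLATE=C so that a preset LC_ALL cannot override it.

```shell
# "Ä" is written as its UTF-8 bytes (octal 303 204) so the example does not
# depend on the encoding of the script file.
mixed=$(printf 'B\n\303\204\nb\na\n' | LC_ALL=C sort)
printf '%s\n' "$mixed"
```

In the C locale this prints B, a, b and then Ä, because both bytes of Ä are greater than any ASCII byte.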

15

Interestingly, yet another sort order is available like this:

echo -e "c\nb\nB\na" | LC_COLLATE=C sort --ignore-case

which puts the uppercase letter before its corresponding lowercase letter.

Here is a comparison of their outputs (I added "d" and "D") in the en_US.UTF-8 locale (except where overridden):

  1. echo -e "d\nD\nc\nb\nB\na" | sort
  2. echo -e "d\nD\nc\nb\nB\na" | sort --ignore-case
  3. echo -e "d\nD\nc\nb\nB\na" | LC_COLLATE=C sort
  4. echo -e "d\nD\nc\nb\nB\na" | LC_COLLATE=C sort --ignore-case

Output:

1   2   3   4
-   -   -   -
a   a   B   a
b   b   D   B
B   B   a   b
c   c   b   c
d   d   c   D
D   D   d   d
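To rerun this comparison on another system, something like the following sketch may help. It assumes a POSIX shell with sort, paste and env available; row is a hypothetical helper, and LC_ALL=C is used instead of LC_COLLATE=C so that a preset LC_ALL cannot mask the override. Variants 1 and 2 use whatever locale is active, so their output may differ from the table.

```shell
# Sample input from the comparison above.
input=$(printf 'd\nD\nc\nb\nB\na\n')

# Hypothetical helper: run the given command on the sample input and
# print its output on one line, space-separated.
row() {
    printf '%s\n' "$input" | "$@" | paste -s -d ' ' -
}

# Variants 1 and 2 depend on the locale of the current environment.
printf '1: %s\n' "$(row sort)"
printf '2: %s\n' "$(row sort --ignore-case)"

# Variants 3 and 4 force the C locale.
c_plain=$(row env LC_ALL=C sort)
c_fold=$(row env LC_ALL=C sort --ignore-case)
printf '3: %s\n' "$c_plain"
printf '4: %s\n' "$c_fold"
```

Lines 3 and 4 should read "B D a b c d" and "a B b c D d", matching columns 3 and 4 of the table. In variant 4 the folded keys tie for each letter pair, and sort breaks the tie with a final bytewise comparison, which is why the uppercase letter lands directly before its lowercase counterpart.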

Paused until further notice.

Posted 2010-08-20T04:28:41.163

Reputation: 86 075

Interesting; I see this behavior in GNU sort v5.93 (comes with OS X 10.9.3(!)) and v8.13, but NOT in v8.21 and v8.22. I guess the results of 2. and 4. can still be considered equivalent (but that would obviously change with the addition of foreign characters). – mklement0 – 2014-05-31T05:18:56.397