Is there a Unicode-aware LC_COLLATE sort order which respects punctuation?

As far as I can tell, setting the environment variable LC_COLLATE=en_US.utf8 instead of LC_COLLATE=c changes four things about how programs like ls sort files:

  1. Unicode characters are preserved (rather than being replaced with ?? garbage)
  2. Accents and diacritical marks don't affect sort order
  3. Case differences don't affect sort order
  4. Punctuation characters (such as dots) don't affect sort order

Feature 1 is a must-have in this day and age.
Features 2 and 3 are great too, since they make it more convenient to deal with real-life Unicode filenames.
Feature 4, on the other hand, is something I find really counterproductive in my day-to-day work, since it often produces counterintuitive sort orders for Linux filenames, where dots tend to be used to separate suffixes or to mark dotfiles. I really can't imagine why anyone thought it would be a good idea to ignore dots when sorting filenames.

For example:

$ touch foo.txt foo2.txt foó3.txt foo4.txt

$ LC_COLLATE=en_US.utf8 ls
foo2.txt  foó3.txt  foo4.txt  foo.txt

$ LC_COLLATE=c ls
foo.txt  foo2.txt  foo4.txt  fo??3.txt
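The C-locale order above follows directly from byte values. A small sketch (assuming GNU coreutils od and sort) shows that ó is stored as the two UTF-8 bytes 0xC3 0xB3, which is where the two ? characters come from, and that sorting the same names in byte order puts foo.txt first ('.' is 0x2E, below the digits) and foó3.txt last (0xC3 is above every ASCII byte):

```shell
#!/bin/sh
# Dump the bytes of the accented name: "ó" is encoded as c3 b3 in UTF-8.
printf 'foó3.txt' | od -An -tx1

# Sort the four names by raw byte value (C collation).
# foo.txt < foo2.txt because '.' (0x2e) < '2' (0x32) at the fourth byte.
# foo4.txt < foó3.txt because 'o' (0x6f) < 0xc3 at the third byte.
printf '%s\n' foo4.txt foó3.txt foo2.txt foo.txt | LC_ALL=C sort
```

The sort prints foo.txt, foo2.txt, foo4.txt and foó3.txt, one per line; unlike ls, sort writes the raw bytes, so the name is not shown as ??.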

Neither is satisfactory. This is how I'd want those files to be sorted:

foo.txt  foo2.txt  foó3.txt  foo4.txt

In other words, just like with LC_COLLATE=en_US.utf8, except that punctuation characters are treated as significant and sort before letters.
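If no locale gives exactly this order, it can be approximated outside the locale system. The sketch below uses a hypothetical shell function, sort_names, that builds a sort key for each name by transliterating accented letters to ASCII with iconv and folding case with tr, then sorts the keys in byte order. Assumptions: glibc iconv and GNU coreutils, a UTF-8 LC_CTYPE whose transliteration maps ó to o, and names without tabs or newlines. Punctuation stays significant but is compared by ASCII value, so . and - sort before digits and letters, while _ sorts after digits and ~ after letters.

```shell
#!/bin/sh
# sort_names (hypothetical helper): read one filename per line on stdin and
# print them ordered accent- and case-insensitively, with punctuation kept
# significant (compared by ASCII value).
# Assumes glibc iconv, GNU coreutils, a UTF-8 LC_CTYPE whose transliteration
# maps accented letters to ASCII, and names without tabs or newlines.
sort_names() {
  tmp=$(mktemp) || return 1
  cat >"$tmp"
  tab=$(printf '\t')
  # Key = name with accents transliterated away and letters lowercased.
  # paste builds "key<TAB>original" lines, sort orders them by the key in
  # byte order (the original name breaks ties), and cut drops the key again.
  iconv -f UTF-8 -t ASCII//TRANSLIT <"$tmp" |
    tr '[:upper:]' '[:lower:]' |
    paste - "$tmp" |
    LC_ALL=C sort -t "$tab" -k1,1 -k2 |
    cut -f2-
  rm -f "$tmp"
}

printf '%s\n' foo.txt foo2.txt foó3.txt foo4.txt | sort_names
```

With a UTF-8 locale such as en_US.UTF-8 in LC_CTYPE, the last line should print the four names in the order asked for above. For a directory, ls | sort_names works for simple names, at the cost of the column layout.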

Does any LC_COLLATE setting exist which does this?

If there is no punctuation-respecting setting that supports all of features 1-3, is there at least one that supports feature 1 (i.e. sorts like LC_COLLATE=c but doesn't garble Unicode characters)?

smls

Posted 2015-01-08T16:16:23.160

Reputation: 161

Answers

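The first problem is that LC_COLLATE=c is not a valid locale: locale names are case-sensitive, so you need a capital C (LC_COLLATE=C). On glibc systems, an invalid value makes the setlocale() call in ls fail as a whole, so ls keeps running in the plain C locale and prints the two non-ASCII bytes of ß as ??: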

$ LC_COLLATE=c ls -1a
./
../
.sharp
.zharp
Sharp
sharp
szharp
zharp
??harp

$ LC_COLLATE=C ls -1a
./
../
.sharp
.zharp
Sharp
sharp
szharp
zharp
ßharp
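Because the names are case-sensitive, it can help to check a spelling before using it. A minimal sketch, assuming a glibc-style locale utility, whose -a option lists the available locale names; check_locale is a hypothetical helper. This is only a rough test: glibc also accepts some spelling variants of the codeset part that are not listed verbatim (for example en_US.UTF-8 when locale -a shows en_US.utf8).

```shell
#!/bin/sh
# check_locale (hypothetical helper): report whether a locale name appears
# verbatim in the output of `locale -a`.
check_locale() {
  if locale -a | grep -Fqx -- "$1"; then
    echo "$1: available"
  else
    echo "$1: not available"
  fi
}

check_locale C   # the built-in locale
check_locale c   # wrong case: not a locale name
```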

I don't know how to get Unicode-aware sorting that still puts filenames starting with a dot at the top, though (searching for an answer to that is how I ended up here) :-/

Martin Tournoij

Posted 2015-01-08T16:16:23.160

Reputation: 303