As far as I can tell, setting the environment variable LC_COLLATE=en_US.utf8 changes four things compared to LC_COLLATE=c, regarding how programs like ls will sort files:
- Unicode characters are preserved (rather than being replaced with ?? garbage)
- Accents and diacritical marks don't affect sort order
- Case differences don't affect sort order
- Punctuation characters (such as dots) don't affect sort order
Feature 1 is a must-have in this day and age.
Features 2 and 3 are great too, since they make it more convenient to deal with real-life Unicode filenames.
Feature 4, on the other hand, is something that I find really counterproductive in my day-to-day work, since it often produces counter-intuitive sort orders for Linux filenames, where dots tend to be used to separate suffixes or to indicate dotfiles. I really can't imagine why anyone thought it would be a good idea to ignore dots when sorting filenames.
For example:
$ touch foo.txt foo2.txt foó3.txt foo4.txt
$ LC_COLLATE=en_US.utf8 ls
foo2.txt foó3.txt foo4.txt foo.txt
$ LC_COLLATE=c ls
foo.txt foo2.txt foo4.txt fo??3.txt
Neither is satisfactory. This is how I'd want those files to be sorted:
foo.txt foo2.txt foó3.txt foo4.txt
In other words, just like with LC_COLLATE=en_US.utf8, except that punctuation characters are treated as significant characters (which are sorted before letters).
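To make the ordering I'm after concrete, here is a rough Python sketch of it. This is only an illustration of the comparison rule, not an LC_COLLATE setting, and the name filename_sort_key is something I made up for this example. It assumes that stripping combining marks and case-folding is a good enough stand-in for features 2 and 3, and it compares the rest by code point, so most ASCII punctuation (including the dot) sorts before digits and letters, though a few characters such as ~ still sort after them.

```python
import unicodedata


def filename_sort_key(name):
    """Hypothetical sort key: ignore accents and case, keep punctuation significant."""
    # Decompose accented characters (NFD) and drop the combining marks,
    # so that "ó" compares like "o".
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes the comparison case-insensitive; the original name
    # is kept as a tie-breaker so the ordering stays deterministic.
    return (base.casefold(), name)


names = ["foo4.txt", "foó3.txt", "foo2.txt", "foo.txt"]
print(sorted(names, key=filename_sort_key))
# prints ['foo.txt', 'foo2.txt', 'foó3.txt', 'foo4.txt']
```

The dot is compared like any other character here, which is why foo.txt ends up ahead of foo2.txt.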
Does any LC_COLLATE setting exist which does this?
If there is no punctuation-respecting one that supports all features 1-3, is there at least one that supports feature 1 (i.e. sort like LC_COLLATE=c but don't garble Unicode chars)?