Ordering of filenames in Linux - not purely lexicographical?

1

Given a few files in a directory data, the following ls command provides a surprising result in terms of how the files are ordered:

for f in data/*; do echo "$f"; done


data/CitiesBaselineCounts2015010520150112.49.csv
data/CitiesBaselineCounts2015010520150112.4.csv
data/CitiesBaselineCounts2015010520150112.50.csv
data/CitiesBaselineCounts2015010520150112.5.csv
data/CitiesBaselineCounts2015010520150112.6.csv
data/CitiesBaselineCounts2015010520150112.7.csv
data/CitiesBaselineCounts2015010520150112.8.csv
data/CitiesBaselineCounts2015010520150112.9.csv
data/CitiesBaselineCounts2015010520150112.csv

The . character is ASCII 46, which precedes the codes for all of the digits (48 to 57).

So then the ordering is not lexicographical. What are the rules for the sorting used by the ls command?
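The character codes quoted above can be checked from the shell. This is only a small sketch: it relies on the POSIX printf convention that an argument starting with a quote character yields the numeric value of the character after it, and the variable names are made up for illustration.

```shell
# A leading single quote in the argument makes printf print the
# numeric code of the character that follows it.
dot_code=$(printf '%d' "'.")
zero_code=$(printf '%d' "'0")
nine_code=$(printf '%d' "'9")

echo ". = $dot_code, 0 = $zero_code, 9 = $nine_code"
```

In a plain byte-value comparison, then, a name with . at a given position would sort before a name with a digit at that position.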

javadba

Posted 2015-08-15T17:53:13.777

Reputation: 2 201

List files sorted numerically – DavidPostill – 2015-08-15T18:10:48.283

@DavidPostill Ok that's also helpful (and I will use it for this case) - but does not directly answer the lexicographical ordering. – javadba – 2015-08-15T19:23:07.613

Answers

2

When you run for f in data/*, the filenames are enumerated by your shell, not by ls. Shells normally sort the expansion lexicographically (bash does), but the comparison may follow the collating sequence of your LC_COLLATE locale rather than raw byte values. Perhaps your particular shell does not sort at all.

Directory entries are usually not sorted, but it depends on the underlying filesystem. Use ls -f to list a directory without sorting.

When you say ls *, first the shell expands * and may sort the result, then ls will sort the filenames again.
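To see the shell's own ordering in isolation, something like the following sketch may help. It assumes a scratch directory made with mktemp, and the short x.*.csv names are hypothetical stand-ins for the files in the question. It sets LC_ALL rather than LC_COLLATE because LC_ALL, when set, takes precedence over the individual LC_* variables.

```shell
# Scratch directory with hypothetical names modelled on the question.
dir=$(mktemp -d)
touch "$dir/x.49.csv" "$dir/x.4.csv" "$dir/x.50.csv" "$dir/x.5.csv" "$dir/x.csv"

# Ask for byte-value collation in the shell itself, so that the
# glob expansion below is sorted by character code.
LC_ALL=C
export LC_ALL

# Collect the base names in the order the shell's glob yields them.
glob_order=""
for f in "$dir"/*; do
    glob_order="$glob_order ${f##*/}"
done
echo "glob order:$glob_order"

rm -r "$dir"
```

Under the C locale the . (46) sorts before the digits, so x.4.csv comes before x.49.csv, and x.csv comes last because c sorts after every digit. No ls is involved at any point; the order comes entirely from the shell.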

meuh

Posted 2015-08-15T17:53:13.777

Reputation: 4 273

using bash here. – javadba – 2015-08-15T19:22:47.560

And what locale? Try export LC_COLLATE=C in your script. Also, the filenames may not be ASCII but in, e.g., a UTF-8 encoding. – meuh – 2015-08-15T19:25:16.667

That export worked. Wow, that was non-intuitive. Any idea why that would not be the default behavior? – javadba – 2015-08-15T19:28:01.117

Internationalisation is a big thing that non-programmers want to work exactly as they are used to in their paper dictionaries and so on. A lot of work has been done in Linux to support this. However, it makes life difficult for many simple programming tasks, as you can see. I invariably set LANG=C (and LC_COLLATE=C and all the others) to avoid many such bizarre problems. – meuh – 2015-08-15T19:33:47.293

1

ls, sort and your script all give the same ordering, which is lexicographic by the ASCII value at each position, except that non-alphanumeric characters are ignored:

abc.
abc..
abc0
abc1
abc_1
abc.1
abc..1
abc.1.4
abc.1..4
abc.1.5
abc2
abc~2
abc_2
abc-2
abc.2
abc#2
abc%2
abc3
abc4
abc4.1
abc4.2
abc49
abc_9
abca
abcA
abcc

See the answer to question 631402 for more discussion, including turning off the locale, which gives you lexicographic sorting by ASCII value, symbols included.
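As a rough illustration of "turning off the locale", the sketch below runs sort on a handful of names from the list above with LC_ALL=C set for that one command only. The variable names are illustrative and the rest of the environment is left alone.

```shell
# A few names from the list above, deliberately out of order.
names='abc_1
abc.1
abc0
abcA
abca
abc2'

# The LC_ALL=C prefix applies to this single sort invocation only.
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

Here the symbols take part in the comparison: . (46) sorts before the digits, uppercase A (65) before _ (95), and _ before lowercase a (97), so the result is abc.1, abc0, abc2, abcA, abc_1, abca.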

Terry L Anderson

Posted 2015-08-15T17:53:13.777

Reputation: 221