weird filename sorting (bash, ls, sort)

2

0

Good day, everyone.

I'm trying to feed my music to mplayer like this: mplayer *, but the tracks play in the wrong order.

Here's what I get with ls (as well as with ls -1 and ls -1 | sort). Note the order of the numerals 'I', 'II', 'III' within each concerto:

Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: II.Largo e Spiccato.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: I. Adagio e spiccato.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: II.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: III.Larghetto.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: IV. Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: II.Largo.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: I.Adagio.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: II.Allegro assai.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: II.Largo.mp3

It seems the sorting is done by something like the track name rather than the track number. How can I tell bash to sort the files lexicographically?

Here is some more info that might be relevant:

$ LC_ALL=C type ls
ls is aliased to `ls --color=auto'
$ locale
LANG=ru_RU.UTF-8
LANGUAGE=
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ LC_ALL=C bash --version
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
$ LC_ALL=C ls --version
ls (GNU coreutils) 8.13
Copyright (C) 2011 Free Software Foundation, Inc.

Update: I saved the first two file names to files:

$ ls -1 | head -n1 > fname1; ls -1 | head -n2 | tail -n1 > fname2

Then I compared these two files with meld (a diff GUI) to make sure there are no characters, such as non-breaking spaces, that could upset the sorting. There are none: the only difference is the clearly visible one. The same holds for the second and third file names.

gluk47

Posted 2013-04-29T12:15:05.817

Reputation: 177

The sorting order is correct. The sort utility does not understand Roman numerals. – Spack – 2013-04-29T12:25:18.413

I assumed the character . (dot) sorts before I, so «I.» should come before «II», shouldn't it? And «II.» should come before «III». – gluk47 – 2013-04-29T12:26:09.347

It seems to ignore non-alphanumeric characters, and therefore sorts "I" between Allegro and Largo/Larghetto. That is also consistent with concerto 2, where I/II/III happen to come out in lexicographic order, and IV sorts after IA, IIA, and IIIL. – Daniel Beck – 2013-04-29T12:28:38.340

Yes, looks like this. Is there a way to force sorting without ignoring any characters? – gluk47 – 2013-04-29T12:30:30.350

Change the locale applied to sorting. E.g. LANG=C sort is different from LANG=de_DE sort. You might even be able to patch your primary locale to behave differently. – Daniel Beck – 2013-04-29T12:31:59.557

Yes, found it at the same moment. Thank you :) – gluk47 – 2013-04-29T12:34:05.990

Similar to what Daniel suggested, you could see what output you get from LC_COLLATE=C sort. It should sort them in codepoint order. – Mono – 2013-04-29T12:34:36.610

Answers

2

You can set your locale temporarily for the duration of a command. To demonstrate, I put your list of files in a file named files.

What you see:

$ LC_ALL='ru_RU.UTF-8' sort files
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: II.Largo e Spiccato.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: I. Adagio e spiccato.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: II.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: III.Larghetto.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: IV. Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: II.Largo.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: I.Adagio.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: II.Allegro assai.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: II.Largo.mp3

Sorted as you want:

$ LC_ALL=C sort files
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: II.Largo e Spiccato.mp3
Antonio Vivaldi - Op.3 concerto No.1 D-dur RV 549: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: I. Adagio e spiccato.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: II.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: III.Larghetto.mp3
Antonio Vivaldi - Op.3 concerto No.2 g-moll RV 578: IV. Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: II.Largo.mp3
Antonio Vivaldi - Op.3 concerto No.3 G-dur RV 310: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: I.Adagio.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: II.Allegro assai.mp3
Antonio Vivaldi - Op.3 concerto No.4 e-moll RV 550: III.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: I.Allegro.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: II.Largo.mp3
Antonio Vivaldi - Op.3 concerto No.5 A-dur RV 519: III.Allegro.mp3

Specifically, you want to set the LC_COLLATE variable to C.
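A minimal sketch of how this could be scoped to a single glob expansion, assuming bash and scratch file names made up for illustration (list_c_order is a hypothetical helper, not part of any tool). Note that a prefix assignment such as LC_COLLATE=C mplayer * would not help here, because the shell expands * itself before the assignment ever reaches mplayer. The sketch uses a subshell instead, and sets LC_ALL because it takes precedence over LC_COLLATE when both are set.

```shell
#!/usr/bin/env bash
# Create three sample files in a scratch directory (names are illustrative).
tmpdir=$(mktemp -d)
touch "$tmpdir/RV 549: I.Allegro.mp3" \
      "$tmpdir/RV 549: II.Largo e Spiccato.mp3" \
      "$tmpdir/RV 549: III.Allegro.mp3"

# Hypothetical helper: print the glob expansion of a directory using
# C collation. The function body is a subshell, so neither the cd nor
# the locale change leaks into the calling shell.
list_c_order() (
    cd "$1" || exit 1
    LC_ALL=C              # overrides LANG and every other LC_* variable
    printf '%s\n' *
)

sorted=$(list_c_order "$tmpdir")
printf '%s\n' "$sorted"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

Replacing printf '%s\n' * with mplayer * inside the subshell should hand the files to the player in the same order.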

glenn jackman

Posted 2013-04-29T12:15:05.817

Reputation: 18 546

Yes, my final solution was: export LC_COLLATE=C; mplayer *. – gluk47 – 2013-04-30T11:23:38.273

2

As you have already solved your issue (at least for Roman numerals up to 8), this is a general remark:

ls * is not a good way to check the order in which the files get passed to your music player by mplayer *. That is because ls itself can rearrange the files, and it is by no means guaranteed that its mechanism is the same as the one used by the shell; it depends on your settings. Use echo * instead, or printf "%s\n" * for prettier output.

This example illustrates this with my personal alias ls="ls -v":

$ touch 1 2 3 12
$ ls *
1  2  3  12
$ echo *
1 12 2 3
$ printf "%s\n" *
1
12
2
3

A much better solution is possible with zsh, but AFAIK not with bash:

Mikael Magnusson posted a magnificent function to deal with Roman numerals on the zsh mailing list. Without going into the details of the function itself, this is how it works:

$ touch I II III IV V VI VII VIII IX X L C D M
$ print *
C D I II III IV IX L M V VI VII VIII X
$ print *(no+romansort)
I II III IV V VI VII VIII IX X L C D M

In the last print command, (no+romansort) tells the shell that you want to sort numerically (n) and use a custom function (o+) to parse the filenames first.

Of course, this also works when the Roman numeral is only part of the filename. But be aware that the HIST_SUBST_PATTERN option needs to be set!
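For readers who stay with bash, the core idea (turn each Roman numeral into a number that can be used as a sort key) can be sketched in plain shell. This is only an illustration under my own assumptions: roman_to_int is a hypothetical helper written for this post, not Mikael Magnusson's function, and it does not validate that the numeral is well formed.

```shell
#!/bin/sh
# Hypothetical helper: convert a Roman numeral (upper case) to decimal.
# It reads the numeral from right to left and subtracts a symbol whenever
# it is smaller than the symbol to its right (e.g. the I in IV).
# The body is a subshell, so the working variables stay local.
roman_to_int() (
    s=$1
    total=0
    prev=0
    while [ -n "$s" ]; do
        rest=${s%?}          # everything except the last character
        ch=${s#"$rest"}      # the last character
        s=$rest
        case $ch in
            I) val=1 ;;
            V) val=5 ;;
            X) val=10 ;;
            L) val=50 ;;
            C) val=100 ;;
            D) val=500 ;;
            M) val=1000 ;;
            *) exit 1 ;;     # not a Roman numeral symbol
        esac
        if [ "$val" -lt "$prev" ]; then
            total=$((total - val))
        else
            total=$((total + val))
        fi
        prev=$val
    done
    printf '%d\n' "$total"
)

# Print a few conversions as a quick check.
for numeral in I IV VIII IX XIV; do
    printf '%s = %s\n' "$numeral" "$(roman_to_int "$numeral")"
done
```

The resulting number could then be prepended to each file name as a key for sort -n, which is more work than the zsh glob qualifier but avoids switching shells.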

mpy

Posted 2013-04-29T12:15:05.817

Reputation: 20 866

Thank you, that's really interesting! Maybe one day I'll adopt zsh :)

// cannot upvote yet, I have too low reputation here. – gluk47 – 2013-04-30T11:24:46.760

1

For some reason, sort on Mac OS X (10.8.3) does sort the Roman numerals in the right order.

Nevertheless, how about substituting the Roman numerals before feeding the names to sort:

sed -e 's/IV/4/g' -e 's/III/3/g' -e 's/II/2/g' -e 's/I/1/g' filename.txt | sort

This worked for me (but again I tried this on OSX).

Vincent

Posted 2013-04-29T12:15:05.817

Reputation: 930

Seems too complex for me. I just want to run mplayer * or something close to it and get my music. Of course, I could easily write something like play.sh, but I'd like to avoid unnecessary complexity… – gluk47 – 2013-04-30T11:22:52.643