Old Spanish alphabetical order

Before 1994, Spanish dictionaries used alphabetical order with a peculiarity: the digraphs ll and ch were treated as if they were single letters. ch immediately followed c, and ll immediately followed l. Adding the letter ñ, which follows n in Spanish, the order was then:

a, b, c, ch, d, e, f, g, h, i, j, k, l, ll, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z

Since 1994, ll and ch have been treated as groups of two letters (l,l and c,h respectively), so alphabetical order is the same as in English, except for the letter ñ.

The old order was definitely more interesting.

The challenge

Input a list of zero or more words and output the list sorted according to the old Spanish alphabetical order. Sorting is between words (not between letters within a word). That is, words are atomic, and the output will contain the same words in a possibly different order.

To simplify, we will not consider the letter ñ, accented vowels (á, é, í, ó, ú), or uppercase letters. Each word will be a sequence of one or more characters taken from the inclusive range from ASCII 97 (a) through ASCII 122 (z).

If there are more than two l letters in a row, they should be grouped left to right. That is, lll is ll and then l (not l and then ll).
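A minimal Python sketch of the ordering rules above, for illustration only (it is not part of the original challenge, and the names OLD_ORDER, letters and old_spanish_sort are hypothetical). It splits each word greedily left to right, so lll becomes ll and then l, and compares words by the position of each letter in the old alphabet, with ñ left out as the challenge allows.

```python
# Old Spanish order from the challenge, without ñ (excluded by the rules).
OLD_ORDER = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
             "l", "ll", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
             "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(OLD_ORDER)}


def letters(word):
    """Split a word into old-Spanish letters, grouping greedily left to right."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            out.append(pair)   # digraph counts as one letter
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def old_spanish_sort(words):
    # Compare words by the ranks of their letters; a proper prefix sorts first.
    return sorted(words, key=lambda w: [RANK[c] for c in letters(w)])


print(old_spanish_sort("lll llc llz llll lllz".split()))
# ['llc', 'lll', 'lllz', 'llll', 'llz']
```

Ranking by explicit index keeps the sketch close to the definition; the answers below find shorter ways to get the same comparison.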

Input format can be: words separated by spaces, by newlines, or any convenient character. Words may be surrounded by quotation marks or not, at your choice. A list or array of words is also acceptable. Any reasonable format is valid; just state it in your answer.

Similarly, output can be in any reasonable format (not necessarily the same as the input).

Code golf, shortest wins.

Test cases

In the following examples, words are separated by spaces. The first line is the input; the second is the output:

llama coche luego cocina caldo callar calma
caldo calma callar cocina coche luego llama

cuchara cuchillo cubiertos cuco cueva
cubiertos cuco cuchara cuchillo cueva

"Words" can be single letters too:

b c a ch ll m l n
a b c ch l ll m n

or unlikely combinations (remember the rule that l's are grouped left to right):

lll llc llz llll lllz
llc lll lllz llll llz

An empty input should give an empty output:



Of course, this order can be applied to other languages as well:

chiaro diventare cucchiaio
cucchiaio chiaro diventare

all alternative almond at ally a amber
a almond alternative all ally amber at

Luis Mendo

Posted 2016-03-10T14:41:43.097

It's too late to correct the question now, because it has an answer, but actually rr was a single letter too. I believe it lost its status as a single letter later than ll and ch, so the explanation in Wikipedia is not so much wrong as partial. – Peter Taylor – 2016-03-10T15:13:29.503

"tweo"? filler+ – CalculatorFeline – 2016-03-10T15:19:27.083

@CatsAreFluffy Thanks! corrected – Luis Mendo – 2016-03-10T15:20:28.757

@PeterTaylor Was rr really considered a single letter? I had never heard that http://www.rae.es/consultas/exclusion-de-ch-y-ll-del-abecedario – Luis Mendo – 2016-03-10T15:22:38.290

It was a single letter in the first Spanish dictionary I owned. – Peter Taylor – 2016-03-10T16:03:45.033

@PeterTaylor The official academy (RAE) didn't consider rr a single letter; at least not since 1803. But it's true that apparently it was considered a single letter in the Americas – Luis Mendo – 2016-03-10T16:27:01.647

The Hungarian language has even more such peculiarities, and it didn't abandon them. cs, dz, dzs, gy, ly, ny, sz, ty and zs are all considered single letters. – vsz – 2016-03-10T17:35:31.570

@vsz hmm, with dzs it sounds like it would make for an even more interesting challenge because a single letter replacement wouldn't be sufficient. – p.s.w.g – 2016-03-10T18:31:22.320

@vsz: Hungarian also has ccs for <cs><cs>, which makes things more interesting: https://sourceware.org/bugzilla/show_bug.cgi?id=13547 – ninjalj – 2016-03-10T23:33:56.143

Looks like Hungarian deserves a separate, much more difficult challenge :-) – Luis Mendo – 2016-03-10T23:34:50.517

@ninjalj : indeed, and technically also for ddz, ddzs, ggy, lly, nny, ssz, tty, and zzs, although a few of them are very rarely used. – vsz – 2016-03-11T04:38:49.730

I was not completely right, as the double digraphs (like ccs = cs + cs) have a different rule. And there are other exceptions besides digraphs. And there are a lot of accented vowels. I'll post a new challenge soon. :) – vsz – 2016-03-11T07:17:24.447

The Welsh alphabet has loads of them, and is probably interesting since they're not in (English) alphabetical order, and it doesn't include all Latin characters: a, b, c, ch, d, dd, e, f, ff, g, ng, h, i, j, l, ll, m, n, o, p, ph, r, rh, s, t, th, u, w, y – Algy Taylor – 2016-03-11T09:29:38.067

@DonMuesli: rr was definitely considered a single letter by my teachers and textbooks in Argentina in the 70s. – Martin Argerami – 2016-03-11T12:16:08.983

@DonMuesli : Done: http://codegolf.stackexchange.com/questions/75370/hungarian-alphabetical-order – vsz – 2016-03-11T17:56:41.813

Answers

Pyth, 13 bytes (previously 14)

Update: saw this got accepted and noticed a trivial 1 byte golf. Whoops.

:D"ll|ch|."1Q

For each word, find all non-overlapping matches of the regex "ll|ch|."; this splits the word into its "letters", with ll and ch taken as single letters and any other character matched by the dot. Then, just sort the words by the resulting lists.
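The same idea as a short Python sketch (an illustrative translation, not the answerer's code; LETTER and spanish_sort are hypothetical names). Sorting by lists of strings works because "c" is a proper prefix of "ch" and "ch" < "d", and likewise "l" < "ll" < "m", so each digraph lands right after its first letter.

```python
import re

# Same tokenizing regex as the Pyth answer: try "ll", then "ch", else any char.
LETTER = re.compile(r"ll|ch|.")


def spanish_sort(words):
    # re.findall scans left to right and returns non-overlapping matches,
    # so "lll" splits as ["ll", "l"], as the challenge requires.
    return sorted(words, key=LETTER.findall)


print(spanish_sort("cuchara cuchillo cubiertos cuco cueva".split()))
# ['cubiertos', 'cuco', 'cuchara', 'cuchillo', 'cueva']
```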

PurkkaKoodari

Great approach! (Now that I finally understand it) :-) – Luis Mendo – 2016-03-10T22:58:28.750

That code is absolutely fascinating :D – Erik the Outgolfer – 2016-12-02T17:40:39.503

PowerShell, 50 bytes (earlier versions: 46, 44, 51)

$args|sort @{e={$_-replace'(?=ch|ll)(.).','$1Α'}}

The Α character is the Greek letter alpha, which comes after all Latin letters in PowerShell's default sort order (at least on my machine; I'm not sure whether it differs in other locales). It counts as 2 bytes in UTF-8.
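A Python sketch of the same sort key, for illustration (not the answerer's code; PAIR, sort_key and spanish_sort are hypothetical names). Python compares strings by code point, so instead of Α the sketch assumes "{" (code point 123), which sorts right after "z" (122). The lookahead finds each ch or ll left to right, keeps its first letter, and replaces the second with the marker, so the digraph sorts after that letter followed by any other letter.

```python
import re

# Lookahead for "ch" or "ll", capture the first letter, consume the second.
PAIR = re.compile(r"(?=ch|ll)(.).")


def sort_key(word):
    # Keep the first letter of the digraph and append "{" (sorts after "z").
    return PAIR.sub(r"\1{", word)


def spanish_sort(words):
    return sorted(words, key=sort_key)


print(sort_key("lll"))                                # l{l
print(spanish_sort(["lzego", "luego", "llama"]))      # ['luego', 'lzego', 'llama']
```

Because the key is only used for comparison, the original words come back unchanged and no restore step is needed.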

Example usage, assuming this code is saved in a file named es-sort.ps1:

> es-sort.ps1 'lzego' 'luego' 'llama'

luego
lzego
llama

p.s.w.g

Mathematica, 81 bytes

StringReplace[Sort@StringReplace[#,"ch"->"cZ","ll"->"lZ"],"cZ"->"ch","lZ"->"ll"]&

Same approach as TimmyD's answer.
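A Python sketch of this replace, sort, restore round trip (an illustrative translation; the function name is hypothetical). The JavaScript answer below follows the same pattern with "~" as the marker. In Python, where "Z" sorts before every lowercase letter, a marker after "z" is needed, so the sketch assumes "~" (code point 126); since valid input is limited to a-z, the marker never occurs naturally and the restore step is unambiguous.

```python
import re


def spanish_sort(words):
    # Replace the second letter of each "ch"/"ll" (matched left to right)
    # with the assumed marker "~", which sorts after "z".
    marked = [re.sub(r"ll|ch", lambda m: m.group(0)[0] + "~", w) for w in words]
    # Plain code-point sort, then undo the substitution.
    return [re.sub(r"[cl]~", lambda m: "ch" if m.group(0) == "c~" else "ll", w)
            for w in sorted(marked)]


print(spanish_sort(["chiaro", "diventare", "cucchiaio"]))
# ['cucchiaio', 'chiaro', 'diventare']
```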

CalculatorFeline

Python 2, 116 bytes (previously 128)

lambda p:map(''.join,sorted([{'!':'ll','?':'ch'}.get(c,c)for c in w.replace('ll','!').replace('ch','?')]for w in p))

I still feel like there's definitely room for improvement here.

Orez

JavaScript, 95 bytes

s=>s.map(a=>a.replace(/ll|ch/g,m=>m[0]+'~')).sort().map(a=>a.replace(/.~/g,m=>m>'d'?'ll':'ch'))

Charlie Wynn

Perl, 40 bytes

Includes +1 for -p

Run with the list of words on STDIN:

perl -p spanisort.pl <<< "llama coche luego cocina caldo callar calma"

spanisort.pl

s/ll|ch|./\u$&/g;$_="\L@{[sort split]}"
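The Perl answer's trick, sketched in Python for illustration (not the answerer's code; LETTER, capitalize_letters and spanish_sort are hypothetical names). Capitalizing the first character of every letter makes each letter start with an uppercase character, and uppercase sorts before lowercase in plain code-point comparison. The lowercase second half of ch or ll therefore sorts after c or l followed by any other letter, while ch still sorts before d.

```python
import re

LETTER = re.compile(r"ll|ch|.")


def capitalize_letters(word):
    # "ll" -> "Ll", "ch" -> "Ch", "a" -> "A": only the first char of each letter.
    return LETTER.sub(lambda m: m.group(0).capitalize(), word)


def spanish_sort(words):
    # Sort the capitalized forms by code point, then lowercase them again.
    marked = sorted(capitalize_letters(w) for w in words)
    return [w.lower() for w in marked]


print(capitalize_letters("llama"))  # LlAMA
```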

Ton Hospel
