For those who want a lot more of a challenge than the old Spanish alphabetical order, let's take a look at how the Hungarian alphabet is ordered.
a, á, b, c, cs, d, dz, dzs, e, é, f, g, gy, h, i, í, j, k, l, ly, m, n, ny, o, ó, ö, ő, p, q, r, s, sz, t, ty, u, ú, ü, ű, v, w, x, y, z, zs
Actually, q, w, x and y are not used in Hungarian words, but they are included for loanwords and foreign names. Foreign accented characters which are not part of the Hungarian alphabet (like ñ) have the same priority as the non-accented ones, but we disregard them for this challenge.
The rules, summarized:
- Digraphs (cs, sz, etc.) and the trigraph (dzs) are considered as if they were letters on their own.
cudar, cukor, cuppant, csalit, csata
- If the same digraph or trigraph occurs twice in direct succession in a word, it is written in a simplified way: ssz instead of szsz, ddzs instead of dzsdzs, but for the alphabetical order the non-simplified form is used. For example, kasza < kaszinó < kassza, because kassza is treated as k+a+sz+sz+a for the sake of ordering. Sometimes the non-contracted version does appear in a word, namely in compound words.
kasza, kaszinó, kassza, kaszt, nagy, naggyá, nagygyakorlat, naggyal, nagyít
- Capitalization doesn't matter, except when two words would be exactly the same without capitalization, in which case the lower-case letter has priority.
jácint, Jácint, Zoltán, zongora
- The short and long versions of accented vowels have the same priority (a - á, e - é, i - í, o - ó, ö - ő, u - ú, ü - ű), with a single exception: if the two words would otherwise be exactly the same, the short vowel has priority over the long vowel. Note that the vowels with umlaut (ö and ü) are completely different characters from o and u.
Eger, egér, író, iroda, irónia, kerek, kerék, kérek, szúr, szül
- Hyphens or spaces (for example, in compound words, names, etc.) are completely ignored.
márvány, márványkő, márvány sírkő, Márvány-tenger, márványtömb
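The digraph, trigraph and hyphen rules above can be sketched as a small tokenizer. This is only an illustration, not a golfed answer: the names `MULTI` and `tokenize` are made up here, and the code assumes a greedy reading in which any letter sequence that looks like a digraph (or a simplified double) is treated as one. It does not rely on any locale-aware collation.

```python
# Multi-character letters of the Hungarian alphabet, longest first
# so that "dzs" is tried before "dz".
MULTI = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]

def tokenize(word):
    """Split a word into Hungarian letters (lower-cased).

    Spaces and hyphens are dropped, and simplified doubles such as
    "ssz" or "ddzs" are expanded to two letters ("sz" + "sz").
    Assumption: matching is greedy, with no dictionary lookup.
    """
    s = word.lower().replace(" ", "").replace("-", "")
    letters = []
    i = 0
    while i < len(s):
        for m in MULTI:
            # Simplified double: the first character is written twice.
            if s.startswith(m[0] + m, i):
                letters += [m, m]
                i += len(m) + 1
                break
            if s.startswith(m, i):
                letters.append(m)
                i += len(m)
                break
        else:
            # No multi-character letter starts here: single character.
            letters.append(s[i])
            i += 1
    return letters

print(tokenize("kassza"))   # ['k', 'a', 'sz', 'sz', 'a']
print(tokenize("naggyal"))  # ['n', 'a', 'gy', 'gy', 'a', 'l']
```

Because the matching is greedy, a compound word in which two separate letters happen to sit next to each other and look like a digraph would be split incorrectly; telling those apart would need a vocabulary.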
The task
Your program/function receives strings, composed of characters from the Hungarian alphabet (both lower- and upper-case), but a string might contain spaces or hyphens. For simplicity's sake, the minus sign (ASCII 45) can be used as a hyphen. Note that some characters (like the ő) are not part of ASCII. You can use any encoding you wish, as long as it supports all the required characters.
You have to order the lines correctly and display/return the result.
You can use any randomly ordered subset of the above examples for testing.
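One way to do that testing is a small harness that draws random subsets of each example group, shuffles them, and checks that a candidate sorter restores the listed order. This is a sketch only: `EXAMPLES` and `check` are hypothetical names, and `sort_fn` stands for whatever solution is being tested (any callable that takes a list of strings and returns them ordered).

```python
import random

# The example groups from the rules, each already in correct order.
EXAMPLES = [
    ["cudar", "cukor", "cuppant", "csalit", "csata"],
    ["kasza", "kaszinó", "kassza", "kaszt", "nagy", "naggyá",
     "nagygyakorlat", "naggyal", "nagyít"],
    ["jácint", "Jácint", "Zoltán", "zongora"],
    ["Eger", "egér", "író", "iroda", "irónia", "kerek", "kerék",
     "kérek", "szúr", "szül"],
    ["márvány", "márványkő", "márvány sírkő", "Márvány-tenger",
     "márványtömb"],
]

def check(sort_fn, trials=100, seed=0):
    """Return True if sort_fn orders every sampled subset correctly."""
    rng = random.Random(seed)
    for group in EXAMPLES:
        for _ in range(trials):
            k = rng.randint(2, len(group))
            # Keep the sampled indices sorted to get the expected order.
            idx = sorted(rng.sample(range(len(group)), k))
            expected = [group[i] for i in idx]
            shuffled = expected[:]
            rng.shuffle(shuffled)
            if list(sort_fn(shuffled)) != expected:
                return False
    return True
```

Subsets are drawn within a group only, because the examples do not state how words from different groups relate to each other.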
EDIT:
Please don't use any built-in or other facility which already knows the Hungarian alphabetical ordering by itself. It would make the competition pointless, and take away all the challenge of finding the best regular expression or the best code golfing tricks.
EDIT2:
To answer a clarification requested by isaacg ("two strings that only differ by capitalization and long vs. short vowels, but differs in both ways"): although no rule in the official document explicitly addresses this question, an example found within it points to the length of the vowel having more importance than the capitalization.
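A three-level sort key is one way to express this priority. The following is an ungolfed sketch under stated assumptions: the helper names (`letters`, `sort_key`, `hungarian_sorted`) are invented for illustration, matching of digraphs is greedy, and the input is assumed to contain only letters of the Hungarian alphabet, spaces and hyphens. No locale-aware collation is used.

```python
import re

ALPHABET = ("a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő "
            "p q r s sz t ty u ú ü ű v w x y z zs").split()
LONG_TO_SHORT = dict(zip("áéíóőúű", "aeioöuü"))

# Multi-character letters, trigraph first; simplified doubles ("ssz",
# "ddzs", ...) are tried before the plain forms.
MULTI = sorted((l for l in ALPHABET if len(l) > 1), key=len, reverse=True)
TOKEN = re.compile("|".join([m[0] + m for m in MULTI] + MULTI + ["."]))

# Primary rank: a long vowel shares the rank of its short partner.
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
RANK.update({lng: RANK[sht] for lng, sht in LONG_TO_SHORT.items()})

def letters(word):
    """Lower-cased Hungarian letters of word, spaces/hyphens ignored."""
    s = re.sub(r"[ -]", "", word.lower())
    out = []
    for tok in TOKEN.findall(s):
        if len(tok) > 2 and tok[0] == tok[1]:   # simplified double
            out += [tok[1:], tok[1:]]
        else:
            out.append(tok)
    return out

def sort_key(word):
    ls = letters(word)
    primary = [RANK[l] for l in ls]                  # alphabet order
    vowel_length = [l in LONG_TO_SHORT for l in ls]  # short before long
    upper = [c.isupper() for c in word if c not in " -"]  # lower first
    # Vowel length is compared before capitalization, as in EDIT2.
    return (primary, vowel_length, upper)

def hungarian_sorted(words):
    return sorted(words, key=sort_key)

print(hungarian_sorted(["egér", "Jácint", "Eger", "jácint"]))
# ['Eger', 'egér', 'jácint', 'Jácint']
```

Here Eger and egér differ in both capitalization and vowel length; since the vowel-length level is compared first, Eger sorts before egér even though it is capitalized.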
@FryAmTheEggman Where do you see that? – Morgan Thrapp – 2016-03-11T17:57:46.030
@FryAmTheEggman : I don't quite understand it, what functions did i disallow? – vsz – 2016-03-11T17:57:46.273
I didn't post challenges for a while, is it now a custom to allow for the input to already have been read and stored in some container, so that your code doesn't have to do the reading? It might make sense, specialized golfing languages would have an unfair advantage otherwise, as they can read input with little to no code. – vsz – 2016-03-11T18:02:01.580
@FryAmTheEggman : ok, if no one argues otherwise, let's accept such solutions as well. I was only wondering because in previous questions it happened that i wrote programs and there were plenty of HTML submissions or other cases where the definition of "program" was quite ambiguous. – vsz – 2016-03-11T18:06:47.653
@FryAmTheEggman : I recommended the output format only because the words might include spaces. I would argue that having them in separate lines was not a "cumbersome" format. – vsz – 2016-03-11T18:08:30.050
For reference, we've got a meta post with default I/O formats (although I think that one might be linked from the post Fry already posted). – Martin Ender – 2016-03-11T18:09:18.480
@MartinBüttner : Seems OK to me. However, how would you guarantee that the output is unambiguous regarding spaces? As at least one example shows, a word can, in fact, contain whitespace, and it can be regarded as a single "word" for the sake of ordering. Or is it the word "display" which you don't agree with? What would you suggest instead? – vsz – 2016-03-11T18:12:49.077
Man, I can't even memorize our proper alphabetical order. How am I going to program this? ;) – Andras Deak – 2016-03-11T18:13:23.700
Clarification: what is the proper ordering for two strings that only differ by capitalization and long vs. short vowels, but differs in both ways? – isaacg – 2016-03-11T18:23:03.097
@isaacg : good point, and interestingly, it's nowhere specified in the official document of the Hungarian Academy. Maybe I'll have to write them to ask for a clarification? However, an example in that very document has Eger preceding egér, so it seems the vowel length has priority. – vsz – 2016-03-11T18:32:46.077
I've been trying to come up with a bound-to-fail counterexample, where an apparent digraph is actually two letters, such as malacsült or nyílászáró. I wonder if there are any (but you'd need a vocabulary to check for that, which is presumably not part of this challenge) – Andras Deak – 2016-03-11T21:19:07.593
There is no example containing dzs – TheConstructor – 2016-03-11T21:58:58.293
@vsz vowel length doesn't affect order unless it's the only thing that remains. For example, "Eger" comes before "egér", but "mámor" comes before "maradi" – Bálint – 2017-06-05T15:26:39.700
Grammar. Level: Hungarian! – sergiol – 2017-07-28T00:51:30.697
I haven't realized just how complex our ordering is up until this point – SeinopSys – 2017-10-29T10:59:29.453