North Korean dictionary order



The objective

Given a string of Hangul syllables, sort the characters in North Korean dictionary order.

Introduction to Hangul syllables

Hangul (한글) is the Korean writing system invented by Sejong the Great. Hangul syllables occupy the Unicode code points U+AC00 – U+D7A3. A Hangul syllable consists of an initial consonant, a vowel, and an optional final consonant.

The initial consonants are:

ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ

The vowels are:

ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ

The final consonants are:

(none) ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ

For example, has initial consonant , vowel , and final consonant .

South Korean dictionary order

The consonants and vowels above are listed in South Korean dictionary order. Syllables are sorted first by initial consonant, then by vowel, and finally by the (optional) final consonant.

The Unicode block for Hangul syllables contains every initial/vowel/final combination (19 × 21 × 28 = 11172 syllables) and is sorted entirely in South Korean dictionary order.
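Each syllable's offset from U+AC00 is (initial × 21 + vowel) × 28 + final, using the indices of the South Korean lists above, so code-point order coincides with lexicographic (initial, vowel, final) order. A minimal sketch of that composition, with a walk over the whole block to confirm the ordering (`composeSyllable` and `nextCode` are illustrative names, not from any answer):

```javascript
// Hypothetical helper: build a syllable from South Korean (Unicode) jamo indices.
// initial: 0..18, vowel: 0..20, final: 0..27 (0 = no final consonant).
function composeSyllable(initial, vowel, final = 0) {
  return String.fromCharCode(0xac00 + (initial * 21 + vowel) * 28 + final);
}

// Visit every (initial, vowel, final) triple in lexicographic order and check
// that the code points come out consecutive, i.e. the block is sorted that way.
let nextCode = 0xac00;
for (let i = 0; i < 19; i++)
  for (let v = 0; v < 21; v++)
    for (let f = 0; f < 28; f++) {
      if (composeSyllable(i, v, f).charCodeAt(0) !== nextCode) throw new Error("gap");
      nextCode++;
    }

console.log(nextCode - 0xac00);           // 11172 syllables
console.log(composeSyllable(0, 0));       // 가 (U+AC00)
console.log(composeSyllable(18, 20, 27)); // 힣 (U+D7A3)
```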

The Unicode block can be seen here; the first 256 characters are shown for illustrative purposes:


For example, the following sentence (without spaces and punctuation):


is sorted to:


In C++, if the string is stored in a std::wstring, this sort is just a plain std::sort.
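The same holds in JavaScript, as a sketch: every syllable in U+AC00 – U+D7A3 is a single UTF-16 code unit, and the default `Array.prototype.sort` compares strings by code units, so it already produces South Korean order (`sortSouth` is an illustrative name):

```javascript
// South Korean dictionary order is plain code-point order for U+AC00..U+D7A3,
// and each such syllable is one UTF-16 code unit, so the default sort suffices.
const sortSouth = s => [...s].sort().join("");

console.log(sortSouth("하가나")); // 가나하
```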

North Korean dictionary order

The North Korean dictionary uses a different order for consonants and vowels.

The initial consonants are ordered as follows:

ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ ㄲ ㄸ ㅃ ㅆ ㅉ ㅇ

The vowels are ordered as follows:

ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ ㅐ ㅒ ㅔ ㅖ ㅚ ㅟ ㅢ ㅘ ㅝ ㅙ ㅞ

The final consonants are ordered as follows:

(none) ㄱ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ ㄲ ㅆ

As in the South Korean order, syllables are sorted first by initial consonant, then by vowel, and finally by the (optional) final consonant.
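A minimal key-based sketch of this rule, assuming the input contains only syllables in U+AC00 – U+D7A3: split each syllable into its South Korean (Unicode) jamo indices, map each index to its position in the North Korean lists above, and sort by the combined key. The names `rankTable`, `northKey` and `sortNorth` are hypothetical, not taken from any answer.

```javascript
// Jamo lists copied from the question (compatibility jamo, as written there).
const SOUTH_I = [..."ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"];
const NORTH_I = [..."ㄱㄴㄷㄹㅁㅂㅅㅈㅊㅋㅌㅍㅎㄲㄸㅃㅆㅉㅇ"];
const SOUTH_V = [..."ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"];
const NORTH_V = [..."ㅏㅑㅓㅕㅗㅛㅜㅠㅡㅣㅐㅒㅔㅖㅚㅟㅢㅘㅝㅙㅞ"];
// "" stands for "no final consonant".
const SOUTH_F = ["", ..."ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"];
const NORTH_F = ["", ..."ㄱㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅇㅈㅊㅋㅌㅍㅎㄲㅆ"];

// rankTable(south, north)[k] = North Korean position of the k-th South Korean jamo.
const rankTable = (south, north) => south.map(j => north.indexOf(j));
const RANK_I = rankTable(SOUTH_I, NORTH_I);
const RANK_V = rankTable(SOUTH_V, NORTH_V);
const RANK_F = rankTable(SOUTH_F, NORTH_F);

// Combine the three North Korean ranks the way Unicode combines South Korean indices,
// which gives a single number that orders by initial, then vowel, then final.
function northKey(ch) {
  const n = ch.charCodeAt(0) - 0xac00;
  const i = Math.floor(n / 588), v = Math.floor(n / 28) % 21, f = n % 28;
  return (RANK_I[i] * 21 + RANK_V[v]) * 28 + RANK_F[f];
}

const sortNorth = s => [...s].sort((a, b) => northKey(a) - northKey(b)).join("");

console.log(sortNorth("아하까가")); // 가하까아
```

Packing the ranks into one integer mirrors the Unicode layout; comparing three separate ranks would work equally well.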

If the sentence above is given, the output must be:



  1. If the input contains a character outside U+AC00 – U+D7A3, it is a don't-care situation.

  2. As this is code-golf, the shortest code in bytes wins.

Partly related. – Arnauld – 2019-10-06T11:02:40.007

If that makes sense, I'd suggest adding a test case where the characters are sorted differently solely because of the final consonant (using ㄲ or ㅆ with the same initial consonant and the same vowel). – Arnauld – 2019-10-07T08:00:59.450

(More generally speaking, adding a few more test cases would be great.) – Arnauld – 2019-10-07T08:02:12.277

Suggested test cases: 가까나다따라마바빠사싸아자짜차카타파 (all initial consonants), 가개갸걔거게겨계고과괘괴교구궈궤귀규그긔기 (all vowels), 가각갂갃간갅갆갇갈갉갊갋갌갍갎갏감갑값갓갔강갖갗갘같갚갛 (all trailing consonants). – Grimmy – 2019-10-07T12:25:47.230

Well, so much for that... 86 different Korean SQL collations; all of them sort in the "South Korean" manner. Nice (tough) question. – BradC – 2019-10-07T16:54:28.940



05AB1E, 38 bytes (down from 47 and 45)


Try it online!

Σ                        # sort characters of the input by:
 •...•                   #  compressed integer 13096252834522000035292405913882127943177557
      4B                 #  converted to base 4: 211211121231211111033010101010231002310010331121111111111111111121111111
        33¡              #  split on 33: [2112111212312111110, 010101010231002310010, 1121111111111111111121111111]
           €.ā           #  enumerate each (pairs each digit with its index)
              `ââ        #  reduce by cartesian product (yields a list with 11172 elements)
                 yÇ      #  codepoint of the current character
                   68+   #  + 68
                      è  #  index into the large list (with wraparound)
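One reading of the three digit groups, sketched in JavaScript (this is an interpretation, not the author's own description, and it assumes the resulting pairs compare lexicographically): give each jamo the key (digit, position) and sort by it. The initial-consonant group appears rotated by one position, which the `68+` with wraparound accounts for, since (0xAC00 + 68 + n) mod 11172 = (n - 588) mod 11172. All names below are hypothetical.

```javascript
// Digit groups from the 05AB1E explanation (base-4 string split on "33").
const INITIAL_DIGITS = "2112111212312111110";
const VOWEL_DIGITS = "010101010231002310010";
const FINAL_DIGITS = "1121111111111111111121111111";

// South Korean (Unicode) jamo orders from the question.
const SOUTH_INITIALS = [..."ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"];
const SOUTH_VOWELS = [..."ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"];
const SOUTH_FINALS = ["(none)", ..."ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"];

// Positions of a digit group, sorted by the key (digit, position).
const orderByDigitThenIndex = digits =>
  Array.from(digits, Number)
    .map((d, p) => [d, p])
    .sort((a, b) => a[0] - b[0] || a[1] - b[1])
    .map(([, p]) => p);

// Because of the rotation, position p of the initial group is South Korean initial (p + 1) mod 19.
const northInitials = orderByDigitThenIndex(INITIAL_DIGITS).map(p => SOUTH_INITIALS[(p + 1) % 19]).join(" ");
const northVowels = orderByDigitThenIndex(VOWEL_DIGITS).map(p => SOUTH_VOWELS[p]).join(" ");
const northFinals = orderByDigitThenIndex(FINAL_DIGITS).map(p => SOUTH_FINALS[p]).join(" ");

console.log(northInitials);
console.log(northVowels);
console.log(northFinals);
```

Under this reading, the three printed lines reproduce the North Korean lists given in the question.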


JavaScript (ES6), 137 bytes (down from 150 and 148)

Saved 10 bytes thanks to @Grimy

I/O: arrays of characters.


Try it online!

Splitting Hangul syllables

Given a Hangul character of code point 0xAC00 + \$n\$, the indices of its initial consonant \$I\$, vowel \$V\$ and final consonant \$F\$ (in the South Korean orders listed in the question) are given by:

$$I=\left\lfloor\frac{n}{588}\right\rfloor,\ V=\left\lfloor\frac{n}{28}\right\rfloor\bmod 21,\ F=n\bmod 28$$
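The same formulas as a small ungolfed helper, for illustration only (it is not part of the 137-byte answer, and `splitSyllable` is a hypothetical name). 588 = 21 × 28 is the number of syllables that share one initial consonant.

```javascript
// Split a precomposed Hangul syllable into South Korean (Unicode) jamo indices.
function splitSyllable(ch) {
  const n = ch.charCodeAt(0) - 0xac00; // offset inside U+AC00..U+D7A3
  if (n < 0 || n >= 11172) throw new RangeError("not a Hangul syllable");
  return {
    initial: Math.floor(n / 588),   // I = floor(n / 588)
    vowel: Math.floor(n / 28) % 21, // V = floor(n / 28) mod 21
    final: n % 28,                  // F = n mod 28, where 0 means no final consonant
  };
}

console.log(splitSyllable("한")); // initial 18 (ㅎ), vowel 0 (ㅏ), final 4 (ㄴ)
console.log(splitSyllable("글")); // initial 0 (ㄱ), vowel 18 (ㅡ), final 8 (ㄹ)
```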


a => a.map(c =>                  // for each character c in the input:
  "ANBCODEFPGQSHRIJKLM"[         //   start with a letter from 'A' to 'S'
    (n = c.charCodeAt() - 44032) //   for the initial consonant
    / 588 | 0                    //
  ] +                            //
  "AKBLCMDNERTOFGSUPHIQJ"[       //   append a letter from 'A' to 'U'
    n / 28 % 21 | 0              //   for the vowel
  ] +                            //
  ~(                             //   append "-2" for ㄲ or ㅆ (the only North
    n % 28 % 18 == 2             //   Korean final consonants that are sorted
  ) +                            //   differently) or "-1" otherwise
  c                              //   append the original character
)                                // end of map()
.sort()                          // sort in lexicographical order
.map(s => s[4])                  // isolate the original characters


Charcoal, 80 bytes

F”&→∧⁶⍘⎚%γD¦ρJG”F”E⎇↓Nη⊙��⭆Ws@}4”F”E↖hY9 t⟧⊙γIO↶5ε∧¬⁶⦃”Φθ⁼℅μΣ⟦⁴⁴⁰³²×⌕βι⁵⁸⁸⍘⁺κλ²⁸

Try it online! Link is to verbose version of code. Works by generating all 11172 Hangul syllables in North Korean dictionary order and checking which ones are present in the input (so all other characters get deleted; it is also somewhat slow, taking 18 seconds on TIO). Explanation:

F”&→∧⁶⍘⎚%γD¦ρJG”
Loop over the compressed string acdfghjmopqrsbeiknl. This represents the list of South Korean initial consonant indices (written using the lowercase Latin alphabet, a = 0) in North Korean dictionary order.

F”E⎇↓Nη⊙��⭆Ws@}4”
Loop over the compressed string 02468cdhik1357bgj9eaf. This represents the list of South Korean vowels (numbered using ASCII digits and lowercase alphabet) in North Korean dictionary order.

F”E↖hY9 t⟧⊙γIO↶5ε∧¬⁶⦃”

Loop over the compressed string 013456789abcdefghijlmnopqr2k. This represents the list of South Korean final consonants (using the same numbering as the vowels) in North Korean dictionary order.

Φθ⁼℅μΣ⟦⁴⁴⁰³²×⌕βι⁵⁸⁸⍘⁺κλ²⁸
Concatenate the vowel and final consonant characters and decode them as a base-28 number, then add 588 times the initial consonant's index and 0xAC00. Print all characters from the input that have that as their ordinal.
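A JavaScript sketch of the same generate-and-filter idea, using the three decoded index strings (it mirrors the approach, not Charcoal's exact semantics, and the names are hypothetical). Initial letters are decoded with a = 0, and each vowel/final character pair is read as a two-digit base-28 number, as the explanation describes.

```javascript
// Decoded strings from the Charcoal explanation, already in North Korean order.
const INITIAL_LETTERS = "acdfghjmopqrsbeiknl";          // a..s = South initial index 0..18
const VOWEL_DIGITS_28 = "02468cdhik1357bgj9eaf";        // 0-9, a-k = South vowel index 0..20
const FINAL_DIGITS_28 = "013456789abcdefghijlmnopqr2k"; // 0-9, a-r = South final index 0..27

// Enumerate all 11172 syllables in North Korean order and keep every input
// character equal to each one (duplicates are kept; other characters are dropped).
function sortNorthByGeneration(input) {
  const chars = [...input];
  let out = "";
  for (const i of INITIAL_LETTERS)
    for (const v of VOWEL_DIGITS_28)
      for (const f of FINAL_DIGITS_28) {
        const code = 0xac00 + 588 * (i.charCodeAt(0) - 97) + parseInt(v + f, 28);
        for (const c of chars) if (c.charCodeAt(0) === code) out += c;
      }
  return out;
}

console.log(sortNorthByGeneration("아하까가")); // 가하까아
```

Like the Charcoal program, this does 11172 passes over the input, which is simple but slow for long strings.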


Are the replacement characters valid syntax? – Dannyu NDos – 2019-10-14T21:56:52.050

@DannyuNDos It represents byte value \xFF in Charcoal's code page. – Neil – 2019-10-14T23:58:59.273