Convert between Korean two-set keyboard and qwerty keyboard

14

2

Introduction

It is somewhat like the Dvorak keyboard layout, but MUCH harder.

Let's talk about the Korean keyboard first. As you can see in Wikipedia, there is a Kor/Eng key to change between Korean and English key sets.

Koreans sometimes type in the wrong mode: they attempt to write Korean on a qwerty keyboard, or English on a two-set keyboard.

So, here's the problem: if given Korean characters typed on a two-set keyboard, convert them to the alphabetic characters typed on a qwerty keyboard. If given alphabetic characters typed on a qwerty keyboard, convert them to the two-set keyboard characters.

Two-set Keyboard

Here is the two-set keyboard layout :

ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔ
 ㅁㄴㅇㄹㅎㅗㅓㅏㅣ
  ㅋㅌㅊㅍㅠㅜㅡ

and with shift key :

ㅃㅉㄸㄲㅆㅛㅕㅑㅒㅖ

Only the top row changes; the other rows do not.
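For illustration, the layout above can be turned into a lookup table. This is only a minimal sketch and not part of the original challenge; the names QWERTY_ROWS, JAMO_ROWS, SHIFTED and keyToJamo are made up for this example.

```javascript
// Rows of the QWERTY layout and the two-set jamo that sit on the same keys.
const QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"];
const JAMO_ROWS = ["ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔ", "ㅁㄴㅇㄹㅎㅗㅓㅏㅣ", "ㅋㅌㅊㅍㅠㅜㅡ"];

// Shift only changes these seven top-row keys.
const SHIFTED = { Q: "ㅃ", W: "ㅉ", E: "ㄸ", R: "ㄲ", T: "ㅆ", O: "ㅒ", P: "ㅖ" };

// Build one Map from key to jamo: shifted keys first, then the three rows.
const keyToJamo = new Map(Object.entries(SHIFTED));
QWERTY_ROWS.forEach((row, r) => {
  [...row].forEach((key, i) => keyToJamo.set(key, JAMO_ROWS[r][i]));
});
```

With this table, keyToJamo.get("r") gives ㄱ and keyToJamo.get("P") gives ㅖ. Keys with no entry (digits, punctuation, other capitals) are simply absent from the map.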

About Korean Characters

If it ended here, it would be easy, but no. When you type

dkssud, tprP!

output is not shown in this way:

ㅇㅏㄴㄴㅕㅇ, ㅅㅔㄱㅖ!

but in this way :

안녕, 세계!(means Hello, World!)

and it makes things much harder.

Korean characters separate into three parts: 'Choseong' (initial consonant), 'Jungseong' (vowel), and 'Jongseong' (consonant at the end of the syllable; can be blank), and you have to separate them.

Fortunately, there is a way to do that.

How to separate

There are 19 Choseong, 21 Jungseong, and 28 Jongseong (including blank), and 0xAC00 is '가', the first of the Korean syllable characters. Using this, we can separate a Korean character into its three parts. Here is the order of each part and its key on the two-set keyboard.

choseong order :

ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ
r R s e E f a q Q t T d w W c z x v g

jungseong order :

ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ
k o i O j p u P h hk ho hl y n nj np nl b m ml l

jongseong order :

()ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ
()r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g

Let's say Korean_code is (Unicode value of some character) - 0xAC00, and the indices of its Choseong, Jungseong, and Jongseong are Cho, Jung, and Jong.

Then Korean_code = (Cho * 21 * 28) + (Jung * 28) + Jong
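As a sketch of how the formula and the three key tables fit together, the following converts precomposed syllables to qwerty keys. It is an illustration under assumptions, not a golfed answer: the names CHO_KEYS, JUNG_KEYS, JONG_KEYS, syllableToKeys and hangulToKeys are invented here, and stray jamo such as ㄴ (outside a syllable) are deliberately left untouched.

```javascript
// Key sequences in Unicode order, copied from the tables above.
const CHO_KEYS = "r R s e E f a q Q t T d w W c z x v g".split(" ");
const JUNG_KEYS = "k o i O j p u P h hk ho hl y n nj np nl b m ml l".split(" ");
const JONG_KEYS = ["", ..."r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g".split(" ")];

function syllableToKeys(ch) {
  const code = ch.charCodeAt(0) - 0xAC00;               // Korean_code
  if (code < 0 || code >= 19 * 21 * 28) return ch;      // not a precomposed syllable
  const jong = code % 28;
  const jung = Math.floor(code / 28) % 21;
  const cho = Math.floor(code / (21 * 28));
  return CHO_KEYS[cho] + JUNG_KEYS[jung] + JONG_KEYS[jong];
}

// Convert every character of a string; non-syllables pass through unchanged.
const hangulToKeys = (s) => [...s].map(syllableToKeys).join("");
```

For example, hangulToKeys("안녕하세요") yields dkssudgktpdy, which matches output 1 in the examples.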

For your convenience, here is JavaScript code from this Korean website that separates a Korean character.

var rCho = [ "ㄱ", "ㄲ", "ㄴ", "ㄷ", "ㄸ", "ㄹ", "ㅁ", "ㅂ", "ㅃ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅉ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ" ];
var rJung =[ "ㅏ", "ㅐ", "ㅑ", "ㅒ", "ㅓ", "ㅔ", "ㅕ", "ㅖ", "ㅗ", "ㅘ", "ㅙ", "ㅚ", "ㅛ", "ㅜ", "ㅝ", "ㅞ", "ㅟ", "ㅠ", "ㅡ", "ㅢ", "ㅣ" ];
var rJong = [ "", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ","ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ" ];
var cho, jung, jong;
var sTest = "탱";
var nTmp = sTest.charCodeAt(0) - 0xAC00;
jong = nTmp % 28; // Jongseong
jung = ((nTmp - jong) / 28) % 21; // Jungseong
cho = (((nTmp - jong) / 28) - jung) / 21; // Choseong

alert("Choseong:" + rCho[cho] + "\n" + "Jungseong:" + rJung[jung] + "\n" + "Jongseong:" + rJong[jong]);
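The reverse direction follows from the same formula. A minimal sketch, assuming the three indices are already known (the function name composeSyllable is made up for this example):

```javascript
// Build a precomposed syllable from its choseong, jungseong and jongseong
// indices (jong = 0 means "no jongseong").
function composeSyllable(cho, jung, jong = 0) {
  return String.fromCharCode(0xAC00 + (cho * 21 + jung) * 28 + jong);
}
```

With the arrays above, composeSyllable(rCho.indexOf("ㅌ"), rJung.indexOf("ㅐ"), rJong.indexOf("ㅇ")) gives back 탱, the character that the snippet above takes apart.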

When assembled

  1. Note that ㅘ, ㅙ, ㅚ, ㅝ, ㅞ, ㅟ, ㅢ are combinations of other jungseongs.
ㅗ+ㅏ=ㅘ, ㅗ+ㅐ=ㅙ, ㅗ+ㅣ=ㅚ, ㅜ+ㅓ=ㅝ, ㅜ+ㅔ=ㅞ, ㅜ+ㅣ=ㅟ, ㅡ+ㅣ=ㅢ
  2. Choseong is necessary. That means that if frk is given, which is ㄹㄱㅏ, it could be read in two ways: ㄺㅏ and ㄹ가. You have to choose the reading that has a choseong. If jjjrjr is given, which is ㅓㅓㅓㄱㅓㄱ, the leading ㅓs have nothing that can be a choseong, but the fourth ㅓ is preceded by ㄱ, which can be a choseong, so it becomes ㅓㅓㅓ걱.

Another example: 세계 (tprP). It could be read as 섹ㅖ ((ㅅㅔㄱ)(ㅖ)), but because choseong is necessary, it becomes 세계 ((ㅅㅔ)(ㄱㅖ)).
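The two rules above can be sketched with a regular expression: a jongseong is only accepted when it is not followed by a vowel key, so a consonant in front of a vowel always becomes the next choseong. This is an ungolfed illustration under assumptions, not a reference solution. The names CHO, CHO_JAMO, JUNG, JONG, V, J, TOKEN and keysToHangul are made up here, capitals without a shifted jamo are lowercased (the optional bonus behaviour), and stray consonant pairs such as rt are kept as two separate jamo.

```javascript
const CHO = "rRseEfaqQtTdwWczxvg";                       // choseong keys, Unicode order
const CHO_JAMO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
const JUNG = "k o i O j p u P h hk ho hl y n nj np nl b m ml l".split(" ");
const JONG = ["", ..."r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g".split(" ")];

const V = "h[kol]?|n[jpl]?|ml?|[koiOjpuPybl]";            // any jungseong, compound first
const J = "rt|s[wg]|f[raqtxvg]|qt|[rRsefaqtTdwczxvg]";    // any jongseong, double first
// Either a full syllable, or a lone vowel, or a lone consonant.
const TOKEN = new RegExp(`([${CHO}])(${V})((?:${J})(?!${V}))?|(${V})|([${CHO}])`, "g");

function keysToHangul(text) {
  // Lowercase capitals that have no shifted jamo of their own (e.g. H, A).
  const lowered = text.replace(/[A-Z]/g, (c) => ("QWERTOP".includes(c) ? c : c.toLowerCase()));
  return lowered.replace(TOKEN, (_m, cho, jung, jong, loneVowel, loneConsonant) => {
    if (cho) {
      const code = (CHO.indexOf(cho) * 21 + JUNG.indexOf(jung)) * 28 + JONG.indexOf(jong || "");
      return String.fromCharCode(0xAC00 + code);          // precomposed syllable
    }
    if (loneVowel) return String.fromCharCode(0x314F + JUNG.indexOf(loneVowel)); // ㅏ is U+314F
    return CHO_JAMO[CHO.indexOf(loneConsonant)];
  });
}
```

Because the regex engine backtracks, fjfau first tries the double jongseong fa, rejects it because u follows, and then settles on the single jongseong f, giving 럴며 rather than 럶ㅕ.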

Examples

input 1

안녕하세요

output 1

dkssudgktpdy

input 2

input 2

output 2

ㅑㅞㅕㅅ 2

input 3

힘ㄴㄴ

output 3

glass

input 4

아희(Aheui) is esolang which you can program with pure Korean characters.

output 4

dkgml(모뎌ㅑ) ㅑㄴ ㄷ내ㅣ뭏 조ㅑ초 ㅛㅐㅕ ㅊ무 ㅔ갷ㄱ므 쟈소 ㅔㅕㄱㄷ ㅏㅐㄱㄷ무 촘ㄱㅁㅊㅅㄷㄱㄴ.

input 5

dkssud, tprP!

output 5

안녕, 세계!

input 6

ㅗ디ㅣㅐ, 째깅! Hello, World!

output 6

hello, World! ㅗ디ㅣㅐ, 째깅!

Shortest code (in bytes) wins.

New rule for your convenience

You can ignore characters like A, which have no counterpart on the two-set keyboard, so Aheui to Aㅗ뎌ㅑ is OK. But if you convert Aheui to 모뎌ㅑ, you get -5 points, so you can earn 5 bytes.

You can leave compound jungseongs separated (like ㅘ to ㅗ+ㅏ), e.g. rhk to 고ㅏ, or how to ㅗㅐㅈ. But if you combine them (like rhk to 과, or how to ㅙㅈ), you can earn an additional -5 points.
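For the stray-vowel half of that bonus, one possible post-processing step is to merge adjacent lone jamo. A minimal sketch, with the names COMPOUND and combineVowels invented for this example; it only merges separate jamo such as ㅗㅐ, and does not touch a vowel that is already inside a syllable like 고.

```javascript
// The seven compound jungseongs and the two jamo that form each of them.
const COMPOUND = {
  "ㅗㅏ": "ㅘ", "ㅗㅐ": "ㅙ", "ㅗㅣ": "ㅚ",
  "ㅜㅓ": "ㅝ", "ㅜㅔ": "ㅞ", "ㅜㅣ": "ㅟ",
  "ㅡㅣ": "ㅢ",
};

// Replace every adjacent pair of lone vowel jamo that forms a compound vowel.
const combineVowels = (s) =>
  s.replace(/ㅗ[ㅏㅐㅣ]|ㅜ[ㅓㅔㅣ]|ㅡㅣ/g, (pair) => COMPOUND[pair]);
```

For example, combineVowels("ㅗㅐㅈ") gives ㅙㅈ, while 고ㅏ is returned unchanged because its ㅗ is already part of 고.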

LegenDUST

Posted 2019-05-24T11:40:41.017

Reputation: 799

In the jungseong order section one of the letters is missing. I see 21 Korean symbols, but only 20 letter(-pair)s. EDIT: Seems to be missing a trailing l after ml for the Korean symbol ㅣ. – Kevin Cruijssen – 2019-05-24T12:19:40.740

@KevinCruijssen edited. l for ㅣ. – LegenDUST – 2019-05-24T12:23:39.423

Welcome to PPCG! Your examples seem to indicate this, but I wanted to clarify -- if the character/letter is not in the choseong, jungseong, or jongseong lists (e.g., 2) it should remain untouched? – AdmBorkBork – 2019-05-24T12:28:52.560

@AdmBorkBork As you can see in examples 2, 4, and 6, yes. 스택 익스체인지 프로그래밍 퍼즐 & 코드 골프 should be converted to tmxor dlrtmcpdlswm vmfhrmfoald vjwmf & zhem rhfvm – LegenDUST – 2019-05-24T12:31:52.720

Sometimes there can be more than one interpretation. For example, fjfau could be interpreted as 럶ㅕ or 럴며. How do we resolve this? – Nick Kennedy – 2019-05-24T22:48:00.373

It is always interpreted as 럴며, because that is how typing works. If I want to type 럶ㅕ, I have to type 럶, then space and backspace, then type ㅕ. – LegenDUST – 2019-05-25T01:50:53.513

Do jongseong ever occur alone (outside a syllable) in input or output? – lirtosiast – 2019-05-26T21:23:21.037

Why does np in the second test case (input 2) transform from ㅜㅔ to ㅞ? Both ㅜ and ㅔ are jungseong. Based on your separation paragraph and Korean_code formula I thought characters should only be combined as choseong+jungseong+jongseong or choseong+jungseong+empty. But apparently other combinations are possible as well?.. :S – Kevin Cruijssen – 2019-05-27T07:58:14.730

Some jungseongs combine into another. For example: ㅗㅣ to ㅚ, ㅗㅐ to ㅙ, ㅜㅔ to ㅞ, ㅗㅏ to ㅘ, ㅜㅓ to ㅝ, ㅜㅣ to ㅟ. I'll edit the question to make it clear... I'm Korean and I thought that was just obvious. – LegenDUST – 2019-05-27T08:17:20.033

@LegenDUST Well, I can't read a single word of Korean, so I'll have to go with your explanation. ;p As for tprP in test case 5: this transforms into ㅅㅔㄱㅖ, where ㅅ is a choseong, ㅔ is a jungseong and ㄱ is a jongseong. So shouldn't this transform into 섷ㅖ (grouped like (ㅅㅔㄱ)(ㅖ)) instead of 세계 (grouped like (ㅅㅔ)(ㄱㅖ))? In an earlier comment you state it is interpreted by typing, so I would expect ㅅㅔㄱ to transform into 섹. Or is Korean typing from right to left instead of left to right? – Kevin Cruijssen – 2019-05-27T09:53:53.233

If a jongseong is followed by a jungseong, it becomes a choseong. tprP becomes ㅅㅔㄱㅖ. Note that a choseong is necessary unless there is nothing available. Note that almost every character in the jongseong list is also a choseong. So frk (ㄹㄱㅏ) becomes ㄹ가, not ㄺㅏ. Maybe Wikipedia can help. I'll edit the problem to make this clear.

– LegenDUST – 2019-05-27T10:00:35.270

This task is perfect for AutoHotkey. – stackzebra – 2019-05-27T12:43:23.053

I have made an editing pass to improve the English. Feel free to roll back if you think I've changed it for the worse. – Veskah – 2019-05-30T13:34:19.187

Do you have a list of all combined Korean characters somewhere? Or do you know the lowest and highest unicode for a range of the combined characters? – Kevin Cruijssen – 2019-05-30T15:20:21.323

@Veskah Thanks a lot. – LegenDUST – 2019-05-31T08:36:19.070

1

@KevinCruijssen PDF file from Unicode.org. AC00() to D7AF().

– LegenDUST – 2019-05-31T08:39:54.863

It looks like IME instead of layout to me. – tsh – 2019-06-03T05:12:47.417

@tsh It is IME, but that doesn't mean it is not layout. Maybe IME is better than layout in this question? – LegenDUST – 2019-06-03T10:23:35.907

Answers

6

Jelly, 296 264 bytes

Ẏœṣjƭƒ
“ȮdȥŒ~ṙ7Ṗ:4Ȧịعʂ ="÷Ƥi-ẓdµ£f§ñỌ¥ẋaḣc~Ṡd1ÄḅQ¥_æ>VÑʠ|⁵Ċ³(Ė8ịẋs|Ṇdɼ⁼:Œẓİ,ḃṙɠX’ṃØẠs2ḟ€”A
“|zƒẉ“®6ẎẈ3°Ɠ“⁸)Ƙ¿’ḃ2’T€ị¢
¢ĖẈṪ$ÞṚƊ€
3£OŻ€3¦ŒpFḟ0Ɗ€J+“Ḥœ’,ƲyO2£OJ+⁽.[,Ʋ¤y¹ỌŒḊ?€µ¢ṖŒpZF€’ḋ588,28+“Ḥþ’Ʋ0;,ʋ/ṚƲ€ñṣ0ḊḢ+®Ṫ¤Ɗ;ṫ®$Ɗ¹Ḋ;⁶Ṫ⁼ṁ@¥¥Ƈ@¢ṪẈṪ‘;Ʋ€¤ḢƲ©?€ṭḢƲF2£żJ+⁽.[Ɗ$ẈṪ$ÞṚ¤ñỌ

Try it online!

A full program that takes a string as its argument and returns a string (which is implicitly printed). This works in three passes: first it converts all Korean characters to lists of code points for the Latin letters. Then it identifies and builds the compound Korean characters. Finally, it turns any remaining stray Latin letters to the Korean equivalent. Note that other characters and Latin letters that don’t appear in the spec (e.g. A) are left alone.

If conversion to lower case of capital letters outside spec is needed, this can be done at a cost of an additional 10 bytes.

Explanation

Helper link 1: dyadic link with arguments x and y. x is a list of pairs of search and replace sublists. y will have each search sublist replaced with the corresponding replace sublist

Ẏ      | Tighten (reduce to a single list of alternating search and replace sublists)
     ƒ | Reduce using y as starting argument and the following link:
    ƭ  | - Alternate between using the following two links:
 œṣ    |   - Split at sublist
   j   |   - Join using sublist

Helper link 2: List of Latin characters/character pairs in the order that corresponds to the Unicode order of the Korean characters

“Ȯ..X’          | Base 250 integer 912...
      ṃØẠ       | Base decompress into Latin letters (A..Za..z)
         s2     | Split into twos
           ḟ€”A | Filter out A from each (used as filler for the single characters)

Helper link 3: Lists of Latin characters used for Choseong, Jungseong and Jongseong

“|...¿’        | List of base 250 integers, [1960852478, 2251799815782398, 2143287262]
       ḃ2      | Convert to bijective base 2
         ’     | Decrease by 1
          T€   | List of indices of true values for each list
            ị¢ | Index into helper link 2

Helper link 4: Above lists of Latin characters enumerated and sorted in decreasing order of length

¢         | Helper link 3 as a nilad
       Ɗ€ | For each list, the following three links as a monad
 Ė        | - Enumerate (i.e. prepend a sequential index starting at 1 to each member of the list)
    $Þ    | - Sort using, as a key, the following two links as a monad
  Ẉ       |   - Lengths of lists
   Ṫ      |   - Tail (this will be the length of the original character or characters)
      Ṛ   | - Reverse

Main link: Monad that takes a Jelly string as its argument and returns the translated Jelly string

Section 1: Convert morphemic blocks to the Unicode codepoints of the corresponding Latin characters

Section 1.1: Get the list of Latin character(s) needed to make the blocks

3£      | Helper link 3 as a nilad (lists of Latin characters used for Choseong, Jungseong and Jongseong)
  O     | Convert to Unicode code points
   Ż€3¦ | Prepend a zero to the third list (Jongseong)

Section 1.2: Create all combinations of these letters (19×21×28 = 11,172 combinations in the appropriate lexical order)

Œp      | Cartesian product
     Ɗ€ | For each combination:
  F     | - Flatten
   ḟ0   | - Filter zero (i.e. combinations with an empty Jongseong)

Section 1.3: Pair the Unicode code points of the blocks with the corresponding list of Latin characters, and use these to translate the morphemic blocks in the input string

       Ʋ   | Following as a monad
J          | - Sequence from 1..11172
 +“Ḥœ’     | - Add 44031
      ,    | - Pair with the blocks themselves
        y  | Translate the following using this pair of lists
         O | - The input string converted to Unicode code points

Section 2: Convert the individual Korean characters in the output from section 1 to the code points of the Latin equivalent

          ¤  | Following as a nilad
2£           | Helper link 2 (list of Latin characters/character pairs in the order that corresponds to the Unicode order of the Korean characters)
  O          | Convert to Unicode code points
         Ʋ   | Following as a monad:
   J         | - Sequence along these (from 1..51)
    +⁽.[     | - Add 12592
        ,    | - Pair with list of Latin characters
           y | Translate the output from section 1 using this mapping

Section 3: Tidy up untranslated characters in the output from section 2 (works because anything translated from Korean will now be in a sublist and so have depth 1)

  ŒḊ?€  | For each member of list if the depth is 1:
¹       | - Keep as is
 Ọ      | Else: convert back from Unicode code points to characters
      µ | Start a new monadic chain using the output from this section as its argument

Section 4: Convert morphemic blocks of Latin characters into Korean

Section 4.1: Get all possible combinations of Choseong and Jungseong

¢    | Helper link 4 (lists of Latin characters enumerated and sorted in decreasing order of length)
 Ṗ   | Discard last list (Jongseong)
  Œp | Cartesian product

Section 4.2: Label each combination with the Unicode code point for the base morphemic block (i.e. with no Jongseong)

                       Ʋ€ | For each Choseong/Jungseong combination
Z                         | - Transpose, so that we now have e.g. [[1,1],["r","k"]]
 F€                       | - Flatten each, joining the strings together
                    ʋ/    | - Reduce using the following as a dyad (effectively using the numbers as left argument and string of Latin characters as right)
                Ʋ         |   - Following links as a monad
   ’                      |     - Decrease by 1
    ḋ588,28               |     - Dot product with 21×28,28
           +“Ḥþ’          |     - Add 44032
                 0;       |     - Prepend zero; used for splitting in section 4.3 before each morphemic block (Ż won’t work because on a single integer it produces a range)
                   ,      |     - Pair with the string of Latin characters
                      Ṛ   |   - Reverse (so we now have e.g. ["rk", 44032])

Section 4.3: Replace these strings of Latin characters in the output from section 3 with the Unicode code points of the base morphemic block

ñ   | Call helper link 1 (effectively search and replace)
 ṣ0 | Split at the zeros introduced in section 4.2

Section 4.4: Identify whether there is a Jongseong as part of each morphemic block

                                        Ʋ | Following as a monad:
Ḋ                                         | - Remove the first sublist (which won’t contain a morphemic block; note this will be restored later)
                                     €    | - For each of the other lists Z returned by the split in section 4.3 (i.e. each will have a morphemic block at the beginning):
                                  Ʋ©?     |   - If the following is true (capturing its value in the register in the process) 
             Ḋ                            |     - Remove first item (i.e. the Unicode code point for the base morphemic block introduced in section 4.3)
              ;⁶                          |     - Append a space (avoids ending up with an empty list if there is nothing after the morphemic block code point)
                                          |       (Output from the above will be referred to as X below)
                                ¤         |       * Following as a nilad (call this Y):
                        ¢                 |         * Helper link 4
                         Ṫ                |         * Jongseong
                              Ʋ€          |         * For each Jongseong Latin list:
                          Ẉ               |           * Lengths of lists
                           Ṫ              |           * Tail (i.e. length of Latin character string)
                            ‘             |           * Increase by 1
                             ;            |           * Prepend this (e.g. [1, 1, "r"])
                     ¥Ƈ@                  |     - Filter Y using X from above and the following criteria
                Ṫ                         |       - Tail (i.e. the Latin characters for the relevant Jongseong)
                 ⁼ṁ@¥                     |       - is equal to the beginning of X trimmed to match the relevant Jongseong (or extended but this doesn’t matter since no Jongseong are a double letter)
                                  Ḣ       |       - First matching Jongseong (which since they’re sorted by descending size order will prefer the longer one if there is a matching shorter one)
           Ɗ                              | - Then: do the following as a monad (note this is now using the list Z mentioned much earlier):
      Ɗ                                   |   - Following as a monad
 Ḣ                                        |     - Head (the Unicode code point of the base morphemic block)
  +®Ṫ¤                                    |     - Add the tail of the register (the position of the matched Jongseong in the list of Jongseong)
       ;                                  |   - Concatenate to:
        ṫ®$                               |     - The rest of the list after removing the Latin characters representing the Jongseong
            ¹                             | - Else: leave the list untouched (no matching Jongseong)
                                       ṭ  | - Prepend:
                                        Ḣ |   - The first sublist from the split that was removed at the beginning of this subsection

Section 5: Handle remaining Latin characters that match Korean ones but are not part of a morphemic block

F                   | Flatten
                ¤   | Following as a nilad
 2£                 | - Helper link 2 (Latin characters/pairs of characters in Unicode order of corresponding Korean character)
          $         | - Following as a monad
   ż     Ɗ          |   - zip with following as a monad
    J               |     - Sequence along helper link 2 (1..51)
     +⁽.[           |     - Add 12592
             $Þ     | - Sort using following as key
           Ẉ        |   - Lengths of lists
            Ṫ       |   - Tail (i.e. length of Latin string)
               Ṛ    | - Reverse
                 ñ  | Call helper link 1 (search Latin character strings and replace with Korean code points)
                  Ọ | Finally, convert all Unicode code points back to characters and implicitly output

Nick Kennedy

Posted 2019-05-24T11:40:41.017

Reputation: 11 829

Output is wrong: when I put 책, I expected cor, but it gave cBor. And it does not change c to ㅊ. can should be converted into ㅊ무, but it was converted into c무. I also expected capital letters which don't appear in the spec to be decapitalized, but it can be fine. – LegenDUST – 2019-05-26T04:57:47.940

@LegenDUST the c problem is fixed. I used A as a placeholder for the second character of single characters, and for some reason the one after c was coming out as a B. Conversion to lower case of other letters could be done, but feels like an unnecessary complication to what’s already a difficult challenge. – Nick Kennedy – 2019-05-26T07:48:41.257

I understand this is hard. So I added a new rule: if you decapitalize, you can earn 5 bytes. But this is fine. – LegenDUST – 2019-05-26T07:56:25.187

3

JavaScript (Node.js), 587 582 575 569 557 554 550 549 bytes

tfw you didn't know that string.charCodeAt() == string.charCodeAt(0).

s=>s.replace(eval(`/[ㄱ-힣]|${M="(h[kol]?|n[jpl]?|ml?|[bi-puyOP])"}|([${S="rRseEfaqQtTdwWczxvg"}])(${M}((s[wg]|f[raqtxvg]|qt|[${S}])(?!${M}))?)?/g`,L="r,R,rt,s,sw,sg,e,E,f,fr,fa,fq,ft,fx,fv,fg,a,q,Q,qt,t,T,d,w,W,c,z,x,v,g,k,o,i,O,j,p,u,P,h,hk,ho,hl,y,n,nj,np,nl,b,m,ml,l".split`,`,l=L.filter(x=>!/[EQW]/.test(x)),I="indexOf"),(a,E,A,B,C,D)=>a<"~"?E?X(E):A&&C?F(43193+S[I](A)*588+L[I](C)*28+l[I](D)):X(A)+X(C)+X(D):(b=a.charCodeAt()-44032)<0?L[b+31439]||a:S[b/588|0]+L[30+b/28%21|0]+["",...l][b%28],F=String.fromCharCode,X=n=>n?F(L[I](n)+12593):"")

Try it online!

547 if characters outside alphabets and Korean jamos can be ignored.

Okay I struggled for so long to write this, but this should work. No Korean jamo/syllable is used because they are too expensive (3 bytes per use). Used in the regular expression to save bytes.

s=>                                                    // Main Function:
 s.replace(                                            //  Replace all convertible strings:
  eval(
   `/                                                  //   Matching this regex:
    [ㄱ-힣]                                             //   ($0) All Korean jamos and syllables
    |${M="(h[kol]?|n[jpl]?|ml?|[bi-puyOP])"}           //   ($1) Isolated jungseong codes
    |([${S="rRseEfaqQtTdwWczxvg"}])                    //   ($2) Choseong codes (also acts as lookup)
     (                                                 //   ($3) Jungseong and jongseong codes:
      ${M}                                             //   ($4)  Jungseong codes
      (                                                //   ($5)  Jongseong codes:
       (                                               //   ($6)
        s[wg]|f[raqtxvg]|qt                            //          Digraphs unique to jongseongs
        |[${S}]                                        //          Or jamos usable as choseongs
       ) 
       (?!${M})                                        //         Not linked to the next jungseong
      )?                                               //        Optional to match codes w/o jongseong
     )?                                                //       Optional to match choseong-only codes
   /g`,                                                //   Match all
   L="(...LOOKUP TABLE...)".split`,`,                  //   Lookup table of codes in jamo order
   l=L.filter(x=>!/[EQW]/.test(x)),                    //   Jongseong lookup - only first half is used
   I="indexOf"                                         //   [String|Array].prototype.indexOf
  ),
  (a,E,A,B,C,D)=>                                      //   Using this function:
   a<"~"?                                              //    If the match is code (alphabets):
    E?                                                 //     If isolated jungseongs code:
     X(E)                                              //      Return corresponding jamo
    :A&&C?                                             //     Else if complete syllable code:
     F(43193+S[I](A)*588+L[I](C)*28+l[I](D))           //      Return the corresponding syllable
    :X(A)+X(C)+X(D)                                    //     Else return corresponding jamos joined
   :(b=a.charCodeAt()-44032)<0?                        //    Else if not syllable:
    L[b+31439]||a                                      //     Return code if jamo (if not, ignore)
   :S[b/588|0]+L[30+b/28%21|0]+["",...l][b%28],        //    Else return code for the syllable
  F=String.fromCharCode,                               //   String.fromCharCode
  X=n=>                                                //   Helper function to convert code to jamo
   n?                                                  //    If not undefined:
    F(L[I](n)+12593)                                   //     Return the corresponding jamo
   :""                                                 //    Else return empty string
 )

Shieru Asakoto

Posted 2019-05-24T11:40:41.017

Reputation: 4 445

2

Wolfram Language (Mathematica), 405 401 400 bytes

c=CharacterRange
p=StringReplace
q=StringReverse
r=Reverse
t=Thread
j=Join
a=j[alphabet@"Korean",4520~c~4546]
x=j[#,r/@#]&@t[a->Characters@"rRseEfaqQtTdwWczxvgkoiOjpuPh"~j~StringSplit@"hk ho hl y n nj np nl b m ml l r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g"]
y=t[""<>r@#&/@Tuples@TakeList[Insert[a,"",41]~p~x~p~x,{19,21,28}]->44032~c~55203]
f=q@p[q@#,#2]&
g=f[#,r/@y]~p~x~f~y&

Try it online!

Slightly ungolfed

To test this in Mathematica just replace alphabet with Alphabet; however, TIO doesn't support the Wolfram Cloud so I defined Alphabet["Korean"] in the header.

We first decompose all Hangul syllables to the Hangul alphabet, then swap Latin and Hangul characters, then recompose the syllables.

lirtosiast

Posted 2019-05-24T11:40:41.017

Reputation: 20 331

Test case input 2 results in ㅑㅜㅔㅕㅅ 2 instead of ㅑㅞㅕㅅ 2 in your TIO. Although the same happens in the solution I was working on, since both ㅜ and ㅔ are jungseong, and I was under the impression only choseong+jungseong+jongseong or choseong+jungseong+empty would be combined. I asked OP for verification why ㅜㅔ became ㅞ. – Kevin Cruijssen – 2019-05-27T08:09:25.687

@KevinCruijssen ㅞ (np) is a jungseong in its own right – Nick Kennedy – 2019-05-27T11:39:28.777

This doesn’t seem to work properly for two character consonants or vowels. For example fnpfa should be a single character but instead ends up as 루ㅔㄹㅁ – Nick Kennedy – 2019-05-28T01:07:01.497

Fix in progress. It shouldn't cost too much. – lirtosiast – 2019-05-28T05:05:20.230

2

Java 19, 1133 1126 1133 bytes

s->{String r="",k="ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ",K[]=k.split(" "),a="r R s e E f a q Q t T d w W c z x v g k o i O j p u P h hk ho hl y n nj np nl b m ml l r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g";var A=java.util.Arrays.asList(a.split(" "));k=k.replace(" ","");int i,z,y,x=44032;for(var c:s.toCharArray())if(c>=x&c<55204){z=(i=c-x)%28;y=(i=(i-z)/28)%21;s=s.replace(c+r,r+K[0].charAt((i-y)/21)+K[1].charAt(y)+(z>0?K[2].charAt(z-1):r));}for(var c:s.split(r))r+=c.charAt(0)<33?c:(i=k.indexOf(c))<0?(i=A.indexOf(c))<0?c:k.charAt(i):A.get(i);for(i=r.length()-1;i-->0;r=z>0?r.substring(0,i)+(char)(K[0].indexOf(r.charAt(i))*588+K[1].indexOf(r.charAt(i+1))*28+((z=K[2].indexOf(r.charAt(i+2)))<0?0:z+1)+x)+r.substring(z<0?i+2:i+3):r)for(z=y=2;y-->0;)z&=K[y].contains(r.charAt(i+y)+"")?2:0;for(var p:"ㅗㅏㅘㅗㅐㅙㅗㅣㅚㅜㅓㅝㅜㅔㅞㅜㅣㅟㅡㅣㅢ".split("(?<=\\G...)"))r=r.replace(p.substring(0,2),p.substring(2));return r;}

Outputs with capital letters ASDFGHJKLZXCVBNM unchanged, since .toLowerCase() costs more than the -5 bonus.

Back +7 bytes as a bug-fix for non-Korean characters above unicode value 20,000 (thanks @NickKennedy for noticing).

Try it online.

Explanation:

s->{                         // Method with String as both parameter and return-type
  String r="",               //  Result-String, starting empty
         k="ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ",
                             //  String containing the Korean characters
         K[]=k.split(" "),   //  Array containing the three character-categories
         a="r R s e E f a q Q t T d w W c z x v g k o i O j p u P h hk ho hl y n nj np nl b m ml l r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g"; 
                             //  String containing the English characters
  var A=java.util.Arrays.asList(a.split(" "));
                             //  List containing the English character-groups
  k=k.replace(" ","");       //  Remove the spaces from the Korean String
  int i,z,y,                 //  Temp integers
      x=44032;               //  Integer for 0xAC00
  for(var c:s.toCharArray()) //  Loop over the characters of the input:
    if(c>=x&c<55204){        //   If the unicode value is in the range [44032,55203]
                             //   (so a Korean combination character):
      z=(i=c-x)%28;          //    Set `i` to this unicode value - 0xAC00,
                             //    And then `z` to `i` modulo-28
      y=(i=(i-z)/28)%21;     //    Then set `i` to `i`-`z` integer divided by 28
                             //    And then `y` to `i` modulo-21
      s=s.replace(c+r,       //    Replace the current Korean syllable with:
        r+K[0].charAt((i-y)/21)
                             //     The corresponding choseong
         +K[1].charAt(y)     //     Appended with jungseong
         +(z>0?K[2].charAt(z-1):r));}
                             //     Appended with jongseong if necessary
  for(var c:s.split(r))      //  Then loop over the characters of the modified String:
    r+=                      //   Append to the result-String:
       c.charAt(0)<33?       //    If the character is a space:
        c                    //     Simply append that space
       :(i=k.indexOf(c))<0?  //    Else-if the character is NOT a Korean character:
         (i=A.indexOf(c))<0? //     If the character is NOT in the English group List:
          c                  //      Simply append that character
         :                   //     Else:
          k.charAt(i)        //      Append the corresponding Korean character
       :                     //    Else:
        A.get(i);            //     Append the corresponding letter
  for(i=r.length()-1;i-->0   //  Then loop `i` in the range (result-length - 2, 0]:
      ;                      //    After every iteration:
       r=z>0?                //     If a group of Korean characters can be merged:
          r.substring(0,i)   //      Leave the leading part of the result unchanged
          +(char)(K[0].indexOf(r.charAt(i))
                             //      Get the index of the first Korean character,
                   *588      //      multiplied by 588
                  +K[1].indexOf(r.charAt(i+1))
                             //      Get the index of the second Korean character,
                   *28       //      multiplied by 28
                  +((z=K[2].indexOf(r.charAt(i+2)))
                             //      Get the index of the third character
                    <0?      //      And if it's a Korean character in the third group:
                      0:z+1) //       Add that index + 1
                  +x         //      And add 0xAC00
                 )           //      Then convert that integer to a character
          +r.substring(z<0?i+2:i+3) 
                             //      Leave the trailing part of the result unchanged as well
         :                   //     Else (these characters cannot be merged)
          r)                 //      Leave the result the same
     for(z=y=2;              //   Reset `z` to 2
         y-->0;)             //   Inner loop `y` in the range (2, 0]:
       z&=                   //    Bitwise-AND `z` with:
         K[y].contains(      //     If the `y`'th Korean group contains
           r.charAt(i+y)+"")?//     the (`i`+`y`)'th character of the result
          2                  //      Bitwise-AND `z` with 2
         :                   //     Else:
          0;                 //      Bitwise-AND `z` with 0
                             //   (If `z` is still 2 after this inner loop, it means
                             //    Korean characters can be merged)
  for(var p:"ㅗㅏㅘㅗㅐㅙㅗㅣㅚㅜㅓㅝㅜㅔㅞㅜㅣㅟㅡㅣㅢ".split("(?<=\\G...)"))
                             //  Loop over these Korean character per chunk of 3:
    r=r.replace(p.substring(0,2),
                             //   Replace the first 2 characters in this chunk
         p.substring(2));    //   With the third one in the result-String
  return r;}                 //  And finally return the result-String

Kevin Cruijssen

Posted 2019-05-24T11:40:41.017

Reputation: 67 575

They’re from 44032 to 55203. You’ve already got the start location coded. The end is just 44032 + 19×21×28 - 1 – Nick Kennedy – 2019-05-30T15:35:34.137
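The arithmetic in that comment can be checked directly. A small sketch, with the names FIRST_SYLLABLE, LAST_SYLLABLE and isHangulSyllable made up for this example:

```javascript
// Precomposed Hangul syllables occupy one contiguous range of code points.
const FIRST_SYLLABLE = 0xAC00;                        // 44032, '가'
const LAST_SYLLABLE = FIRST_SYLLABLE + 19 * 21 * 28 - 1; // 55203, '힣'

// True only for precomposed syllables, not for lone jamo such as ㄱ or ㅏ.
function isHangulSyllable(ch) {
  const code = ch.charCodeAt(0);
  return code >= FIRST_SYLLABLE && code <= LAST_SYLLABLE;
}
```

This is the same bound as the c>=x&c<55204 test in the Java answer above.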

Works well now. Thought I’d already upvoted you but hadn’t, so here you go! – Nick Kennedy – 2019-05-30T19:15:24.883