Introduction
It is somewhat like the Dvorak keyboard layout, but MUCH harder.
Let's talk about the Korean keyboard first. As you can see on Wikipedia, there is a Kor/Eng key to switch between the Korean and English key sets.
Koreans sometimes mistype: they attempt to write Korean on a QWERTY keyboard or English on a two-set keyboard.
So, here's the problem: given Korean characters typed on a two-set keyboard, convert them to the alphabetic characters typed on a QWERTY keyboard. Given alphabetic characters typed on a QWERTY keyboard, convert them to the two-set keyboard.
Two-set Keyboard
Here is the two-set keyboard layout:
ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔ
ㅁㄴㅇㄹㅎㅗㅓㅏㅣ
ㅋㅌㅊㅍㅠㅜㅡ
and with the Shift key:
ㅃㅉㄸㄲㅆㅛㅕㅑㅒㅖ
Only the top row changes; the other rows do not.
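The layout above can be sketched as a lookup table. This is a minimal illustration, assuming the three rows line up with the QWERTY rows `qwertyuiop`, `asdfghjkl`, and `zxcvbnm`, and that shifted keys are mapped only for the top row; the name `keyToJamo` is hypothetical.

```javascript
// QWERTY rows paired position by position with the two-set rows above.
const qwertyRows = ["qwertyuiop", "asdfghjkl", "zxcvbnm"];
const hangulRows = ["ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔ", "ㅁㄴㅇㄹㅎㅗㅓㅏㅣ", "ㅋㅌㅊㅍㅠㅜㅡ"];
const shiftedTopRow = "ㅃㅉㄸㄲㅆㅛㅕㅑㅒㅖ";

// keyToJamo is a hypothetical name: QWERTY key -> jamo on the two-set keyboard.
const keyToJamo = {};
qwertyRows.forEach((row, r) => {
  [...row].forEach((key, i) => {
    keyToJamo[key] = hangulRows[r][i];
  });
});

// Assumption: only the shifted top row gets uppercase entries,
// so a key like "A" has no counterpart in this table.
[...qwertyRows[0]].forEach((key, i) => {
  keyToJamo[key.toUpperCase()] = shiftedTopRow[i];
});

console.log(keyToJamo["r"], keyToJamo["R"], keyToJamo["k"]);
```

Keys missing from the table (such as `A` or `2`) look up as `undefined`, which a converter can treat as "leave unchanged".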
About Korean Characters
If it ended here, it would be easy, but it doesn't. When you type
dkssud, tprP!
the output is not shown this way:
ㅇㅏㄴㄴㅕㅇ, ㅅㅔㄱㅖ!
but this way:
안녕, 세계! (means Hello, World!)
and that makes things much harder.
Korean characters separate into three parts: 'Choseong' (initial consonant), 'Jungseong' (vowel), and 'Jongseong' (consonant at the end of the syllable; can be blank), and you have to separate them.
Fortunately, there is a way to do that.
How to separate
There are 19 Choseong, 21 Jungseong, and 28 Jongseong (including blank), and 0xAC00 is '가', the first of the Korean syllable characters. Using this, we can separate a Korean character into its three parts. Here is the order of each set and the key for each entry on the two-set keyboard.
choseong order :
ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ
r R s e E f a q Q t T d w W c z x v g
jungseong order :
ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ
k o i O j p u P h hk ho hl y n nj np nl b m ml l
jongseong order :
()ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ
()r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g
Let's say (Unicode value of some character) - 0xAC00 is Korean_code,
and the indices of its Choseong, Jungseong, and Jongseong are Cho, Jung, and Jong.
Then Korean_code is (Cho * 21 * 28) + (Jung * 28) + Jong.
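Read in the other direction, the formula builds a syllable from its three indices. A minimal sketch, assuming the indices follow the orders listed above; `composeSyllable` is a hypothetical helper name.

```javascript
const HANGUL_BASE = 0xAC00; // code point of '가'

// composeSyllable is a hypothetical helper: indices -> one Korean character.
// cho in 0..18, jung in 0..20, jong in 0..27 (0 means no jongseong).
function composeSyllable(cho, jung, jong) {
  const koreanCode = cho * 21 * 28 + jung * 28 + jong;
  return String.fromCharCode(HANGUL_BASE + koreanCode);
}

// ㅌ is choseong 16, ㅐ is jungseong 1, ㅇ is jongseong 21.
console.log(composeSyllable(16, 1, 21));
// ㅇ is choseong 11, ㅏ is jungseong 0, ㄴ is jongseong 4.
console.log(composeSyllable(11, 0, 4));
```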
For your convenience, here is JavaScript code from this Korean website which separates a Korean character into its parts.
var rCho = [ "ㄱ", "ㄲ", "ㄴ", "ㄷ", "ㄸ", "ㄹ", "ㅁ", "ㅂ", "ㅃ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅉ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ" ];
var rJung =[ "ㅏ", "ㅐ", "ㅑ", "ㅒ", "ㅓ", "ㅔ", "ㅕ", "ㅖ", "ㅗ", "ㅘ", "ㅙ", "ㅚ", "ㅛ", "ㅜ", "ㅝ", "ㅞ", "ㅟ", "ㅠ", "ㅡ", "ㅢ", "ㅣ" ];
var rJong = [ "", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ","ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ" ];
var cho, jung, jong;
var sTest = "탱";
var nTmp = sTest.charCodeAt(0) - 0xAC00;
jong = nTmp % 28; // Jongseong index
jung = ((nTmp - jong) / 28) % 21; // Jungseong index
cho = (((nTmp - jong) / 28) - jung) / 21; // Choseong index
alert("Choseong:" + rCho[cho] + "\n" + "Jungseong:" + rJung[jung] + "\n" + "Jongseong:" + rJong[jong]);
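Building on the separation above, the Korean-to-QWERTY direction can be sketched as follows. This is only an illustration under the key tables listed earlier; `charToKeys` and `hangulToKeys` are hypothetical names, and any character that is neither a syllable nor a jamo from those tables is left untouched.

```javascript
const CHO_JAMO = [..."ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"];
const JUNG_JAMO = [..."ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"];
const JONG_JAMO = ["", ..."ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"];
const CHO_KEYS = "r R s e E f a q Q t T d w W c z x v g".split(" ");
const JUNG_KEYS = "k o i O j p u P h hk ho hl y n nj np nl b m ml l".split(" ");
const JONG_KEYS = ["", ..."r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g".split(" ")];

// Keys for jamo that appear alone, outside a syllable (as in 힘ㄴㄴ).
const loneJamoKeys = new Map();
JONG_JAMO.forEach((jamo, i) => { if (jamo) loneJamoKeys.set(jamo, JONG_KEYS[i]); });
CHO_JAMO.forEach((jamo, i) => loneJamoKeys.set(jamo, CHO_KEYS[i]));
JUNG_JAMO.forEach((jamo, i) => loneJamoKeys.set(jamo, JUNG_KEYS[i]));

// charToKeys (hypothetical): one character -> the QWERTY keys that type it.
function charToKeys(ch) {
  if (loneJamoKeys.has(ch)) return loneJamoKeys.get(ch);
  const code = ch.charCodeAt(0) - 0xAC00;
  if (code < 0 || code >= 19 * 21 * 28) return ch; // not a Korean syllable
  const jong = code % 28;
  const jung = Math.floor(code / 28) % 21;
  const cho = Math.floor(code / (21 * 28));
  return CHO_KEYS[cho] + JUNG_KEYS[jung] + JONG_KEYS[jong];
}

function hangulToKeys(text) {
  return [...text].map(charToKeys).join("");
}

console.log(hangulToKeys("안녕하세요"));
console.log(hangulToKeys("힘ㄴㄴ"));
```

Because every syllable decomposes uniquely, this direction needs no look-ahead; the harder part of the task is the reverse direction.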
When assembled
- Note that ㅘ, ㅙ, ㅚ, ㅝ, ㅞ, ㅟ, ㅢ are combinations of other jungseongs:
ㅗ+ㅏ=ㅘ, ㅗ+ㅐ=ㅙ, ㅗ+ㅣ=ㅚ, ㅜ+ㅓ=ㅝ, ㅜ+ㅔ=ㅞ, ㅜ+ㅣ=ㅟ, ㅡ+ㅣ=ㅢ
- Choseong is necessary. That means, if frk is given, which is ㄹㄱㅏ, it can be assembled in two ways: ㄺㅏ and ㄹ가. You have to choose the one that has a choseong. If jjjrjr is given, which is ㅓㅓㅓㄱㅓㄱ, the leading ㅓs don't have anything that can be a choseong, but the fourth ㅓ has a ㄱ that can be its choseong, so it becomes ㅓㅓㅓ걱.
Another example: 세계 (tprP). It could be assembled as 섹ㅖ ((ㅅㅔㄱ)(ㅖ)), but because choseong is necessary, it becomes 세계 ((ㅅㅔ)(ㄱㅖ)).
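The assembly rules above can be sketched with a single regular expression: a choseong and a jungseong, plus an optional jongseong that is only accepted when no vowel key follows it. This is a rough sketch and not a golfed answer. It assumes lone consonants are never merged into double consonants and that keys without a counterpart (such as `A`) stay unchanged; `keysToHangul` is a hypothetical name.

```javascript
const CHO_JAMO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
const JUNG_JAMO = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
const CHO_KEYS = "r R s e E f a q Q t T d w W c z x v g".split(" ");
const JUNG_KEYS = "k o i O j p u P h hk ho hl y n nj np nl b m ml l".split(" ");
const JONG_KEYS = ["", ..."r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g".split(" ")];

// Two-key combinations are listed first so they win over single keys.
const C = "[rRseEfaqQtTdwWczxvg]";
const V = "hk|ho|hl|nj|np|nl|ml|[koiOjpuPhynbml]";
const F = "rt|sw|sg|fr|fa|fq|ft|fx|fv|fg|qt|[rRsefaqtTdwczxvg]";
const VOWEL_KEY = "[koiOjpuPhynbml]";

// Syllable = choseong + jungseong + optional jongseong. The look-ahead rejects
// a jongseong followed by a vowel key, so that consonant becomes the next choseong.
const pattern = new RegExp(
  `(${C})(${V})((?:${F})(?!${VOWEL_KEY}))?|(${V})|(${C})`, "g");

// keysToHangul (hypothetical): QWERTY keystrokes -> assembled Korean text.
function keysToHangul(text) {
  return text.replace(pattern, (match, cho, jung, jong, loneJung, loneCho) => {
    if (loneJung) return JUNG_JAMO[JUNG_KEYS.indexOf(loneJung)];
    if (loneCho) return CHO_JAMO[CHO_KEYS.indexOf(loneCho)];
    const code = (CHO_KEYS.indexOf(cho) * 21 + JUNG_KEYS.indexOf(jung)) * 28
      + JONG_KEYS.indexOf(jong || "");
    return String.fromCharCode(0xAC00 + code);
  });
}

console.log(keysToHangul("dkssud, tprP!"));
console.log(keysToHangul("frk"), keysToHangul("jjjrjr"));
```

When a double jongseong such as `fa` is followed by a vowel, the look-ahead fails and the regex backtracks to the single key `f`, which leaves `a` free to start the next syllable.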
Examples
input 1
안녕하세요
output 1
dkssudgktpdy
input 2
input 2
output 2
ㅑㅞㅕㅅ 2
input 3
힘ㄴㄴ
output 3
glass
input 4
아희(Aheui) is esolang which you can program with pure Korean characters.
output 4
dkgml(모뎌ㅑ) ㅑㄴ ㄷ내ㅣ뭏 조ㅑ초 ㅛㅐㅕ ㅊ무 ㅔ갷ㄱ므 쟈소 ㅔㅕㄱㄷ ㅏㅐㄱㄷ무 촘ㄱㅁㅊㅅㄷㄱㄴ.
input 5
dkssud, tprP!
output 5
안녕, 세계!
input 6
ㅗ디ㅣㅐ, 째깅! Hello, World!
output 6
hello, World! ㅗ디ㅣㅐ, 째깅!
Shortest code wins (in bytes).
New rule for your convenience
You can leave unchanged characters like A which have no counterpart on the two-set keyboard, so Aheui to Aㅗ뎌ㅑ is OK. But if you convert Aheui to 모뎌ㅑ, you get -5 points, so you can earn 5 bytes.
You can keep two jungseongs separate (ㅗ+ㅏ instead of ㅘ), like rhk to 고ㅏ, or how to ㅗㅐㅈ. But if you combine them (like rhk to 과 or how to ㅙㅈ), you earn an additional -5 points.
In the jungseong order section one of the letters is missing. I see 21 Korean symbols, but only 20 letter(-pair)s. EDIT: Seems to be missing a trailing l after ml for the Korean symbol ㅣ. – Kevin Cruijssen – 2019-05-24T12:19:40.740
@KevinCruijssen edited. l for ㅣ. – LegenDUST – 2019-05-24T12:23:39.423
Welcome to PPCG! Your examples seem to indicate this, but I wanted to clarify -- if the character/letter is not in the choseong, jungseong, or jongseong lists (e.g., 2) it should remain untouched? – AdmBorkBork – 2019-05-24T12:28:52.560
@AdmBorkBork As you can see in examples 2, 4, and 6, yes. 스택 익스체인지 프로그래밍 퍼즐 & 코드 골프 should be converted to tmxor dlrtmcpdlswm vmfhrmfoald vjwmf & zhem rhfvm – LegenDUST – 2019-05-24T12:31:52.720
Sometimes there can be more than one interpretation. For example, fjfau could be interpreted as 럶ㅕ or 럴며. How do we resolve this? – Nick Kennedy – 2019-05-24T22:48:00.373
It is always interpreted as 럴며, because it is typing. If I want to type 럶ㅕ, I have to type 럶, then space and backspace, then type ㅕ. – LegenDUST – 2019-05-25T01:50:53.513
Do jongseong ever occur alone (outside a syllable) in input or output? – lirtosiast – 2019-05-26T21:23:21.037
Why does np in the second test case (input 2) transform from ㅜㅔ to ㅞ? Both ㅜ and ㅔ are jungseong. Based on your separation paragraph and Korean_code formula I thought characters should only be combined as choseong+jungseong+jongseong or choseong+jungseong+empty. But apparently other combinations are possible as well?.. :S – Kevin Cruijssen – 2019-05-27T07:58:14.730
Some jungseongs combine into another. For example: ㅗㅣ to ㅚ, ㅗㅐ to ㅙ, ㅜㅔ to ㅞ, ㅗㅏ to ㅘ, ㅜㅓ to ㅝ, ㅜㅣ to ㅟ. I'll edit the question to make it clear... I'm Korean and I thought that was just obvious. – LegenDUST – 2019-05-27T08:17:20.033
@LegenDUST Well, I can't read a single word of Korean, so I'll have to go with your explanation. ;p As for tprP in test case 5: this transforms into ㅅㅔㄱㅖ, where ㅅ is a choseong, ㅔ is a jungseong and ㄱ is a jongseong. So shouldn't this transform into 섷ㅖ (grouped like (ㅅㅔㄱ)(ㅖ)) instead of 세계 (grouped like (ㅅㅔ)(ㄱㅖ))? In an earlier comment you state it is interpreted by typing, so I would expect ㅅㅔㄱ to transform into 섷. Or is Korean typed from right to left instead of left to right? – Kevin Cruijssen – 2019-05-27T09:53:53.233
If a jongseong is followed by a jungseong, it changes to a choseong. tprP changed into ㅅㅔㄱㅖ. Note that choseong is necessary, unless there is nothing. Note that almost every character in jongseong is the same as in choseong. So frk (ㄹㄱㅏ) changes to ㄹ가, not ㄺㅏ. Maybe Wikipedia can help. I'll fix the problem to make it clear. – LegenDUST – 2019-05-27T10:00:35.270
This task is perfect for AutoHotkey. – stackzebra – 2019-05-27T12:43:23.053
I have made an editing pass to improve the English. Feel free to roll back if you think I've changed it for the worse. – Veskah – 2019-05-30T13:34:19.187
Do you have a list of all combined Korean characters somewhere? Or do you know the lowest and highest unicode for a range of the combined characters? – Kevin Cruijssen – 2019-05-30T15:20:21.323
@Veskah Thanks a lot. – LegenDUST – 2019-05-31T08:36:19.070
@KevinCruijssen PDF file from Unicode.org. AC00 (가) to D7AF (힣). – LegenDUST – 2019-05-31T08:39:54.863
It looks like IME instead of layout to me. – tsh – 2019-06-03T05:12:47.417
@tsh It is IME, but that doesn't mean it is not layout. Maybe IME is better than layout in this question? – LegenDUST – 2019-06-03T10:23:35.907