Check if a given string is valid romaji


Your program is given a string consisting entirely of lowercase letters at STDIN (or closest alternative). The program must then output a truthy or falsey value, depending on whether the input is valid romaji.

Rules:

  • It must be possible to divide the entire string into a sequence of kana without any leftover characters.
  • Each kana can be a single vowel (a, i, u, e, o).
  • Each kana can also be a consonant p, g, z, b, d, k, s, t, n, h, m, or r followed by a vowel. For example, ka and te are valid kana, but qa is not.
  • The exceptions to the above rule are that zi, di, du, si, ti, and tu are not valid kana.
  • The following are also valid kana: n, wa, wo, ya, yu, yo, ji, vu, fu, chi, shi, tsu.
  • If a particular consonant is valid before an i (e.g. ki, pi), the i can be replaced by ya, yu, or yo and the result is still valid (e.g. kya, kyu, kyo).
  • Exceptions to the above rule are chi and shi, for which the y has to be dropped too (e.g. cha, chu, cho, sha, shu, sho).
  • It is also valid to double a consonant if it is the first character of a kana (kka is valid but chhi is not).
  • Shortest answer wins. All regular loopholes are disallowed.
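
For reference, the rules above can be sketched as an ungolfed Python check. This is only one reading of the rules, not a golfed entry: the names DOUBLABLE, FIXED, KANA, ROMAJI and is_romaji are made up for illustration, and the sketch assumes, following the kana list further down, that ch-, sh- and ts- kana cannot be doubled and that j only combines with i, ya, yu and yo.

```python
import re

# Kana whose first consonant may be doubled (kka, ppa, ...).
DOUBLABLE = (
    r"(?:[bghkmnpr](?:[aiueo]|y[auo])"  # ka, ki, ..., plus kya, kyu, kyo
    r"|[sz][aueo]"                      # no si, zi
    r"|[dt][aeo]"                       # no di, du, ti, tu
    r"|w[ao]|y[auo]|[fv]u"
    r"|j(?:i|y[auo]))"                  # assumption: ji, jya, jyu, jyo only
)
# Kana that are never doubled: vowels, n, and the ch/sh/ts forms.
FIXED = r"(?:[cs]h[aiuo]|tsu|[aiueon])"

# One kana: an optional extra first letter (the lookahead forces the next
# letter to be identical), then a doublable kana; or a fixed kana.
KANA = rf"(?:(?:([a-z])(?=\1))?{DOUBLABLE}|{FIXED})"
ROMAJI = re.compile(rf"{KANA}*")


def is_romaji(s: str) -> bool:
    """Return True if s splits completely into kana (the empty string passes)."""
    return ROMAJI.fullmatch(s) is not None


for word in ["kyoto", "watashi", "tsunami", "bunpu", "yappari"]:
    assert is_romaji(word)
for word in ["yi", "chhi", "zhi", "kyi"]:
    assert not is_romaji(word)
```

Using a backreference for the doubled letter, rather than a repeat count on a character class, is what keeps mixed pairs such as bk from slipping through.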

List of all valid kana:

Can have double consonant:

ba, bu, be, bo, bi
ga, gu, ge, go, gi
ha, hu, he, ho, hi
ka, ku, ke, ko, ki
ma, mu, me, mo, mi
na, nu, ne, no, ni
pa, pu, pe, po, pi
ra, ru, re, ro, ri
sa, su, se, so
za, zu, ze, zo
da,     de, do
ta,     te, to
wa,         wo
ya, yu,     yo
    fu
    vu
                ji

Cannot have double consonant:

a, i, u, e, o
tsu
chi, cha, cho, chu
shi, sha, sho, shu
n
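
The list can also be used directly, without a regex. The sketch below builds the kana as a Python set and splits a word with a small dynamic-programming pass. The y-forms (kya, kyu, kyo, ...) and the doubled forms are not spelled out in the list, so they are derived here from the rules; that derivation, the jya/jyu/jyo entries, and the names build_kana, KANA_SET and segment are assumptions made for illustration.

```python
def build_kana():
    """Build the set of accepted kana spellings, doubled forms included."""
    # Kana whose first consonant may be doubled.
    doublable = {"wa", "wo", "ya", "yu", "yo", "fu", "vu",
                 "ji", "jya", "jyu", "jyo"}  # assumption: j takes i, ya, yu, yo
    for c in "bghkmnpr":
        doublable |= {c + v for v in "aiueo"} | {c + "y" + v for v in "auo"}
    for c in "sz":
        doublable |= {c + v for v in "aueo"}   # no si, zi
    for c in "dt":
        doublable |= {c + v for v in "aeo"}    # no di, du, ti, tu
    # Kana that are never doubled.
    fixed = set("aiueon") | {"tsu"} | {h + v for h in ("ch", "sh") for v in "iauo"}
    doubled = {k[0] + k for k in doublable}    # ka -> kka, kya -> kkya
    return fixed | doublable | doubled


KANA_SET = build_kana()


def segment(word, kana=KANA_SET):
    """Return one split of word into kana, or None if no split exists."""
    # splits[i] holds one way to split word[:i], or None if there is none.
    splits = [None] * (len(word) + 1)
    splits[0] = []
    for start in range(len(word)):
        if splits[start] is None:
            continue
        for size in range(1, 5):               # longest entries have 4 letters
            end = start + size
            piece = word[start:end]
            if end <= len(word) and splits[end] is None and piece in kana:
                splits[end] = splits[start] + [piece]
    return splits[len(word)]


print(segment("yappari"))  # ['ya', 'ppa', 'ri']
print(segment("kyi"))      # None
```

When a word can be split in more than one way, segment returns the first split it finds; a truthy/falsey answer only needs to know whether the result is None.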

Test cases

Pass:

kyoto
watashi
tsunami
bunpu
yappari

Fail:

yi
chhi
zhi
kyi

takra

Posted 2017-05-24T00:08:26.497

Reputation: 793

How do we win? Is this a code golf? – caird coinheringaahing – 2017-05-24T00:09:37.357

Need test cases. Also could do with a list of all valid kana instead of the rules – Robert Fraser – 2017-05-24T00:11:05.033

@RobertFraser Both are preferred; test cases are not rules. – Stephen – 2017-05-24T00:12:18.673

If pa pi pu pe po are valid kana, then can't you simply group p with the other consonants? – Value Ink – 2017-05-24T00:15:59.960

I added all the kana, I think. Please tell me if I missed some. – takra – 2017-05-24T00:20:06.643

@ValueInk good point, that makes it easier to understand – takra – 2017-05-24T00:20:29.103

"If a particular consonant is valid before an i (i.e ki, pi), the i can be replaced by a ya, yu, or yo and still be valid (i.e kya, kyu, kyo)" The list seems to be missing these for "h", and "p" – Jonathan Allan – 2017-05-24T02:32:38.590

The list is missing all doubled consonant variants. Is chchi valid or not (i.e. is ch considered a character)? – Jonathan Allan – 2017-05-24T02:38:16.780

I've updated list of valid kana, but n needs clarification – Dead Possum – 2017-05-24T14:08:45.197

n cannot be doubled. I know enough about the Japanese alphabets to say that. If n was doubled, it would need to have a vowel after, but then it wouldn't be n. So if kanna was a word (just making it up), it'd actually be ka n na. – mbomb007 – 2017-05-24T14:16:40.737

You know, I wanted to make a solution using unicodedata, but it'll definitely be longer than a regex solution. Partial program – mbomb007 – 2017-05-24T14:30:48.697

@mbomb007 Is it intended to output the vowels twice? – Dead Possum – 2017-05-24T14:34:49.287

@DeadPossum This isn't a solution. If I were going to use it, I'd be checking if the list contains something. – mbomb007 – 2017-05-24T14:35:31.650

Answers


Ruby, 149 bytes (previously 96)

Regex solution to match all the valid kana. Interestingly, "ecchi" is not valid according to the current rules, but perhaps it's for the best.

->s{s.gsub(/(?![dt]u)(sh|ch|([gbknhmrp])\2?y?|([zdst])\3?)?[auo]|(\g<2>)?\4?[ie]|(\g<3>)\5?e|ww?[ao]|n|tsu|([fv])\6?u|jj?i|j?y?[aou]|yy[aou]/){}==""}

Try it online! feat. Cruel Angel's Thesis

Value Ink


Reputation: 10 608

It fails on the simple tests zi and zye – Dead Possum – 2017-05-24T13:32:21.043

@DeadPossum fixed. – Value Ink – 2017-05-24T23:01:13.097


Python 2, 166 bytes

Long regex solution
Try it online

I think f-strings from Python 3.[something] could help shorten it by factoring out the repeated [auo and {1,2}.
Unfortunately, I can't check that myself right now :c

import re
lambda x:re.sub('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]'.replace('~','{1,2}'),'',x)==''

Dead Possum


Reputation: 3 256

re.sub('~','{1,2}',(your regex)) is shorter than (your regex).replace('~','{1,2}') by 1 byte. – Value Ink – 2017-05-24T23:03:18.717
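
A quick check of that suggestion, assuming re is already imported as it is in the answer. The names fragment and X are stand-ins for illustration, not part of the original code.

```python
import re

# A fragment of the answer's regex, with ~ as the placeholder.
fragment = "[dt]~[aeo]"

# Both spellings expand the placeholder to the same string.
via_replace = fragment.replace('~', '{1,2}')
via_sub = re.sub('~', '{1,2}', fragment)
assert via_replace == via_sub == "[dt]{1,2}[aeo]"

# Around an expression X, the two call syntaxes differ by one character.
len_sub = len("re.sub('~','{1,2}',X)")
len_replace = len("X.replace('~','{1,2}')")
assert len_replace - len_sub == 1
```

The saving only holds because the replacement text contains no backslashes, which re.sub would otherwise interpret as escapes.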

Your regex is also failing on a simple test case: bku. Doubled consonants have to be the same consonant. – Value Ink – 2017-05-24T23:04:37.547
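
A minimal sketch of why bku slips through, using only the b/g/h/k/m/n/p/r part of the pattern; the names loose and strict are made up for illustration. A repeat count on a character class accepts any two letters from the class, whereas a backreference requires the second letter to equal the first.

```python
import re

# Shape used in the answer: any one or two consonants from the class.
loose = re.compile(r"[bghkmnpr]{1,2}u")
# Backreference: an optional second consonant must repeat the first.
strict = re.compile(r"([bghkmnpr])\1?u")

assert loose.fullmatch("bku")         # wrongly accepted
assert not strict.fullmatch("bku")    # rejected, as the rules require
assert strict.fullmatch("kku")        # a true doubled consonant still passes
assert strict.fullmatch("ku")
```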