Pinyin Combinations

13

1

Create a function that takes a string containing a pinyin syllable as its argument and returns true if the combination exists, false otherwise.

Use "v" for "ü".

Here is a full list of combinations. http://www.pinyin.info/rules/initials_finals.html

Examples

f("bu") == true
f("zheng") == true
f("nv") == true
f("ri") == true
f("cei") == false
f("ia") == false
f("kian") == false
f("qa") == false

Please don't do things like scraping webpages or reading input method files to reduce the character count. (If you do, the length of the data will be counted toward the character count.) One of the purposes of this code golf is to see how the rules can be simplified. Shortest code wins.

Ming-Tang

Posted 2011-06-20T18:22:40.277

Reputation: 5 383

What about something like nar? :P – JiminP – 2011-06-21T00:13:49.143

Just as a note, despite what the examples say, I don't believe nvi is ever a valid combination. – rintaun – 2011-06-21T00:28:36.067

If the linked page already says » er has been omitted from this table« shouldn't it be included as well? (After all, it was a number, if I remember correctly ;-)) – Joey – 2011-06-22T10:23:01.300

Answers

4

JavaScript 1.6, 477 characters (previously 503, then 496)

function g(s){return/^([bfmpw]?o|[yjqx]ua?n|[ln]ve?|ei?|y[aio]ng|w?[ae]ng?|w?ai?|wei|y?ao|y?ou|y[ai]n?|yu?e|[^aeiou]+u)$/.test(s)|(((k=6*("ccsszzdflmnprtbghkjqx".indexOf(s[0])+(f=s[1]=='h')))|(r="a.e.ai.ei.ao.ou.an.ang.en.eng.ong.ua.uo.uai.ui.uan.uang.un.i.ia.ie.iao.iu.ian.iang.in.ing.iong.u.ue".split('.').indexOf(s.slice(f+1))))<0?0:k>84?r>17^k<108:parseInt("009m2f00b8jb009m2f00b7r3009m2n00b8jj1dwcfz0000rtfjba4f1xgbnjfj01rz1uyfb1009nn61b37cv1uyfa5".slice(k,k+6),36)>>r&1)}

Formatted a little more readably (barring any errors in breaking the code into a few lines):

function _g(s)
{
  f = s[1] == 'h'
  k = "ccsszzdfghjklmnpqrtxb".indexOf(s[0]) * 6
  k += 6 * f
  return /^(weng|[bfmp]?o|[yjqx]ua?n|[ln]ve?|[ae]i?|y[aeiu]|y[aio]ng|[ae]ng?|wang?|wai?|we[in]|w[ou]|y?ao|y?ou?|y[ai]n|yue)$/.test(s) | 
         !!(k >= 0 && (1 << "a.e.ai.ei.ao.ou.an.ang.en.eng.ong.u.ua.uo.uai.ui.uan.uang.un.i.ia.ie.iao.iu.ian.iang.in.ing.iong.u.ue".split('.').indexOf(s.slice(f + 1)) & parseInt("00j85300mh2v00j85300mgan00j85b00mh332rsovz0002cp00b8jj00b8jjqmlts000b8jjv2mkfz3uwo3jv203jz3pwvelqmlts000jbaq2m6ewvqmlts03pwvdp".slice(k, k + 6), 36)))
}

The zero-initial cases plus a few one-offs are tested with a regular expression. After that, the table is encoded as a (concatenated) series of 6-digit, base-36 numbers, one per initial sound. The lookup then uses a pair of indexOf calls and a shift to select the right bit.
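The packed-table lookup described above can be sketched on a much smaller scale. The following is a minimal illustration, not the answer's actual data: the three initials, four finals, the TOY table, and the names pack and exists are all assumptions made for the example, and it only handles single-character initials (the answer uses an extra flag for zh, ch, and sh).

```javascript
// Illustrative sketch only: a tiny 3-initial x 4-final table, not the real pinyin data.
const INITIALS = "bdg";                 // one character per initial
const FINALS = ["a", "o", "e", "u"];    // bit r corresponds to FINALS[r]
const WIDTH = 2;                        // base-36 digits stored per initial

// Valid finals per initial in this toy table (hypothetical subset).
const TOY = { b: ["a", "o", "u"], d: ["a", "e", "u"], g: ["a", "e", "u"] };

// Build the packed string: one fixed-width base-36 number per initial.
function pack() {
  let out = "";
  for (const ini of INITIALS) {
    let mask = 0;
    for (const fin of TOY[ini]) mask |= 1 << FINALS.indexOf(fin);
    out += mask.toString(36).padStart(WIDTH, "0");
  }
  return out;
}

const PACKED = pack();

function exists(s) {
  const k = INITIALS.indexOf(s[0]) * WIDTH;  // offset of this initial's number
  const r = FINALS.indexOf(s.slice(1));      // bit position of the final
  if (k < 0 || r < 0) return false;          // unknown initial or final
  // Shift the selected bit down to position 0 and mask it off.
  return (parseInt(PACKED.slice(k, k + WIDTH), 36) >> r & 1) === 1;
}

console.log(exists("bo"), exists("do"));
```

With six base-36 digits per initial, as in the answer, each number has room for roughly 30 finals, which is why the real table uses a width of 6.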

Tested against all cells in the table of combinations (filled cells tested for true, empty cells tested for false).

Edit: Replaced some of the 36 chars of the base-36 lookup with comparisons since g–, k–, h–, j–, q–, and z– have dense blocks of true/false.

Edit: Rearranged the bit test to avoid an unnecessary !! and compacted the regex more.

DocMax

Posted 2011-06-20T18:22:40.277

Reputation: 704

Why do you need a !!? I'm not sure I understand why you would ever need a double not... – Peter Olson – 2011-06-22T16:27:56.390

With it, the return is 0 or 1; without it "true" is returned as non-zero but not necessarily 1. My test script is validating with if (g(s) == (validList.indexOf(s) >= 0)), which returns false on 16 == true; I debated it from a "what does 'true' really mean" perspective and left the thing in. In either case, I have a planned change for later today that will do away with the !! by replacing 1<<r&parseInt with (more or less) (parseInt>>r)&1 so that the return is 1 and I shave off two chars. – DocMax – 2011-06-22T18:40:35.693
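The difference between the two bit tests discussed in this comment can be seen in a few lines. This is a small sketch with made-up values (mask and r are illustrative, not taken from the answer's table):

```javascript
const mask = 0b10000;   // only bit 4 is set, i.e. the value 16
const r = 4;

const maskFirst = 1 << r & mask;    // 16: truthy, but not equal to true
const shiftFirst = mask >> r & 1;   // 1: compares equal to true

console.log(maskFirst == true);     // false
console.log(!!maskFirst == true);   // true
console.log(shiftFirst == true);    // true
```

Shifting first and masking with 1 always yields 0 or 1, so no !! is needed to make a loose comparison against true behave as expected.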

1

APL (Dyalog Extended), 475 bytes

s←⊢⊆⍨' '≠⊢
a b c←2097144 131064 1957895
f←{(⊂⍵)∊(12↑v),(s'yi ya ye yao you yan yang yin ying yong yu yue yuan yun wu wa wo wai wei wan wang wen weng nv lv nve lve'),(,⊤(a-8)1966080 393208 1966064 2096720 1966072 1048568a a 2056184a 131048a 7288b 7280 106488b 7280b 0 1958911 73735c c 352263c 24583 1859591c,5⍴7)/,('bpmfdtnlgkhzcs',s'zh ch sh r j q x')∘.,v←'aoe',s'ai ei ao ou an ang en eng ong u ua uo uai ui uan uang un ueng i ia ie iao iu ian iang in ing iong u ue uan un'}

Golfing in progress.

Ungolfed

s←{⍵⊆⍨' '≠⍵}
con←s'b p m f d t n l g k h z c s zh ch sh r j q x'
vwl←s'a o e ai ei ao ou an ang en eng ong u ua uo uai ui uan uang un ueng i ia ie iao iu ian iang in ing iong u ue uan un'
tab←con∘.,vwl
bin←,⊤2097136 1966080 393208 1966064 2096720 1966072 1048568 2097144 2097144 2056184 2097144 131048 2097144 7288 131064 7280 106488 131064 7280 131064 0 1958911 73735 1957895 1957895 352263 1957895 24583 1859591 1957895 7 7 7 7 7
all←'aoe',(12↑vwl),(s'yi ya ye yao you yan yang yin ying yong yu yue yuan yun wu wa wo wai wei wan wang wen weng nv lv nve lve'),bin/,tab
f←{(⊂⍵)∊all}

The helper function s unpacks a space-delimited string:

{⍵⊆⍨' '≠⍵}    monadic function taking a string
    ' '≠⍵       0s at spaces, 1s elsewhere
 ⍵⊆⍨            Partition (split at 0s)

I first store the possible initial and final strings in the syllable, then make a table tab containing the concatenation of each string from the first list with each string from the second list.

Next, I store binary data as a list of integers. Some of the integers are repeated and can therefore be stored in variables, which also allows elision of some spaces.

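Each integer is decoded into binary and represents one column of the table, that is, one final. Each bit in the number indicates whether the syllable formed with the corresponding initial is valid, with the MSB representing the first row (the initial b). For example, 2097136 for the final a decodes to 17 ones followed by 4 zeros, because a combines with every initial except r, j, q, and x. All invalid syllables are removed from the table.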

We flatten the table into a list, add on the forms with no initial consonant as a special case, and finally check if our input is in the list.
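For readers who do not know APL, the same idea can be sketched in JavaScript. This is only an approximation of the approach, restricted to the first three finals (a, o, e) and their integers from bin. It reads each integer as the 21 bits for one final, with the most significant bit belonging to the first initial, b; that reading is inferred from the data (2097136 has 17 leading ones, matching the 17 initials that combine with a) rather than stated by the author.

```javascript
// 21 initials, in the same order as `con` in the ungolfed APL.
const con = "b p m f d t n l g k h z c s zh ch sh r j q x".split(" ");
// Only the first three finals and their integers from `bin`, to keep the sketch short.
const vwl = ["a", "o", "e"];
const bin = [2097136, 1966080, 393208];

const all = new Set();
vwl.forEach((fin, col) => {
  con.forEach((ini, row) => {
    // The bit for row 0 ("b") is the most significant of the 21 bits.
    const bit = bin[col] >> (con.length - 1 - row) & 1;
    if (bit) all.add(ini + fin);   // keep only the valid combinations
  });
});

// Membership test, playing the role of the APL function f.
const f = (s) => all.has(s);

console.log(f("ba"), f("ra"), f("bo"), f("do"), f("me"), f("be"));
```

A complete port would also need the remaining 32 finals and the special-case list of syllables without an initial consonant.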

Possible further golfing potential:

  • Write base64 or base255 encoding
  • Reorder the columns and rows to make the numbers smaller.

Helper Python script and test-case generator: Try it online!

lirtosiast

Posted 2011-06-20T18:22:40.277

Reputation: 20 331

1

PHP, 548 characters

Granted, it's likely not optimal, but I wrote a regex to match valid pinyin combinations. Reduced characters by replacing repeating substrings with variables.

Code

<?php $a='?|e(i|ng?)';$b='|o(u|ng)|u';$c='|a?n)?|i(a[on]';$d='(a(ng?|o|i)';$e='|ng?)';$f='(i|ng)?';echo(preg_match("/^([bpm](a(i|o$e$a|u|o|i(e|a[on]$e?)|[pm]ou|m(e|iu)|f(a(ng?)?|ou$a|u)|d$d$a?$b(o|i$c?|e|u)?)|[dtnl]$d?|e$f$b(o$c|e)?)|[jqxy](i(a(o$e?|e|u|o?ng|n)|u(e|a?n))|([zcs]h?|r)i|[nl](ve?|i(n|ang?|u))|[dl]ia|[dt](ing|ui)|[dn]en|diu|([gkh]|[zcs]h?)(e(ng?)|a(o|ng?|i)?|ou|u(o|i|a?n)?)|r(e(ng?)?|a(o$e$b(a?n?|o|i)?)|[gkh](ei|ong|u(a$f))|[zcs]hua$f|([zcs]|[zc]h)ong|(z|[zs]h)ei|a(i|o$e?|ou$a?|w(u|a(i$e?|o|e(i$e))$/",$argv[1]))?"true":"false";

Usage

> php pinyin.php bu
> true
> php pinyin.php cei
> false
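The trick of replacing repeated substrings with variables can be shown on a much smaller pattern. The sketch below is in JavaScript rather than PHP, and the fragments ng, a, and e are hypothetical names chosen for illustration; they are not the answer's $a to $f, and the pattern covers only a handful of syllables starting with m.

```javascript
// Hypothetical fragments, chosen for illustration; they are not the answer's $a..$f.
const ng = "ng?";               // matches "n" or "ng"
const a = `a(i|o|${ng})?`;      // a, ai, ao, an, ang
const e = `e(i|${ng})?`;        // e, ei, en, eng

// Toy pattern: ma, mai, mao, man, mang, me, mei, men, meng, mo, mu.
const toy = new RegExp(`^m(${a}|${e}|o|u)$`);

console.log(toy.test("mang"), toy.test("mei"), toy.test("mia"));
```

Each fragment is written once and interpolated wherever it recurs, which is the saving the answer relies on. The cost is that parentheses can end up split across fragments, which makes the full pattern hard to check by eye.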

rintaun

Posted 2011-06-20T18:22:40.277

Reputation: 751

1

F#, 681 characters

type l=Y|J|Q|X|W|F|B|P|M|N|L|T|D|Z|K|H|Zh|G|Sh|Ch|C|S|R|Iong|Vn|Van|Ia|Iu|In|Iang|Ve|V|Ian|Iao|Ie|Ing|I|Ei|A|Ai|An|Ang|Eng|U|Ao|E|Ou|Uo|Uan|Un|Ui|En|Ong|Ua|Uang|Uai|Ueng|O
let v x=x.GetHashCode()
let n x=J.GetType().GetNestedType("Tags").GetFields().GetValue(v x).ToString().Substring(6).ToLower();
let(^)a b=List.collect(fun x->List.map(fun z-> n x+ n z)b)a
let(-)a b=[v a..v b]
let(&)a b=a@b
let(!)a=[v a]
[<EntryPoint>]
let main a=
 printf"%b"(List.exists(fun x->x=a.[0])(Y-X^Iong-I& !W^Ei-Ui@Ua-O& !F^Ei-A@An-U@ !Ou&(F-N@D-Sh)^ !En&F-M^ !O&B-M^ !In&N-L^Iu-Un& !D^Ia-Iu&B-D^Ian-Ao& !M^E-Ou&Ch-S^A-Ong&T-Sh^Ei-Ui&N-G^ !Ong&K-Ch^Ua-Uai& !R^An-Ua&(Sh-R@ !Z@ !Zh)^ !I&["lia";"pou";"mui"]))
 0

Doesn't quite get the syllables without an initial consonant correct (Y, W, etc.).

Mark H

Posted 2011-06-20T18:22:40.277

Reputation: 111