Is it a valid consonant cluster in Lojban?

.i xu .e'o lo zunsnagri cu drani loka jboge'a

Given an input of a string consisting of two characters, output whether it is a valid consonant cluster in Lojban.

Here is a quote from CLL 3.6 detailing the rules for a valid consonant cluster pair (or rather, an invalid one):

1) It is forbidden for both consonants to be the same, as this would
   violate the rule against double consonants.

2) It is forbidden for one consonant to be voiced and the other unvoiced.
   The consonants “l”, “m”, “n”, and “r” are exempt from this restriction.
   As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”,
   and both “ls” and “lz”, are permitted.

3) It is forbidden for both consonants to be drawn from the set “c”, “j”,
   “s”, “z”.

4) The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.

The quote references "voiced" and "unvoiced" consonants. Here is a table of the unvoiced consonants and their voiced counterparts (also from CLL 3.6):

UNVOICED    VOICED
   p          b
   t          d
   k          g
   f          v
   c          j
   s          z
   x          -

Note that {x} has no voiced counterpart. For completeness, the remaining consonants that are not on this list (which can be either voiced or unvoiced for the purposes of the quote) are lmnr. (y is a vowel, and the letters hqw are not used.)

The input must be a single string, but you may assume that it will always consist of exactly two consonants, with optional trailing newline if you wish. The output may be any truthy or falsy value.
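For reference, the four CLL rules quoted above can be written out directly in ungolfed form. The following is a minimal Python sketch, not a submission; the names `is_valid_cluster`, `UNVOICED`, `VOICED`, `SIBILANTS`, and `FORBIDDEN` are made up for this illustration.

```python
# Consonant classes taken from the CLL 3.6 table; l, m, n, r are in neither set.
UNVOICED = set("ptkfcsx")
VOICED = set("bdgvjz")
SIBILANTS = set("cjsz")                       # the set named in rule 3
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}    # the specific pairs from rule 4


def is_valid_cluster(pair: str) -> bool:
    """Return True if the two-consonant string is a permitted cluster."""
    pair = pair.rstrip("\n")                  # tolerate the optional trailing newline
    a, b = pair                               # assumes exactly two consonants
    if a == b:                                # rule 1: no double consonants
        return False
    if (a in UNVOICED and b in VOICED) or (a in VOICED and b in UNVOICED):
        return False                          # rule 2: no voiced/unvoiced mix
    if a in SIBILANTS and b in SIBILANTS:     # rule 3
        return False
    return pair not in FORBIDDEN              # rule 4


if __name__ == "__main__":
    consonants = "bcdfgjklmnprstvxz"
    valid = [a + b for a in consonants for b in consonants
             if is_valid_cluster(a + b)]
    print(len(valid), "valid pairs out of", len(consonants) ** 2)
```

Enumerating all 17 × 17 pairs with this function should reproduce the two test-case lists below: 179 valid pairs and 110 invalid ones.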

This is code-golf, so the shortest code in bytes wins.

Test cases (these are all possible input strings placed in the proper categories):

Valid consonant clusters:
 bd bg bj bl bm bn br bv bz cf ck cl cm cn cp cr ct db dg dj dl dm dn dr dv
 dz fc fk fl fm fn fp fr fs ft fx gb gd gj gl gm gn gr gv gz jb jd jg jl jm
 jn jr jv kc kf kl km kn kp kr ks kt lb lc ld lf lg lj lk lm ln lp lr ls lt
 lv lx lz mb mc md mf mg mj mk ml mn mp mr ms mt mv mx nb nc nd nf ng nj nk
 nl nm np nr ns nt nv nx nz pc pf pk pl pm pn pr ps pt px rb rc rd rf rg rj
 rk rl rm rn rp rs rt rv rx rz sf sk sl sm sn sp sr st sx tc tf tk tl tm tn
 tp tr ts tx vb vd vg vj vl vm vn vr vz xf xl xm xn xp xr xs xt zb zd zg zl
 zm zn zr zv

Invalid consonant clusters:
 bb bc bf bk bp bs bt bx cb cc cd cg cj cs cv cx cz dc dd df dk dp ds dt dx
 fb fd ff fg fj fv fz gc gf gg gk gp gs gt gx jc jf jj jk jp js jt jx jz kb
 kd kg kj kk kv kx kz ll mm mz nn pb pd pg pj pp pv pz rr sb sc sd sg sj ss
 sv sz tb td tg tj tt tv tz vc vf vk vp vs vt vv vx xb xc xd xg xj xk xv xx
 xz zc zf zj zk zp zs zt zx zz

Doorknob

Posted 2016-01-19T23:20:44.827

Reputation: 68 138

Doorknob, this is very close to http://codegolf.stackexchange.com/q/66053/15599. I think about half my code might be reusable.

– Level River St – 2016-01-19T23:27:58.450

@steveverrill Right, I found that question, and I thought this would be sufficiently different given that you only get two characters as input and you don't have to handle vowels and such. – Doorknob – 2016-01-19T23:30:58.397

@steveverrill ... but now I'm reconsidering, after taking a closer look at the answers. Do you think it would be better if I just left out the initial consonant pair part, and made the challenge simply "is this a valid consonant pair"? – Doorknob – 2016-01-19T23:32:52.113

I think that would both increase the difference between the challenges and simplify this one, both of which would be a good thing. – Level River St – 2016-01-19T23:38:53.350

@steveverrill Yeah, I agree now. Thanks! – Doorknob – 2016-01-19T23:39:34.263

What's "kangri"? – lirtosiast – 2016-01-19T23:42:02.657

@ThomasKwa It's a lujvo of kansa girzu ("accompany-group"), my attempt at a rough translation of "cluster." – Doorknob – 2016-01-19T23:43:24.690

It looks like zunsnagri is the term used for that.

– Lynn – 2016-01-20T01:00:35.767

@Mauris Ah, thanks! (In fact, zunsnagri is simply zunsna kangri, the nonce-term that I made up, with the kansa bit removed.) Someone should get that added to vlasisku. – Doorknob – 2016-01-20T01:02:56.967

Answers

Pyth, 53 48 47 bytes

!}z+"mz"s.pMs[+VGGc"xcxkcsjz"2*"ptkfcsx""bdgvjz

This generates a list of all invalid pairs based on the rules above, then checks if the input is one of them.

! }                        A not in B
    z                      input
    +
      "mz"                  "mz"
      s                    flattened
        .pM                permutations of each:
            s [               flatten the three-element array:
                +V              Alphabet vectorized concat with itself.
                   G            That is, duplicate letters
                   G
                c"xcxkcsjz"2     That string chopped every 2
                *               outer product of
                  "ptkfcsx"      unvoiced letters
                  "bdgvjz        and voiced letters

Try it here.
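For readers who don't know Pyth, the same construction can be sketched in Python. This is only an illustrative translation of the explanation above, not the answer's actual code; the function names `invalid_pairs` and `is_valid` are invented here.

```python
from itertools import permutations, product
from string import ascii_lowercase


def invalid_pairs() -> set:
    """Build the set of forbidden pairs the way the Pyth program does."""
    doubles = [ch + ch for ch in ascii_lowercase]      # +VGG: aa, bb, cc, ...
    special = ["xc", "xk", "cs", "jz"]                 # "xcxkcsjz" chopped every 2
    mixed = [u + v for u, v in product("ptkfcsx", "bdgvjz")]  # unvoiced x voiced
    # .pM then s: take both orderings of every pair and flatten.
    both_orders = {"".join(p)
                   for pair in doubles + special + mixed
                   for p in permutations(pair)}
    # "mz" is added on its own because "zm" is a valid cluster.
    return both_orders | {"mz"}


def is_valid(pair: str) -> bool:
    return pair not in invalid_pairs()


if __name__ == "__main__":
    print(is_valid("zm"), is_valid("mz"))
```

Taking permutations is what lets the program list only one order of each pair: `cs` also covers `sc`, and each unvoiced-voiced pair also covers its voiced-unvoiced mirror. The mixed-voicing product already contains `cj`, `cz`, `sj` and `sz`, which is why only `cs` and `jz` need to be listed for rule 3.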

lirtosiast

Posted 2016-01-19T23:20:44.827

Reputation: 20 331

Retina, 59 57 54 53 52 bytes

(.)\1|[cjsz]{2}|mz

T`fb-jz`svkv
kx|xk|^v?[kpstx]v?

The trailing linefeed is significant. For valid clusters, this outputs a non-empty string; for invalid ones the output is empty.

Try it online! This tests all clusters at once (removing all the invalid ones and leaving all the valid ones intact). To make that possible I've had to replace the ^ anchor with a \b word boundary.

Another solution for the same byte count:

(.)\1|[cjsz]{2}|mz

T`fk-dbz`scv
cx|xc|^v?[cpstx]v?

Explanation

The goal is to remove all invalid pairs completely. We can do whatever we want with the valid pairs, as long as at least one character remains.

(.)\1|[cjsz]{2}|mz

This takes care of three rules: (.)\1 matches any pair violating rule 1. [cjsz]{2} matches any pair violating rule 3. mz matches the specifically disallowed pair from rule 4.

That leaves only rule two and the other specific pairs xk, kx, xc and cx. We can save a bunch of bytes by doing some preprocessing so we have to handle fewer cases:

T`fb-jz`svkv

The idea is to collapse all voiced consonants into one, as well as k and c. I'm also turning f into s out of necessity. This is a transliteration stage which will substitute individual characters for other characters. To see the actual mapping we need to expand the range and remember that the last character of the target list is repeated indefinitely:

fbcdefghijz
svkvvvvvvvv

The initial f => s is necessary, because it overrides the later f => v which would turn f into a voiced consonant. We also see that c is turned into k. And all the voiced consonants bdgjz are turned into v. That leaves ehi... luckily these are either vowels or unused in Lojban. The same could also have been achieved with

T`fcb-jz`skv
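To make the expansion concrete, here is a small Python sketch of the transliteration rules as described in this answer: ranges are expanded, the target is padded with its last character, and the first mapping for a character wins. It is a simplified model, not Retina's actual implementation, and the helper names `expand` and `build_map` are hypothetical.

```python
def expand(spec: str) -> str:
    """Expand ranges such as 'b-j' (or reversed, 'k-d') into explicit characters."""
    out = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            step = 1 if lo <= hi else -1
            out.extend(chr(c) for c in range(lo, hi + step, step))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)


def build_map(source: str, target: str) -> dict:
    """Pair up source and target characters; earlier entries take precedence."""
    src, dst = expand(source), expand(target)
    dst = dst.ljust(len(src), dst[-1])        # last target character repeats
    mapping = {}
    for s, d in zip(src, dst):
        mapping.setdefault(s, d)              # keep the first mapping (f => s)
    return mapping


if __name__ == "__main__":
    print(expand("fb-jz"))                    # the expanded source list
    print(build_map("fb-jz", "svkv") == build_map("fcb-jz", "skv"))
```

Under these assumptions both spellings of the stage produce the same mapping: f goes to s, c goes to k, and everything else in the range (plus z) goes to v.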

Alternatively, check out the other solution I posted above that uses a very different transliteration (with a reverse range, and it also turns k into c instead).

Now the remaining invalid combinations can be checked much more easily:

kx|xk|^v?[kpstx]v?

cx and xc have become kx and xk, so we only need to check two cases now. For rule 2, we try to match the entire pair, starting from the beginning with an optional voiced consonant (reduced to v), a mandatory unvoiced consonant (where we don't need to check for f and c separately) and another optional voiced consonant. If the pair is a mix of voiced and unvoiced, one of the two optional vs will match and the entire pair is removed. Otherwise, this can only match if the pair starts with an unvoiced consonant (and has anything else second) - in that case only the first character will be removed, and the other one will remain, still giving a truthy result.
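The three stages can be imitated in Python to check the reasoning. This is a rough sketch using `re.sub` and `str.translate` in place of Retina's replace and transliterate stages; the mapping is written out by hand rather than parsed from the Retina source, and the names `TRANSLIT` and `retina_like` are invented for the example.

```python
import re

# Hand-written equivalent of T`fb-jz`svkv for Lojban consonants:
# f -> s, c -> k, and every voiced consonant collapses to v.
TRANSLIT = str.maketrans({"f": "s", "c": "k", **{ch: "v" for ch in "bdgjz"}})


def retina_like(pair: str) -> str:
    """Return what is left of the pair; an empty string means 'invalid'."""
    # Stage 1: delete doubles, pairs drawn from c/j/s/z, and mz.
    s = re.sub(r"(.)\1|[cjsz]{2}|mz", "", pair)
    # Stage 2: collapse the consonant classes.
    s = s.translate(TRANSLIT)
    # Stage 3: delete kx/xk and any voiced/unvoiced mix.
    return re.sub(r"kx|xk|^v?[kpstx]v?", "", s)


if __name__ == "__main__":
    for pair in ["bd", "fl", "kc", "bf", "sd", "xc", "mz"]:
        print(pair, repr(retina_like(pair)))
```

For an all-unvoiced pair such as kc, only the first character is removed in the last stage, so a single character survives and the result is still truthy.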

Martin Ender

Posted 2016-01-19T23:20:44.827

Reputation: 184 808