¿xu ti te gismytermorna? (Is it a valid gismu?)

(Literally: "Does this follow/realize the gismu-form?")

Premise

Lojban is a constructed language, meaning in part that all of its words have been created rather than allowed to develop naturally. The semantic base of Lojban is its gismu, or root words, which were synthesized by combining roots from widely spoken natural languages like Chinese, Hindi, and English. All gismu are 5 letters long and follow a certain strict form.

Information

For our purposes, the Lojban alphabet is:

abcdefgijklmnoprstuvxz

That is, the Roman alphabet without hqwy.

This alphabet can be divided into four categories:

  • Vowels aeiou

  • Sonorant consonants lmnr

  • Unvoiced consonants ptkfcsx. When voiced, these become respectively the...

  • Voiced consonants bdgvjz (No voiced consonant corresponds to x.)

To be a valid gismu, a 5-char-long string must:

  1. Be in one of the consonant-vowel patterns CVCCV or CCVCV, where C represents a consonant, and V represents a vowel.

  2. Follow consonant-matching rules.

Consonant-matching rules for CCVCV words:

The first two characters must constitute one of the following 48 pairs (source):

ml mr
pl pr
bl br
   tr                   tc ts
   dr                   dj dz
kl kr
gl gr
fl fr
vl vr
cl cr cm cn cp ct ck cf
      jm    jb jd jg jv
sl sr sm sn sp st sk sf
      zm    zb zd zg zv
xl xr

Note that this looks rather nicer when separated into voiced and unvoiced pairs. In particular, every voiced-voiced pair is valid iff the corresponding unvoiced-unvoiced pair is valid. This does not extend to pairs with a sonorant consonant; cl is valid but jl is not.
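The symmetry claim can be checked mechanically. The following is only a sketch: the names `CCVCV_PAIRS`, `toVoiced` and `symmetryHolds` are made up for illustration, and the pair list is copied by hand from the table above.

```javascript
// The 48 permitted initial pairs for CCVCV words, copied from the table above.
const CCVCV_PAIRS = new Set(
  ("ml mr pl pr bl br tr tc ts dr dj dz kl kr gl gr fl fr vl vr " +
   "cl cr cm cn cp ct ck cf jm jb jd jg jv " +
   "sl sr sm sn sp st sk sf zm zb zd zg zv xl xr").split(" ")
);

// Unvoiced consonants that have a voiced partner, and the partners in the same order.
const UNVOICED = "ptkfcs"; // x is left out: no voiced consonant corresponds to it
const VOICED = "bdgvjz";

// Hypothetical helper: map an unvoiced consonant to its voiced counterpart.
const toVoiced = (ch) => VOICED[UNVOICED.indexOf(ch)];

// An unvoiced-unvoiced pair should be valid exactly when its voiced twin is.
function symmetryHolds() {
  for (const a of UNVOICED) {
    for (const b of UNVOICED) {
      const unvoicedOk = CCVCV_PAIRS.has(a + b);
      const voicedOk = CCVCV_PAIRS.has(toVoiced(a) + toVoiced(b));
      if (unvoicedOk !== voicedOk) return false;
    }
  }
  return true;
}

console.log(CCVCV_PAIRS.size);                              // 48
console.log(symmetryHolds());                               // true
console.log(CCVCV_PAIRS.has("cl"), CCVCV_PAIRS.has("jl"));  // true false
```

The last line shows why the symmetry stops at sonorants: cl is in the set, jl is not.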

Consonant-matching rules for CVCCV words (source):

The third and fourth characters must follow the following rules:

  1. It is forbidden for both consonants to be the same [...]

  2. It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.

  3. It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.

  4. The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.

Note that there are 179 possible pairs.
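As a sanity check on that count, here is a rough sketch that applies the four rules one by one to every ordered pair of consonants. The function name `isValidMedialPair` and the constant names are my own, not from the reference grammar.

```javascript
const CONSONANTS = "bcdfgjklmnprstvxz";
const VOICED = "bdgvjz";
const UNVOICED = "ptkfcsx";
const RULE3_SET = "cjsz";                          // the set named in rule 3
const FORBIDDEN = ["cx", "kx", "xc", "xk", "mz"];  // the pairs named in rule 4

// Hypothetical helper: is "ab" allowed as the 3rd and 4th letters of a CVCCV word?
function isValidMedialPair(a, b) {
  if (a === b) return false;                                   // rule 1
  const mixedVoicing =
    (VOICED.includes(a) && UNVOICED.includes(b)) ||
    (UNVOICED.includes(a) && VOICED.includes(b));
  if (mixedVoicing) return false;                              // rule 2 (l, m, n, r are in neither set)
  if (RULE3_SET.includes(a) && RULE3_SET.includes(b)) return false; // rule 3
  if (FORBIDDEN.includes(a + b)) return false;                 // rule 4
  return true;
}

let count = 0;
for (const a of CONSONANTS) {
  for (const b of CONSONANTS) {
    if (isValidMedialPair(a, b)) count++;
  }
}
console.log(count); // 179
```

The arithmetic behind the total: 289 ordered pairs, minus 17 doubled letters, minus 84 mixed-voicing pairs, minus 4 remaining pairs from rule 3 (cs, sc, jz, zj), minus the 5 pairs in rule 4.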

Challenge

Determine if the given string follows the gismu formation rules. This is code-golf, so the shortest solution in bytes wins.

Input: A string of length 5 from the Lojban alphabet.

Output: A truthy value if the string can be a gismu and a falsey value otherwise.

Test cases

Valid:

gismu
cfipu
ranxi
mupno
rimge
zosxa

Invalid:

ejram
xitot
dtpno
rcare
pxuja
cetvu
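
For reference, an ungolfed checker run against the test cases above might look like the sketch below. It is not a competitive answer; the names `shape`, `medialPairOk` and `isGismuForm` are invented here, and it assumes (as the challenge guarantees) that the input only uses letters of the Lojban alphabet, so anything that is not a vowel is treated as a consonant.

```javascript
const VOWELS = "aeiou";
const VOICED = "bdgvjz";
const UNVOICED = "ptkfcsx";
const INITIAL_PAIRS = new Set(
  ("ml mr pl pr bl br tr tc ts dr dj dz kl kr gl gr fl fr vl vr " +
   "cl cr cm cn cp ct ck cf jm jb jd jg jv " +
   "sl sr sm sn sp st sk sf zm zb zd zg zv xl xr").split(" ")
);

// "C" or "V" per letter; assumes every letter is from the Lojban alphabet.
const shape = (w) => [...w].map((ch) => (VOWELS.includes(ch) ? "V" : "C")).join("");

// 1 for voiced, -1 for unvoiced, 0 for the sonorants l, m, n, r.
const voicing = (ch) => (VOICED.includes(ch) ? 1 : UNVOICED.includes(ch) ? -1 : 0);

// Rules 1 to 4 for the middle pair of a CVCCV word.
function medialPairOk(a, b) {
  if (a === b) return false;
  if (voicing(a) * voicing(b) < 0) return false;
  if ("cjsz".includes(a) && "cjsz".includes(b)) return false;
  return !["cx", "kx", "xc", "xk", "mz"].includes(a + b);
}

function isGismuForm(word) {
  if (word.length !== 5) return false;
  const s = shape(word);
  if (s === "CCVCV") return INITIAL_PAIRS.has(word.slice(0, 2));
  if (s === "CVCCV") return medialPairOk(word[2], word[3]);
  return false;
}

const valid = ["gismu", "cfipu", "ranxi", "mupno", "rimge", "zosxa"];
const invalid = ["ejram", "xitot", "dtpno", "rcare", "pxuja", "cetvu"];
console.log(valid.every(isGismuForm));  // true
console.log(invalid.some(isGismuForm)); // false
```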

More test cases: this text file contains all valid gismu, one per line.

I don't really know Lojban, so I suspect the title translation is wrong. Help is appreciated.

lirtosiast

Posted 2015-12-08T18:53:56.760

Reputation: 20 331

Note that Lojban pronunciation is phonetic, so gismu is pronounced with a hard g, like in GIF. – lirtosiast – 2015-12-08T20:27:21.837

I don't know if that's a good example, because the official pronunciation of GIF is like Jiff. :p – geokavel – 2015-12-08T21:16:05.910

Side question: Since both s and k are part of the language, what pronunciation does c have? – Fatalize – 2015-12-09T11:07:26.397

@Fatalize the unvoiced version of j would be ch in English, so I assume it is that. The letter C is used for the CH sound in Malay, so the words "capati" and "antiseptik" sound exactly like their English equivalents "chapati" and "antiseptic". Source: holidays in Malaysia – Level River St – 2015-12-09T11:12:44.107

@Fatalize: It's "sh". – Deusovi – 2015-12-09T11:43:58.013

@Deusovi it seems you are right. The reason I got it wrong is that j is not pronounced as English J, but rather as French J (without the plosive at the beginning). From one of the linked pages: "The regular English pronunciation of “James”, which is [dʒɛjmz], would Lojbanize as “djeimz.”, which contains a forbidden consonant pair......[additional rule to avoid this]", so we see that the plosive D needs to be added in. The unvoiced version of French J is indeed SH. The IPA symbols (for those who understand them) are on the Wikipedia page. – Level River St – 2015-12-09T12:12:21.423

Tag kolmogorov-complexity? – Leif Willerts – 2015-12-18T13:22:49.020

I think a better translation for the title might be just .i xu gismu, or if you want to preserve the "valid" part maybe .i xu drani loka gismu [kei fo lotu'a jboge'a]. – Doorknob – 2015-12-28T20:50:44.743

@Doorknob冰 I think "gismu" implies that it's one of the actual words that have been assigned meaning—but let me do some more research. – lirtosiast – 2015-12-28T20:58:02.767

Ah, right, I didn't consider that. You're probably right (I'm still only a learner). .ui – Doorknob – 2015-12-28T21:02:06.737

Answers

7

Ruby, 302 252 bytes

c='[cjsztdpbfvkgxmnlr]'
v=c+'[aeiou]'
z=!1
n=/#{c+v+v}/=~(s=gets.chop)*2
(n==0||n==2)&&289.times{|k|q=[i=k%17,j=k/17].max
z||=(x=s[n,2])==c[j+1]+c[i+1]&&['UeUeJOJOJOJOETJ
:'[i].ord-69>>j&1-j/14>0,i!=j&&q>3&&(k%2<1||q>12)&&!'mzxcxkx'.index(x)][n/2]}
p z

A few more bytes could be saved as follows:

Initialize z to false using z=!c='[cjsztdpbfvkgxmnlr]'. This works, but Ruby emits the message "warning: found = in conditional, should be ==".

Change from a program to a function (I left it as a program because according to the question, shortest "program" in bytes wins.)

Summary of changes from first post

Major overhaul of regex/matching part.

Constant 72 changed to 69 so that the lowest ASCII code in the magic string is 10 instead of 13. This enables a literal newline to be used in the golfed version instead of an escape sequence.

Magic string 'mzxcxkx' replaces arithmetic rules for the 5 prohibited pairs in the CVCCV type table.

Ungolfed version

Whitespace added, and the literal newline in the magic string changed to \n.

c='[cjsztdpbfvkgxmnlr]'                                   #c=consonants
v=c+'[aeiou]'                                             #v=consonant+vowel
z=!1                                                      #Set z to false (everything is truthy in Ruby except nil and false.)
n=/#{c+v+v}/=~(s=gets.chop)*2                             #Get input and duplicate it. do regex match, n becomes the index of the double consonant. 
(n==0||n==2)&&                                            #If n==0 (ccvcv) or n==2 (cvccv) 
   289.times{|k|                                          #iterate 17*17 times
     q=[i=k%17,j=k/17].max                                #generate row and column, find their maximum.
     z||=                                                 #OR z with the following expression:
     (x=s[n,2])==c[j+1]+c[i+1]&&                          #double consonant == the pair corresponding to j,i AND either 
       ["UeUeJOJOJOJOETJ\n:"[i].ord-69>>j&1-j/14>0,       #this expression or
       i!=j&&q>3&&(k%2<1||q>12)&&!'mzxcxkx'.index(x)][n/2]#this expression, depending on the value of n/2
   }
p z                                                       #print output

Explanation of matching

The two characters in the input string s[n,2] are compared with the character pair of the iterating loop. If they match and the consonant-vowel regex pattern is correct, the row and column values i,j are checked for validity. Careful ordering of the consonants helps here.

For CVCCV:

i!=j                        It is forbidden for both consonants to be the same
(k%2<1||q>12)               It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.
q>3                         It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.
!'mzxcxkx'.index(x)         The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.
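
To make the role of the consonant ordering explicit, here is a rough JavaScript transliteration of the CVCCV expression (a sketch, not the submitted Ruby; the names `ORDER`, `cvccvPairOk` and `okByLetters` are invented for illustration). Because 17 is odd, k = 17*j + i is even exactly when i and j have the same parity, and in this ordering equal parity among indices 0..12 means equal voicing.

```javascript
// Consonant order used by the Ruby answer (regex brackets dropped).
// Indices 0..12 alternate unvoiced/voiced (x at 12 is unvoiced),
// indices 0..3 are c, j, s, z, and indices 13..16 are the sonorants m, n, l, r.
const ORDER = "cjsztdpbfvkgxmnlr";

function cvccvPairOk(j, i) {
  const k = 17 * j + i;          // the loop counter from 289.times{|k| ...}
  const q = Math.max(i, j);
  const pair = ORDER[j] + ORDER[i];
  return (
    i !== j &&                   // rule 1: not the same consonant
    q > 3 &&                     // rule 3: not both among c, j, s, z
    (k % 2 === 0 || q > 12) &&   // rule 2: same index parity, or a sonorant is involved
    !"mzxcxkx".includes(pair)    // rule 4: the forbidden pairs are substrings of this string
  );
}

// Hypothetical convenience wrapper taking a two-letter string.
const okByLetters = (p) => cvccvPairOk(ORDER.indexOf(p[0]), ORDER.indexOf(p[1]));

let count = 0;
for (let j = 0; j < 17; j++) {
  for (let i = 0; i < 17; i++) {
    if (cvccvPairOk(j, i)) count++;
  }
}
console.log(count);                                   // 179
console.log(okByLetters("bf"), okByLetters("ls"));    // false true
```

The magic string also contains "zx" as a substring, but that pair is already rejected by the parity test, so the extra match is harmless.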

For CCVCV:

A bitmap for each column of the table below is encoded into the magic string, from which 69 is subtracted. For all columns except the last two, only 6 bits are required. For the last two, the higher order bits need to be 1, so a negative number is generated (characters \n and :) in order to have leading 1's instead of leading zeroes. We don't want to include the last three rows of the table though, so instead of rightshift and ANDing by 1, we rightshift and AND by 1-j/14 which normally evaluates to 1, but evaluates to 0 for the last 3 rows.
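
The decoding can be replayed outside Ruby. The sketch below is a JavaScript transliteration under a couple of assumptions: Ruby's integer division in 1-j/14 is written as an explicit mask, and the names `ORDER`, `MAGIC` and `ccvcvPairOk` are mine. JavaScript's >> is sign-propagating, so the negative values behave the same way as described above.

```javascript
const ORDER = "cjsztdpbfvkgxmnlr";
const MAGIC = "UeUeJOJOJOJOETJ\n:";   // one character per column (second consonant)

// j = row (first consonant), i = column (second consonant), both indices into ORDER.
function ccvcvPairOk(j, i) {
  const bits = MAGIC.charCodeAt(i) - 69;  // negative for the last two columns: -59 and -11
  const mask = j < 14 ? 1 : 0;            // stands in for Ruby's 1-j/14 (integer division)
  return ((bits >> j) & mask) > 0;        // bit j of the column bitmap, rows n, l, r masked out
}

const pairs = [];
for (let j = 0; j < 17; j++) {
  for (let i = 0; i < 17; i++) {
    if (ccvcvPairOk(j, i)) pairs.push(ORDER[j] + ORDER[i]);
  }
}
console.log(pairs.length);    // 48
console.log(pairs.join(" "));
```

For example, the first column (second consonant c) is "U", which is 85 - 69 = 16, so only bit 4 is set, and row 4 is t: the only valid pair ending in c is tc.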

The following program (with the same expressions as the submission) was used to generate the tables below (uncomment whichever "if" line is required for the table you want).

c='[cjsztdpbfvkgxmnlr]'
z=0
289.times{|k|
  q=[i=k%17,j=k/17].max
  r=c[j+1]+c[i+1]
  #if i!=j && q>3 && (k%2<1||q>12) && !'mzxcxkx'.index(r)
  #if "UeUeJOJOJOJOETJ\n:"[i].ord-69>>j&1-j/14>0
    print r,' '
    z+=1
  else
    print '   '
  end
  i==16&&puts 
}
puts z


            ct    cp    cf    ck       cm cn cl cr
               jd    jb    jv    jg    jm jn jl jr
            st    sp    sf    sk    sx sm sn sl sr
               zd    zb    zv    zg    zm zn zl zr
tc    ts          tp    tf    tk    tx tm tn tl tr
   dj    dz          db    dv    dg    dm dn dl dr
pc    ps    pt          pf    pk    px pm pn pl pr
   bj    bz    bd          bv    bg    bm bn bl br
fc    fs    ft    fp          fk    fx fm fn fl fr
   vj    vz    vd    vb          vg    vm vn vl vr
kc    ks    kt    kp    kf             km kn kl kr
   gj    gz    gd    gb    gv          gm gn gl gr
      xs    xt    xp    xf             xm xn xl xr
mc mj ms    mt md mp mb mf mv mk mg mx    mn ml mr
nc nj ns nz nt nd np nb nf nv nk ng nx nm    nl nr
lc lj ls lz lt ld lp lb lf lv lk lg lx lm ln    lr
rc rj rs rz rt rd rp rb rf rv rk rg rx rm rn rl 
179

            ct    cp    cf    ck       cm cn cl cr
               jd    jb    jv    jg    jm
            st    sp    sf    sk       sm sn sl sr
               zd    zb    zv    zg    zm
tc    ts                                        tr
   dj    dz                                     dr
                                             pl pr
                                             bl br
                                             fl fr
                                             vl vr
                                             kl kr
                                             gl gr
                                             xl xr
                                             ml mr


48

Level River St

Posted 2015-12-08T18:53:56.760

Reputation: 22 049

I changed the wording to allow functions; sorry it took so long. – lirtosiast – 2015-12-18T01:15:28.660

6

JavaScript (ES6), 366 352 bytes

g=>((q=3,w=2,r=0,f="mzcscjzjxcxkx",c="bdgvjzptkfcsxlmnr",d=[...c],v="aeiou")[m="match"](g[1])?d.map((a,i)=>d.map((b,j)=>a==b|(i<6&j>5&j<13|j<6&i>5&i<13)||f[m](a+b)||(p+=","+a+b)),p="",q=0,r=w--)&&p:"jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm")[m](g[r]+g[r+1])&&c[m](g[q])&&v[m](g[w])&&v[m](g[4])

Explanation

Returns an array containing the last letter (truthy) if it is a valid gismu or null if it is not.

A lot of the size comes from the hard-coded CCVCV pairs (even after condensing them). It might be possible to find a pattern to generate them but I've spent way too much time on this already! xD

g=>
  (
    // Save the positions to check for the consonant, vowel and pair respectively
    (q=3,w=2,r=0,                       // default = CCVCV format
    f="mzcscjzjxcxkx",                  // f = all forbidden pairs for CVCCV pairs
    c="bdgvjzptkfcsxlmnr",              // c = consonants
    d=[...c],                           // d = array of consonants
    v="aeiou")                          // v = vowels
    [m="match"](g[1])?                  // if the second character is a vowel

      // Generate CC pairs of CVCCV
      d.map((a,i)=>                     // iterate over every possible pair of consonants
        d.map((b,j)=>
          a==b|                         // rule 1: consonants cannot be the same
          (i<6&j>5&j<13|j<6&i>5&i<13)|| // rule 2: pair cannot be voiced and unvoiced
          f[m](a+b)||                   // rule 3 & 4: certain pairs are forbidden
            (p+=","+a+b)                // if it follows all the rules add the pair
        ),
        p="",                           // p = comma-delimited valid CVCCV pairs
        q=0,r=w--                       // update the match positions to CVCCV format
      )&&p
    :
      // CC pairs of CCVCV (condensed so that valid pairs like "jb", "bl" and
      //     "zb" can be matched in this string but invalid pairs like "lz" cannot)
      "jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm"

  // Match the required format
  )[m](g[r]+g[r+1])&&c[m](g[q])&&v[m](g[w])&&v[m](g[4])
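
A quick way to convince yourself that the condensed string encodes exactly the valid CCVCV pairs is to list every two-letter substring that does not straddle a comma. This is only an illustrative sketch; `CONDENSED` and `found` are names I made up. It relies on the fact that passing a plain two-letter string to match behaves like a substring search, since the string is converted to a regular expression with no special characters in it.

```javascript
const CONDENSED =
  "jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm";

// Collect every two-letter window that does not contain a comma.
const found = new Set();
for (let i = 0; i + 1 < CONDENSED.length; i++) {
  const pair = CONDENSED.slice(i, i + 2);
  if (!pair.includes(",")) found.add(pair);
}

console.log(found.size);  // 48
console.log(found.has("jb"), found.has("bl"), found.has("zb"), found.has("lz")); // true true true false
```

Overlapping chunks such as "tstcl" pack four pairs (ts, st, tc, cl) into five letters, which is where the savings over a plain comma-separated list come from.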

Test

var solution = g=>((q=3,w=2,r=0,f="mzcscjzjxcxkx",c="bdgvjzptkfcsxlmnr",d=[...c],v="aeiou")[m="match"](g[1])?d.map((a,i)=>d.map((b,j)=>a==b|(i<6&j>5&j<13|j<6&i>5&i<13)||f[m](a+b)||(p+=","+a+b)),p="",q=0,r=w--)&&p:"jbl,zbr,tstcl,cmr,cn,cr,jdr,cfl,sfr,jgl,zgr,zdjml,ckl,skr,cpl,spr,sl,sm,sn,sr,ctr,jvl,zvr,xl,xr,dzm")[m](g[r]+g[r+1])&&c[m](g[q])&&v[m](g[w])&&v[m](g[4])
<input type="text" id="input" value="gismu" />
<button onclick="result.textContent=solution(input.value)">Go</button>
<pre id="result"></pre>

user81655

Posted 2015-12-08T18:53:56.760

Reputation: 10 181

0

JavaScript (ES6), 240 bytes

x=>eval(`/${(c='[bcdfgjklmnprstvxz]')+c+(v='[aeiou]')+c+v}/${t='.test(x)'}?/^[bfgkmpvx][lr]|[cs][fklmnprt]|d[jrz]|[jz][bdgmv]/${t}:/${c+v+c+c+v}/${t}?!/^..((.)\\2|${V='[bdgvjz]'}${U='[ptkfcsx]'}|${U+V}|[cjsz][cjsz]|cx|kx|xc|xk|mz)/${t}:!1`)

I guess this is my work now.
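
For readers who want to follow the regex idea without the eval and template tricks, here is an ungolfed sketch of the same approach. It is my reading of the answer, not the author's code, and the names are invented. One deliberate difference: the golfed alternation does not appear to match tr, tc or ts, which the question's table lists, so the sketch adds t[crs]; that addition is mine.

```javascript
const C = "[bcdfgjklmnprstvxz]";
const V = "[aeiou]";
const VOICED = "[bdgvjz]";
const UNVOICED = "[ptkfcsx]";

const CCVCV_SHAPE = new RegExp(`^${C}${C}${V}${C}${V}$`);
const CVCCV_SHAPE = new RegExp(`^${C}${V}${C}${C}${V}$`);

// Permitted initial pairs for CCVCV; t[crs] is added here to cover tr, tc, ts.
const CCVCV_INITIAL = /^(?:[bfgkmpvx][lr]|[cs][fklmnprt]|d[jrz]|t[crs]|[jz][bdgmv])/;

// Forbidden pairs at positions 2-3 of CVCCV: a doubled letter, mixed voicing,
// two letters from c/j/s/z, or one of the five listed pairs.
const CVCCV_FORBIDDEN = new RegExp(
  `^..(?:(.)\\1|${VOICED}${UNVOICED}|${UNVOICED}${VOICED}|[cjsz][cjsz]|cx|kx|xc|xk|mz)`
);

function isGismuForm(word) {
  if (CCVCV_SHAPE.test(word)) return CCVCV_INITIAL.test(word);
  if (CVCCV_SHAPE.test(word)) return !CVCCV_FORBIDDEN.test(word);
  return false;
}

console.log(isGismuForm("gismu"), isGismuForm("cetvu")); // true false
```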

Mama Fun Roll

Posted 2015-12-08T18:53:56.760

Reputation: 7 234