Squish-unsquish ligatures

17

Here is a list of some common ligatures in Unicode (the ones I could create with my Compose key on Debian):

Orig  Ascii  Lig
ae    [ae]   æ
AE    [AE]   Æ
oe    [oe]   œ
OE    [OE]   Œ
ij    [ij]   ĳ
IJ    [IJ]   Ĳ
ff    [ff]   ﬀ
fi    [fi]   ﬁ
fl    [fl]   ﬂ
ffi   [ffi]  ﬃ
ffl   [ffl]  ﬄ

You have two options in this challenge: use the actual UTF-8 ligatures, or use the ASCII-only variant. If you use the actual UTF-8 ligature variants, you gain a 20% bonus. If you use the ASCII-only variant, you may assume square brackets will never be involved except to signify a ligature.

The challenge: given a string as input, output the same string

  • with all original ligatures replaced by their expanded counterparts.

    • match greedily: affib becomes aﬃb (a[ffi]b), not aﬀib (a[ff]ib) or afﬁb (af[fi]b).
  • with all "expanded" letter sequences replaced by ligatures.

    • for example, æOEfoo ([ae]OEfoo) becomes aeŒfoo (ae[OE]foo).

Do this completely independently: ﬀi ([ff]i) becomes ffi (ffi), not ﬃ ([ffi]).

Sound simple enough? There's a catch: every time two expanded sequences (non-ligatures) overlap by exactly one character, both of the ligatures must be inserted into the string. Here are a few test cases to demonstrate:

Input   Ascii-output       Output
fij     [fi][ij]           ﬁĳ
fIJ     f[IJ]              fĲ      * remember, capitalization matters!
fffi    [ff][ffi]          ﬀﬃ
fff     [ff][ff]           ﬀﬀ
ffffi   [ff][ff][ffi]      ﬀﬀﬃ
ffffij  [ff][ff][ffi][ij]  ﬀﬀﬃĳ

Be careful: the same greedy matching applies (note especially the last few test cases).
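The rules above can be sketched as a readable, ungolfed reference for the UTF-8 variant. This is only one possible reading of the specification, not the challenge author's code: the names `PAIRS` and `swapLigatures` are made up for illustration, and the ligatures are written as `\u` escapes so the table survives copy and paste.

```javascript
// Hypothetical reference sketch. Each pair is [expanded sequence, ligature].
// Longer f-sequences come first so that matching is greedy.
const PAIRS = [
  ["ffl", "\uFB04"], ["ffi", "\uFB03"], ["fl", "\uFB02"],
  ["fi", "\uFB01"], ["ff", "\uFB00"],
  ["IJ", "\u0132"], ["ij", "\u0133"],
  ["OE", "\u0152"], ["oe", "\u0153"],
  ["AE", "\u00C6"], ["ae", "\u00E6"],
];

function swapLigatures(input) {
  let out = "";
  let pos = 0;     // next input index to examine
  let covered = 0; // input before this index is already represented in out
  while (pos < input.length) {
    // Case 1: an actual ligature at pos is expanded and never re-matched.
    const lig = PAIRS.find(([, l]) => input.startsWith(l, pos));
    if (lig) {
      out += lig[0];
      pos += lig[1].length;
      covered = pos;
      continue;
    }
    // Case 2: an expanded sequence at pos becomes a ligature.
    const seq = PAIRS.find(([s]) => input.startsWith(s, pos));
    if (seq) {
      out += seq[1];
      covered = pos + seq[0].length;
      pos = covered - 1; // revisit the last letter: it may start an overlapping match
      continue;
    }
    // Case 3: plain character, copied unless a ligature already covers it.
    if (pos >= covered) out += input[pos];
    pos++;
  }
  return out;
}

console.log(swapLigatures("ffffij"));
```

Resuming the scan at the last letter of each replaced sequence is what produces the one-character overlaps in the test cases, while expanded ligatures are skipped outright, which keeps the two directions independent.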

This is code-golf, so shortest code in bytes wins.

Doorknob

Posted 2015-12-14T02:52:20.327

Reputation: 68 138

7 @Mego What's the big deal? If your language of choice cannot handle æ natively, just print 0xc3 0xa6, its UTF-8 encoding. – Dennis – 2015-12-14T03:11:39.687

7 If a language can't facilitate a given task, don't use that language for that task. That shouldn't be a big deal. – Alex A. – 2015-12-14T03:24:24.310

Answers

3

JavaScript (ES6), 213 bytes - 20% bonus = 170.4

s=>eval('for(p=o="";m=s.match(r="ﬄ|ﬃ|ﬂ|ﬁ|ﬀ|Ĳ|ĳ|Œ|œ|Æ|æ|ffl|ffi|fl|fi|ff|IJ|ij|OE|oe|AE|ae",x=r.split`|`);s=s.slice(i+t.length-(p=t<"z")))o+=s.slice(p,i=m.index)+x[(x.indexOf(t=m[0])+11)%22];o+s.slice(p)')

Explanation

s=>                           // s = input string
  eval(`                      // use eval to avoid writing {} or return
    for(                      // iterate over each ligature match
      p=                      // p = 1 if the last match was a non-unicode ligature
        o="";                 // o = output string
      m=s.match(              // find the next ligature

        // r = regex string for ligatures (unicode and non-unicode)
        r="ﬄ|ﬃ|ﬂ|ﬁ|ﬀ|Ĳ|ĳ|Œ|œ|Æ|æ|ffl|ffi|fl|fi|ff|IJ|ij|OE|oe|AE|ae",
        x=r.split\`|\`        // x = array of the alternatives in r

      );
      s=s.slice(i+t.length    // remove the part that has been added to the output
        -(p=t<"z"))           // if we matched a non-unicode ligature, keep the last
    )                         //     character so it can be part of the next match
      o+=s.slice(p,i=m.index) // add the text before the match to the output
        +x[(x.indexOf(        // add the opposite type of the matched ligature
          t=m[0]              // t = matched text
        )+11)%22];            // (index + 11) % 22 returns the opposite index
    o+s.slice(p)              // return o + any remaining characters
  `)
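The `(index + 11) % 22` lookup can be tried in isolation. The following is a minimal sketch, assuming the table lists the 11 ligatures first and their 11 expansions second in the same order; the helper name `opposite` is made up here, and the ligatures are written as `\u` escapes.

```javascript
// 11 ligatures followed by their 11 expansions, in matching order.
const r = "\uFB04|\uFB03|\uFB02|\uFB01|\uFB00|\u0132|\u0133|\u0152|\u0153|\u00C6|\u00E6" +
          "|ffl|ffi|fl|fi|ff|IJ|ij|OE|oe|AE|ae";
const x = r.split("|");

// Hypothetical helper: adding 11 modulo 22 jumps to the partner entry
// in the other half of the table, in either direction.
const opposite = (t) => x[(x.indexOf(t) + 11) % 22];

console.log(opposite("ffi") === "\uFB03"); // expansion to ligature
console.log(opposite("\u00E6") === "ae");  // ligature to expansion
```

Because both halves have the same length, one expression serves both directions and no second table is needed.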

Test

var solution = s=>eval('for(p=o="";m=s.match(r="ﬄ|ﬃ|ﬂ|ﬁ|ﬀ|Ĳ|ĳ|Œ|œ|Æ|æ|ffl|ffi|fl|fi|ff|IJ|ij|OE|oe|AE|ae",x=r.split`|`);s=s.slice(i+t.length-(p=t<"z")))o+=s.slice(p,i=m.index)+x[(x.indexOf(t=m[0])+11)%22];o+s.slice(p)')
<input type="text" id="input" value="ffiffffij" oninput="result.textContent=solution(input.value)" />
<pre id="result"></pre>

user81655

Posted 2015-12-14T02:52:20.327

Reputation: 10 181

Can r="ﬄ|ﬃ|ﬂ|ﬁ|ﬀ|Ĳ|ĳ|Œ|œ|Æ|æ|ffl|ffi|fl|fi|ff|IJ|ij|OE|oe|AE|ae",x=r.split`|` be rewritten as x="ﬄ|ﬃ|ﬂ|ﬁ|ﬀ|Ĳ|ĳ|Œ|œ|Æ|æ|ffl|ffi|fl|fi|ff|IJ|ij|OE|oe|AE|ae".split`|` for -4 bytes? – Dendrobium – 2015-12-14T23:47:57.893

@Dendrobium The match call requires the string separated with | characters. – user81655 – 2015-12-15T00:46:11.470
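The point about `match` can be checked with a small sketch. It uses a shortened, hypothetical pattern rather than the full table from the answer: a string argument is converted to a regular expression, so the `|` characters act as alternation, whereas an array is first stringified with commas and then matched as one literal pattern.

```javascript
const r = "ffi|fi|ff";  // shortened pattern, for illustration only
const x = r.split("|"); // ["ffi", "fi", "ff"]

const viaString = "afib".match(r); // behaves like /ffi|fi|ff/
const viaArray = "afib".match(x);  // behaves like /ffi,fi,ff/

console.log(viaString[0], viaString.index);
console.log(viaArray);
```

So the answer needs both values: `r` to search and `x` to look up the replacement.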