Implement LaTeX accent macros

11

Introduction

The LaTeX typesetting system uses macros for defining accents. For example, the letter ê is produced by \hat{e}. In this challenge, your task is to implement an ASCII version of this functionality.

Input

Your input is a non-empty string of printable ASCII characters. It will not contain newlines.

Output

Your output is a string consisting of two lines. The first line contains accents, and the second line the characters they belong to. It is obtained from the input as follows (A denotes an arbitrary character):

  • Every \bar{A} is replaced by A with _ on top of it.
  • Every \dot{A} is replaced by A with . on top of it.
  • Every \hat{A} is replaced by A with ^ on top of it.
  • For a -10% bonus: every \tilde{A} is replaced by A with ~ on top of it.
  • All other characters have a space above them.

For example, the input

Je suis pr\hat{e}t.

results in the output

          ^
Je suis pret.
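
The replacement rules above can be sketched as a single regex pass. This is an ungolfed illustration rather than a competitive answer: the names `ACCENTS`, `MACRO` and `accent_lines` are made up for this sketch, and it relies on the challenge's guarantee that every macro argument is exactly one character long.

```python
import re

# Hypothetical helper names; only the macro-to-accent mapping comes from the challenge.
ACCENTS = {"bar": "_", "dot": ".", "hat": "^", "tilde": "~"}
MACRO = re.compile(r"\\(bar|dot|hat|tilde)\{(.)\}")

def accent_lines(s):
    """Return (accent line, character line) for the input string."""
    top, bottom = [], []
    pos = 0
    for m in MACRO.finditer(s):
        plain = s[pos:m.start()]          # text between the previous macro and this one
        top.append(" " * len(plain))      # plain characters get a space above them
        bottom.append(plain)
        top.append(ACCENTS[m.group(1)])   # the accent sits above the macro argument
        bottom.append(m.group(2))
        pos = m.end()
    tail = s[pos:]                        # whatever follows the last macro
    top.append(" " * len(tail))
    bottom.append(tail)
    return "".join(top), "".join(bottom)

top, bottom = accent_lines(r"Je suis pr\hat{e}t.")
print(top.rstrip())
print(bottom)
```

The accent line comes back padded to the full width of the character line; trailing whitespace is allowed by the rules, so stripping it is optional.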

Rules and scoring

You can assume that the characters \{} only occur in the macros \bar{}, \dot{} and \hat{} (and \tilde{} if you go for the bonus). All macro arguments are exactly one character long, so \dot{foo} and \dot{} will not occur in the input. The output can be a newline-separated string, or a list/pair of two strings. Any amount of trailing and preceding whitespace is allowed, as long as the accents are in the correct places. In particular, if there are no accents, the output can be a single string.

You can write a full program or a function. The lowest byte count (after bonuses) wins, and standard loopholes are disallowed.

Test cases

Without bonus:

Input:
No accents.
Output:

No accents.
Input:
Ch\hat{a}teau
Output:
  ^
Chateau
Input:
Som\bar{e} \dot{a}cc\hat{e}nts.
Output:
   _ .  ^
Some accents.
Input:
dot hat\dot{h}a\hat{t}\hat{ }x\bar{x}dot
Output:
       . ^^ _
dot hathat xxdot
Input:
\hat{g}Hmi\hat{|}Su5Y(\dot{G}"\bar{$}id4\hat{j}gB\dot{n}#6AX'c\dot{[}\hat{)} 6\hat{[}T~_sR\hat{&}CEB
Output:
^   ^     . _   ^  .      .^  ^     ^
gHmi|Su5Y(G"$id4jgBn#6AX'c[) 6[T~_sR&CEB

With bonus:

Input:
Ma\tilde{n}ana
Output:
  ~
Manana
Input:
\dot{L}Vz\dot{[}|M.\bar{#}0\hat{u}U^y!"\tilde{I} K.\bar{"}\hat{m}dT\tilde{$}F\bar{;}59$,/5\bar{'}K\tilde{v}R \tilde{E}X`
Output:
.  .   _ ^     ~   _^  ~ _      _ ~  ~
LVz[|M.#0uU^y!"I K."mdT$F;59$,/5'KvR EX`

Zgarb

Posted 2015-11-28T00:28:22.280

Reputation: 39 083

I started to prototype this in Go but then I realised how much simpler Python would be... – cat – 2015-11-28T03:36:27.573

Can we assume that each markup entry contains only one char? Or, in other words, is \bar{foo} a valid input? – Peter Taylor – 2015-11-28T07:43:06.930

@PeterTaylor Yes, every macro argument is exactly one character long. I'll clarify that. – Zgarb – 2015-11-28T15:26:09.453

Answers

4

Pyth, 51 46 45 43 41 40 bytes

I remove the curly braces and split at \, just like Reto Koradi's CJam answer does. The codes bar, dot and hat are recognized simply by the last decimal digit of the character code of their first character, taken modulo 3. I just prepend """" (previously barf, RIP) to the input and remove it at the end, which saves the code for handling the first part specially.

jtMsMCm,+@".^_"eChd*\ -ld4>d3c-+*4Nz`H\\

Try it online. Test suite.
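
The lookup trick described above can be checked in Python. This only illustrates the arithmetic and is not a translation of the Pyth program; the variable names are made up for the sketch.

```python
# Illustration of the arithmetic only; the names here are hypothetical.
LOOKUP = ".^_"

for macro in ("bar", "dot", "hat"):
    code = ord(macro[0])      # character code of the first letter: 98, 100, 104
    digit = code % 10         # last decimal digit: 8, 0, 4
    print(macro, code, digit, LOOKUP[digit % 3])
```

Note that `t` has character code 116, whose last digit 6 also gives 0 modulo 3, so this lookup only separates the three non-bonus macros.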

PurkkaKoodari

Posted 2015-11-28T00:28:22.280

Reputation: 16 699

"Then I just add barf..." +1 – Addison Crump – 2015-11-28T14:15:32.077

3

Julia, 204 184 bytes * 0.9 = 165.6

x->(r=r"\\(\w)\w+{(\w)}";t=[" "^endof(x)...];while ismatch(r,x) m=match(r,x);(a,b)=m.captures;t[m.offsets[1]-1]=a=="b"?'_':a=="d"?'.':a=="h"?'^':'~';x=replace(x,r,b,1)end;(join(t),x))

This is an anonymous function that accepts a string and returns a tuple of strings corresponding to the top and bottom lines. The top line will have trailing whitespace. To call the function, give it a name, e.g. f=x->...

Ungolfed:

function f(x::AbstractString)
    # Store a regular expression that will match the LaTeX macro call
    # with capture groups for the first letter of the control sequence
    # and the character being accented
    r = r"\\(\w)\w+{(\w)}"

    # Create a vector of spaces by splatting a string constructed with
    # repetition
    # Note that if there is anything to replace, this will be longer
    # than needed, resulting in trailing whitespace
    t = [" "^endof(x)...]

    while ismatch(r, x)
        # Store the RegexMatch object
        m = match(r, x)

        # Extract the captures
        a, b = m.captures

        # Extract the offset of the first capture
        o = m.offsets[1]

        # Replace the corresponding element of t with the accent
        t[o-1] = a == "b" ? '_' : a == "d" ? '.' : a == "h" ? '^' : '~'

        # Replace this match in the original string
        x = replace(x, r, b, 1)
    end

    # Return the top and bottom lines as a tuple
    return (join(t), x)
end
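
One detail of the pattern above that may be worth checking is the second capture group: `\w` matches only word characters, while several test cases accent punctuation or a space. The sketch below uses Python's `re` module as a stand-in, which is an assumption; the Julia code itself was not run here. It compares the pattern from the answer (with the braces escaped for Python) against a hypothetical variant that captures any single character.

```python
import re

# Pattern from the answer, with the braces escaped for Python's re module.
original = re.compile(r"\\(\w)\w+\{(\w)\}")
# Hypothetical variant: accept any single character as the macro argument.
relaxed = re.compile(r"\\(\w)\w+\{(.)\}")

for sample in (r"\hat{e}", r"\hat{|}", r"\hat{ }"):
    print(sample, bool(original.search(sample)), bool(relaxed.search(sample)))
```

If Julia's engine behaves the same way on this point, swapping the second `\w` for `.` would cover the punctuation-heavy test cases without adding bytes.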

Alex A.

Posted 2015-11-28T00:28:22.280

Reputation: 23 761

2

CJam, 53 bytes

Sl+'\/(_,S*\@{(i2/49-"_. ^"=\3>'}-_,(S*@\+@@+@@+\}/N\

Try it online

Explanation:

S       Leading space, to avoid special case for accent at start.
l+      Get input, and append it to leading space.
'\/     Split at '\.
(       Split off first sub-string, which does not start with an accent.
_,      Get length of first sub-string.
S*      String of spaces with the same length.
\       Swap the two. First parts of both output lines are now on stack.
@       Rotate list of remaining sub-strings to top.
{       Loop over sub-strings.
  (       Pop first character. This is 'b, 'd, or 'h, and determines accent.
  i       Convert to integer.
  2/      Divide by two.
  49-     Subtract 49. This will result in 0, 1, or 3 for the different accents.
  "_. ^"  Lookup string for the accents.
  =       Get the correct accent.
  \       Swap string to top.
  3>      Remove the first 3 characters, which are the rest of the macro name
          and the '{.
  '}-     Remove the '}. All the macro stuff is removed now.
  _,(     Get the length, and subtract 1. This is the number of spaces for the first line.
  S*      Produce the spaces needed for the first line.
  @\+     Bring accent and spaces to top, and concatenate them.
  @@+     Get previous second line and new sub-string without formatting to top,
          and concatenate them.
  @@+     Get previous first line and new accent and spacing to top,
          and concatenate them.
  \       Swap the two lines to get them back in first/second order.
}/      End loop over sub-strings.
N\      Put newline between first and second line.
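
For readers who do not read CJam, the same split-at-backslash idea can be sketched in Python. This is a loose, ungolfed rendition under the challenge's assumptions, not a translation of the program; the function name `accent_lines_split` is invented here. Like the original lookup, it handles only the three non-bonus macros.

```python
def accent_lines_split(s):
    # Prepend a space so an accent at the very start needs no special case.
    first, *rest = (" " + s).split("\\")
    top, bottom = " " * len(first), first
    for part in rest:
        # part looks like 'hat{e}t...': macro name, '{', argument, '}', plain text.
        accent = "_. ^"[ord(part[0]) // 2 - 49]   # b -> 0, d -> 1, h -> 3
        text = part[4:].replace("}", "")          # drop 'hat{' and then the '}'
        top += accent + " " * (len(text) - 1)     # accent, then spaces over the plain text
        bottom += text
    # The CJam program keeps the extra leading column (the rules allow it);
    # this sketch drops it again.
    return top[1:], bottom[1:]

top, bottom = accent_lines_split(r"Som\bar{e} \dot{a}cc\hat{e}nts.")
print(top)
print(bottom)
```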

Reto Koradi

Posted 2015-11-28T00:28:22.280

Reputation: 4 870

1

Haskell, 156 * 0.9 = 140.4 bytes

g('\\':a:r)=(q,l):g s where q|a=='b'='_'|a=='d'='.'|a=='h'='^'|a=='t'='~';(_,_:l:_:s)=span(<'{')r
g(a:b)=(' ',a):g b
g""=[('\n','\n')]
f=uncurry(++).unzip.g

Usage example:

*Main> putStr $ f "\\dot{L}Vz\\dot{[}|M.\\bar{#}0\\hat{u}U^y!\"\\tilde{I} K.\\bar{\"}\\hat{m}dT\\tilde{$}F\\bar{;}59$,/5\\bar{'}K\\tilde{v}R \\tilde{E}X`"
.  .   _ ^     ~   _^  ~ _      _ ~  ~  
LVz[|M.#0uU^y!"I K."mdT$F;59$,/5'KvR EX`

How it works: go through the input string character by character and build a list of pairs of characters, the left for the upper output string, the right for the lower output string. If a \ is found, take the appropriate accent, else a space for the left element. Finally transform the list of pairs into a single string.
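
The pair-building idea described above can be sketched in Python as well. This is an illustration of the approach, not a port of the Haskell code; the names `pairs` and `render` are hypothetical, and the sketch omits the trailing newlines that the Haskell version appends.

```python
def pairs(s):
    """Yield (top char, bottom char) pairs while walking through the input."""
    accents = {"b": "_", "d": ".", "h": "^", "t": "~"}
    i = 0
    while i < len(s):
        if s[i] == "\\":
            brace = s.index("{", i)              # skip the rest of the macro name
            yield accents[s[i + 1]], s[brace + 1]
            i = brace + 3                        # move past the argument and the '}'
        else:
            yield " ", s[i]                      # ordinary character: space above it
            i += 1

def render(s):
    # Unzip the pairs into the two output lines.
    top, bottom = zip(*pairs(s)) if s else ((), ())
    return "".join(top), "".join(bottom)

top, bottom = render(r"Ma\tilde{n}ana")
print(top)
print(bottom)
```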

nimi

Posted 2015-11-28T00:28:22.280

Reputation: 34 639

0

Python 3, 203 bytes

Without bonus:

l=list(input())
b=list(" "*len(l))
try:
 while 1:s=l.index("\\");t=l[s+1];del l[s+6];del l[s:s+5];b[s] = "b"==t and "_" or "d"==t and "." or "h"==t and "^" or "*";
except:print("".join(b)+"\n"+"".join(l));

I really hope there is a shorter version.
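
To make the index arithmetic easier to follow, here is an ungolfed sketch of the same in-place idea. The function and variable names are invented for illustration, and the loop ends on an explicit membership test rather than the bare `except` that the golfed version relies on.

```python
def accent_lines_inplace(s):
    chars = list(s)
    top = [" "] * len(chars)       # longer than needed once the macros are removed
    accents = {"b": "_", "d": ".", "h": "^"}
    while "\\" in chars:
        i = chars.index("\\")      # chars[i:i+7] is e.g. ['\\','h','a','t','{','e','}']
        top[i] = accents[chars[i + 1]]
        del chars[i + 6]           # the closing '}'
        del chars[i:i + 5]         # the backslash, the three-letter name and '{'
    return "".join(top), "".join(chars)

top, bottom = accent_lines_inplace(r"Ch\hat{a}teau")
print(top.rstrip())
print(bottom)
```

After both deletions the accented character sits at index `i`, which is why the accent is written to `top[i]`.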

Alexander Nigl

Posted 2015-11-28T00:28:22.280

Reputation: 121

It's always nice to see the progression of byte count. c: I suggest leaving the old byte count up, then surrounding it in <s></s>, then typing the new byte count so we can see the steps towards concision. – Addison Crump – 2015-11-28T14:21:05.183