Parse a two-dimensional syntax

25

1

Background

Alice and Bob are creating a golfing language to win every single PPCG challenge. Alice wants to make a two-dimensional language, like ><>, but Bob prefers a prefix-infix syntax like in J. As a compromise, they decide to create a two-dimensional prefix-infix language. The parser is a pain to write, and they need your help!

Syntax specification

In Alice's and Bob's language, there are variables, which are represented by lowercase ASCII letters a-z, and functions, which are represented by uppercase ASCII letters A-Z. A function can be invoked with one or two arguments. A program is a rectangular grid of letters a-zA-Z and spaces, and the top left corner must not contain a space. This is an example of a valid program:

F Gy
H   
R x 

When the program is parsed, it's transformed into an expression of a C-style language (C, Java, Python...) containing single-letter variables and function calls in the format <func>(<arg>) or <func>(<arg1>,<arg2>). For example, the above program results in this expression:

F(H(R(x)),G(x,y))

The details of the parsing process are as follows:

  • The spaces are just filler, so they are not parsed.
  • Every variable a-z is always parsed as itself.
  • Every function A-Z is parsed as a function call. Its arguments are the closest expressions below it and to its right in the grid, in this order. If only one of these is present, it's given as the sole argument. You can assume that all functions have at least one argument in the grid.

In the above example, the variables x and y are parsed as themselves. The function R has nothing below it and x to its right, so it's parsed as the one-argument invocation R(x). Similarly, H is parsed as H(R(x)), since it has R below it. The function G has x below it and y to its right, so it's parsed as G(x,y), and similarly for F. The expression parsed at the top left corner is the result of the parsing process.

Input and output

Your input is a non-empty rectangular array of characters. It will always be a valid program in Alice's and Bob's language, but it may contain expressions that are not used in the output. Your output shall be the parsed expression resulting from the above process.

Rules and scoring

You can write a full program of a function. The lowest byte count wins, and standard loopholes are disallowed.

Test cases

These are given in the format grid <newline> expression, with hyphens --- between the cases. The SE format leaves some lines blank, but they should be filled with spaces.

x
x
---
x y
z  
x
---
Fx
F(x)
---
Fx
y 
F(y,x)
---
ABu
A(B(u))
---
G
H
k
G(H(k))
---
ABCA
x xs
 DFk
A(x,B(D(F(k)),C(x,A(s))))
---
A  B  

C  D x
A(C(D(x)),B(D(x)))
---
RT Hq 
I xR k
R(I(x),T(H(R(k),q)))
---
A A  A a 
 S A  b  
B  C   Dx
d X  u f 
A(B(d,C(D(f,x))),A(X(u),A(u,a)))

Zgarb

Posted 2015-12-03T18:00:56.157

Reputation: 39 083

Would an output like (A (B (D x)) (C (D x))) be suitable or is the format fixed? – coredump – 2015-12-03T19:22:05.383

1@coredump In this challenge, the output format is strict; Alice and Bob want to be able to use the parsed expression directly. – Zgarb – 2015-12-03T19:28:41.347

Where is the "infix" part of the language? I only see prefix. – Paŭlo Ebermann – 2015-12-03T19:38:33.333

@PaŭloEbermann A function with two arguments can be thought of being between the arguments, on the Γ-shaped path from the bottom argument to the right one. – Zgarb – 2015-12-03T19:40:53.777

6Can you please actually create a language with this syntax? :) – Martin Ender – 2015-12-10T10:26:31.313

4Follow-up challenge: write a meta-golfer for this syntax (given an expression tree, find the smallest grid corresponding to it). For extra difficulty, distinguish between commutative and non-commutative functions. – Martin Ender – 2015-12-11T14:30:34.907

Answers

8

CJam, 67 62 60 58 57 54 bytes

Thanks to Dennis for saving 4 bytes.

{:Gsc_32&{"()"[GGz2{1m<_{S#}#>W<\}*z]{},{J}%',**+}|}:J

This defines a named block (function) J and leaves it on the stack. The function itself expects an array of strings on the stack, which represents the input grid, and leaves the desired string in its stead.

Test it here.

I've been meaning to tackle this one since it was posted, but apparently I needed a bounty and an overly long Pyth solution to motivate myself sufficiently.

Explanation

The solution is of course recursive and gradually builds up the string.

{
  :G       e#  Store the current grid in G.
  sc       e#  Convert the grid to a string, flattening it, then to a character, which
           e#  discards everything but the first character. This is a golfed 0=0=.
           e#  This character will either be an upper-case letter (function) or a lower-
           e#  case letter (variable).
  _32&     e#  Make a copy of the character and take bitwise AND with 32. This gives a
           e#  NULL character for functions and a space for variables.
  {        e#  If the result is falsy... (i.e. NULL, i.e. we have a function)
    "()"   e#   Push parentheses for later use.
    [      e#   Remember the current stack depth.
    GGz    e#   Push an array which contains the grid and its transpose.
    2{     e#   Run this block twice... (it will be applied once to each grid)
      1m<  e#    Rotate the rows. This has two effects: a) It removes the first row
           e#    from the front such that we can go looking for the next non-space
           e#    character down the first column from the start. b) It places a row
           e#    at the end which we know doesn't start with a space, which acts as
           e#    a guard in case there are no further letters below the current one.
      _    e#    Duplicate this grid.
      {    e#    Find the first index where this block yields something truthy...
        S# e#     Find the index of the first space in the current row. If the row
           e#     starts with a space, this is 0, i.e. falsy and the search continues.
           e#     If there is a space in any later position, that will be positive and
           e#     truthy, so this index gets returned. If there is no space at all,
           e#     the result is -1 which is also truthy and ends the search.
      }#        
      >    e#     Discard all lines up to (but not including) the found index.
      W<   e#     Discard the guard row at the end.
      \    e#     Swap the grids so that the next pass processes the other grid.
    }*       
    z      e#    Transpose the second grid back to its original orientation.
    ]      e#    Wrap both processed grids in an array.
    {},    e#    Remove a grid if it's empty.
    {J}/   e#    For each remaining grid, call J recursively.
    ',*    e#    Join the grids with commas if there are still two of them.
    *      e#    Wrap this string in the parentheses below on the stack.
    +      e#    Prepend the function name.
  }|
}:J

Martin Ender

Posted 2015-12-03T18:00:56.157

Reputation: 184 808

5

Python 2, 227 223 192 182 179 177 bytes

def r(i,y=0,x=0):
 c=i[y][x];z=[]
 for t in"pq":
    p=q=0
    try:
     while" "==i[y+p][x+q]or 1>p+q:exec t+"+=1"
     z+=[r(i,y+p,x+q)]
    except:1
 return c+"(%s)"%",".join(z)*c.isupper()

(The four spaces are in fact tabs)

Takes a 2d list of characters as the first argument to r.

Blue

Posted 2015-12-03T18:00:56.157

Reputation: 26 661

5

Haskell, 124 122 120 119 bytes

r@((c:_):_)#g|c>'_'=[c]|c<'!'=g%r|1<2=c:'(':(tail%r)!(map tail%r)++")"
_#_=""
g%r=g r#g
a!b=a++[','|a>"",b>""]++b
(#id)

Usage example: (#id) ["RT Hq ","I xR k"] -> "R(I(x),T(H(R(k),q)))".

How it works: besides the input grid r, the function # takes another function g as an argument that is applied on r whenever the top left character is a space. If it's a lowercase char instead, return it. Otherwise it must be an uppercase character and # is called recursively, once with tail to go down and once with map tail to go right. ! joins the results from the recursive calls with a ,, if needed. All starts with the input grid and the identity function.

nimi

Posted 2015-12-03T18:00:56.157

Reputation: 34 639

5

Pyth, 97 bytes

D:TkdI}p@@QTkGR)=Z0p\(V2JK0W|q\ @@Q=b+TJ=H+kK!+JK=J+NJ=K+!NK)I!|gblQgHl@Q0p*Z\,:bH+=Z1d))p\);:000

My god that took a long time to make (about 5/6 hours?). Pyth really wasn't designed for this...

Try it here.

Attempt at explanation as well as python equivalent

Q = literal_eval(input())

def at_slice(T,k,d):
  if Pprint(Q[T][k]) in "abcdefghijklmnopqrstuvwxyz": return 
  Z = 0
  Pprint("(")
  for N in range(2):
    J=K=0
    while " "==Q[assign('b',T+J)][assign('H',k+K)] or not J+K:
      J+=N
      K+=not N
    if not (b>=len(Q) or H>=len(Q[0])):
      Pprint(Z*",")
      at_slice(b,H,assign('Z',1)+d)
   Pprint(")")
at_slice(0,0,0)

Where the functions Pprint and assign return what they're given.

Blue

Posted 2015-12-03T18:00:56.157

Reputation: 26 661

Much concision. Such wow. – Addison Crump – 2016-01-23T21:46:52.450

0

Python 3, 187 bytes

Still looking for ways to golf this down, but I'm just pleased I managed to turn it into a one-liner.

lambda g,r=0,c=0:g[r][c]+'(%s)'%','.join([p(g,R,c)for R in range(r+1,len(g))if c<len(g[R])and' '!=g[R][c]][:1]+[p(g,r,C)for C in range(c+1,len(g[r]))if' '!=g[r][C]][:1])*g[r][c].isupper()

Tim Pederick

Posted 2015-12-03T18:00:56.157

Reputation: 1 411