Formatting a Lisp-like Syntax

23

1

Background

(Based on a true, heart-wrenching story)

In my time, I've played around with Lisp and similar languages often. I've written with them, ran them, interpreted them, designed them, and made machines write with them for me... And if there is one thing that bothers me, it's seeing Lisp that does not comply with my specific formatting style.

Unfortunately, some text editors (cough XCode cough) tend to strip my beautiful tabs and spaces whenever code is copied and pasted... Take this beautifully spaced Lisp-like syntax:

(A
    (B
        (C)
        (D))
    (E))

(Where ABCDE are arbitrary functions)

SOME text editors butcher this lovely code to the following end:

(A
(B
(C)
(D))
(E))

What a mess! That's not readable!

Help me out, here?

The Challenge

Your goal in this challenge is to take a series of functions separated by newlines in a format described below and return a more beautiful arrangement that highlights readability and elegance.

The Input

We define a function F of arity N arguments as a construct similar to the following:

(F (G1 ...) (G2 ...) (G3 ...) ... (GN ...))

where G1, G2, ..., GN are all functions in and of themselves. An arity 0 function A is simply (A), while an arity 2 function B is of the form (B (...) (...))

Your code should take input as a series of functions with a single newline before every function's leading parenthesis (except for the first function). The example above is valid input.

You may assume:

  • The parentheses are balanced.
  • A function will never have to be indented more than 250 times.
  • EVERY function is surrounded by parentheses: ()
  • A function's name will only contain printable ASCII characters.
  • A function's name will never contain parentheses or spaces.
  • There is an optional trailing newline on input.

The Output

Your code should output the same set of functions, where the only changes made are the additions of spaces or tabs before the leading parentheses of functions. Output should comply with the following rules:

  • The first function (and later top-level functions) given should have no preceding spaces
  • An argument to a function's horizontal location is exactly one tab to the right of that function's horizontal location.
  • A tab is implementation-defined, but must be at least 3 spaces.
  • You may optionally print a maximum of two spaces after each line.

Rules

Examples

Input:

(A
(B
(C)
(D))
(E))

Output:

(A
    (B
        (C)
        (D))
    (E))

Input:

(!@#$%^&*
(asdfghjklm
(this_string_is_particularly_long
(...))
(123456789)))
(THIS_IS_TOP_LEVEL_AGAIN
(HERE'S_AN_ARGUMENT))

Output:

(!@#$%^&*
    (asdfghjklm
        (this_string_is_particularly_long
            (...))
        (123456789)))
(THIS_IS_TOP_LEVEL_AGAIN
    (HERE'S_AN_ARGUMENT))

Input:

(-:0
(*:0
(%:0
(Arg:6)
(Write:0
(Read:0
(Arg:30))
(Write:0
(Const:-6)
(Arg:10))))
(%:0
(Const:9)
(/:0
(Const:-13)
(%:0
(Arg:14)
(Arg:0)))))
(WriteArg:22
(-:0
(Const:45)
(?:0
(Arg:3)
(Arg:22)
(Arg:0)))))

Output:

(-:0
    (*:0
        (%:0
            (Arg:6)
            (Write:0
                (Read:0
                    (Arg:30))
                (Write:0
                    (Const:-6)
                    (Arg:10))))
        (%:0
            (Const:9)
            (/:0
                (Const:-13)
                (%:0
                    (Arg:14)
                    (Arg:0)))))
    (WriteArg:22
        (-:0
            (Const:45)
            (?:0
                (Arg:3)
                (Arg:22)
                (Arg:0)))))

BrainSteel

Posted 2015-05-21T15:03:01.080

Reputation: 5 132

Congrats on making the Hot Network Questions list! :D – Alex A. – 2015-05-21T18:56:45.523

@AlexA. Hooray! My dreams have been realized. :D – BrainSteel – 2015-05-21T19:27:35.670

What if there is no function name, like ()? – coredump – 2015-05-22T08:56:29.413

Does the indentation have to be >= 3 spaces, or is a tab acceptable? – isaacg – 2015-05-22T13:14:59.930

@isaacg You can assume all functions are named in this case. And whatever your OS/language define to be a horizontal tab is fine. If you use spaces, there must be at least 3. I'll clarify that when I can get to a computer. Thanks! – BrainSteel – 2015-05-22T13:35:54.343

I'm not sure how to combine A function's name will never contain parentheses or spaces. with A function's name is followed by a maximum of two trailing spaces.. Could you give an example? All test cases seem to contain only newlines, not spaces. – Dennis – 2015-06-17T05:07:40.350

@Dennis I believe that second quote was intended as a "if it is helpful to your program to assume there are trailing spaces, you can have up to 2 of them." You can safely ignore it. I'll clarify in the morning. Thanks! – BrainSteel – 2015-06-17T07:18:58.040

Can we assume that the indentation level will never go over some limit, e.g. 9999? – kirbyfan64sos – 2015-06-17T15:44:18.183

@kirbyfan64sos Sure, I've set it to 250 (to fit in one byte) in the question. – BrainSteel – 2015-06-17T15:49:15.407

Answers

9

Pyth, 24 20 19 18 bytes

FN.z+*ZC9N~Z-1/N\)

Increments a counter for every line, counts total number of closing parentheses encountered so far, and subtracts it from the counter. Then we indent by counter tabs.

orlp

Posted 2015-05-21T15:03:01.080

Reputation: 37 067

@Downvoter Care to explain? – orlp – 2015-05-22T05:34:03.990

I didn't downvote, but the *4 is a hardcoded and redundant preference. FN.z+*ZC9N~Z-1/N\) lets you use your editor's indent width and saves one byte. – Cees Timmerman – 2015-05-22T09:12:21.007

I concur, a tab would be one character shorter. \<tab> or C9. – isaacg – 2015-05-22T13:42:55.233

9

Common Lisp - 486 414 bytes (Rube Goldberg version)

(labels((p(x d)(or(when(listp x)(#2=princ #\()(p(car x)d)(incf d)(dolist(a(cdr x))(format t"~%~v{   ~}"d'(t))(p a d))(#2# #\)))(#2# x))))(let((i(make-string-input-stream(with-output-to-string(o)(#1=ignore-errors(do(b c)(())(if(member(setq c(read-char))'(#\( #\) #\  #\tab #\newline):test'char=)(progn(when b(prin1(coerce(reverse b)'string)o))(#2# c o)(setq b()))(push c b))))))))(#1#(do()(())(p(read i)0)(terpri)))))

Approach

Instead of doing like everybody else and count parentheses by hand, let's invoke the Lisp reader and do it The Right Way :-)

  • Read from input stream and write to a temporary output stream.
  • While doing so, aggregate characters different from (, ) or whitespace as strings.
  • The intermediate output is used to build a string, which contains syntactically well-formed Common-Lisp forms: nested lists of strings.
  • Using that string as an input stream, call the standard read function to build actual lists.
  • Call p on each of those lists, which recursively write them to the standard output with the requested format. In particular, strings are printed unquoted.

As a consequence of this approach:

  1. There are less restrictions on the input format: you can read arbitrarly formatted inputs, not just "one function per line" (ugh).
  2. Also, if the input is not well-formed, an error will be signaled.
  3. Finally, the pretty-printing function is well decoupled from parsing: you can easily switch to another way of pretty-printing S-expressions (and you should, if you value your vertical space).

Example

Reading from a file, using this wrapper:

(with-open-file (*standard-input* #P"path/to/example/file")
    ...)

Here is the result:

(!@#$%^&*
    (asdfghjklm
        (this_string_is_particularly_long
            (...))
        (123456789)))
(THIS_IS_TOP_LEVEL_AGAIN
    (HERE'S_AN_ARGUMENT))
(-:0
    (*:0
        (%:0
            (Arg:6)
            (Write:0
                (Read:0
                    (Arg:30))
                (Write:0
                    (Const:-6)
                    (Arg:10))))
        (%:0
            (Const:9)
            (/:0
                (Const:-13)
                (%:0
                    (Arg:14)
                    (Arg:0)))))
    (WriteArg:22
        (-:0
            (Const:45)
            (?:0
                (Arg:3)
                (Arg:22)
                (Arg:0)))))

(it seems that tabs are converted to spaces here)

Pretty-printed (golfed version)

Contrary to the safer original version we expect input to be valid.

(labels ((p (x d)
           (or
            (when (listp x)
              (princ #\()
              (p (car x) d)
              (incf d)
              (dolist (a (cdr x)) (format t "~%~v{  ~}" d '(t)) (p a d))
              (princ #\)))
            (princ x))))
  (let ((i
         (make-string-input-stream
          (with-output-to-string (o)
            (ignore-errors
             (do (b
                  c)
                 (nil)
               (if (member (setq c (read-char)) '(#\( #\) #\  #\tab #\newline)
                           :test 'char=)
                   (progn
                    (when b (prin1 (coerce (reverse b) 'string) o))
                    (princ c o)
                    (setq b nil))
                   (push c b))))))))
    (ignore-errors (do () (nil) (p (read i) 0) (terpri)))))

coredump

Posted 2015-05-21T15:03:01.080

Reputation: 6 292

7

Retina, 89 83 bytes

s`.+
$0<tab>$0
s`(?<=<tab>.*).
<tab>
+ms`^((\()|(?<-2>\))|[^)])+^(?=\(.*^((?<-2><tab>)+))
$0$3
<tab>+$
<empty>

Where <tab> stands for an actual tab character (0x09) and <empty> stands for an empty line. After making those replacements, you can run the above code with the -s flag. However, I'm not counting that flag, because you could also just put each line in its own source file, in which case the 7 newlines would be replaced by 7 penalty bytes for the additional source files.

This is a full program, taking input on STDIN and printing the result to STDOUT.

Explanation

Every pair of lines defines a regex substitution. The basic idea is to make use of .NET's balancing groups to count the current depth up to a given (, and then insert that many tabs before that (.

s`.+
$0<tab>$0

First, we prepare the input. We can't really write back a conditional number of tabs, if we can't find them somewhere in the input string to capture them. So we start by duplicating the entire input, separated by a tab. Note that the s` just activates the single-line (or "dot-all") modifier, which ensures that the . also matches newlines.

s`(?<=<tab>.*).
<tab>

Now we turn every character after that tab into a tab as well. This gives us a sufficient amount of tabs at the end of the string, without modifying the original string so far.

+ms`^((\()|(?<-2>\))|[^)])+^(?=\(.*^((?<-2><tab>)+))
$0$3

This is the meat of the solution. The m and s activate multi-line mode (so that ^ matches the beginnings of lines) and single-line mode. The + tells Retina to keep repeating this substitution until the output stops changing (in this case, that means until the pattern no longer matches the string).

The pattern itself matches a prefix of the input up to an unprocessed ( (that is, a ( that doesn't have any tabs before it, but should). At the same time it determines the depth of the prefix with balancing groups, such that the height of stack 2 will correspond to the current depth, and therefore to number of tabs we need to append. That is this part:

((\()|(?<-2>\))|[^)])+

It either matches a (, pushing it onto the 2 stack, or it matches a ), popping the last capturing from the 2 stack, or it matches something else and leaves the stack untouched. Since the parentheses are guaranteed to be balanced we don't need to worry about trying to pop from an empty stack.

After we've gone through the string like this and found an unprocessed ( to stop at, the lookahead then skips ahead to the end of the string, and captures tabs into group 3 while popping from the 2 stack until its empty:

(?=\(.*^((?<-2><tab>)+))

By using a + in there, we ensure that the pattern only matches anything if at least one tab should be inserted into the match - this avoids an infinite loop when there are multiple root-level functions.

<tab>+$
<empty>

Lastly, we just get rid of those helper tabs at the end of the string to clean up the result.

Martin Ender

Posted 2015-05-21T15:03:01.080

Reputation: 184 808

This is very cool. Well done! It's always a pleasure to see Retina. – BrainSteel – 2015-05-21T19:30:44.320

6

C: 95 94 characters

It's not very golfed yet, and from the question I'm unsure whether tabs are acceptable, which is what I use here.

i,j;main(c){for(;putchar(c=getchar()),c+1;i+=c==40,i-=c==41)if(c==10)for(j=i;j--;putchar(9));}

Ungolfed:

i,j;
main(c){
  for(
    ;
    putchar(c=getchar()),
    c+1;
    i+=c==40,
    i-=c==41
  )
    if(c==10)
      for(
        j=i;
        j--;
        putchar(9)
      );
}

Edit: Made it so that it quits on EOF.

Fors

Posted 2015-05-21T15:03:01.080

Reputation: 3 020

Tabs are perfectly acceptable. – BrainSteel – 2015-05-21T17:40:33.580

2Could you use if(c<11) instead of if(c==10)? – Digital Trauma – 2015-05-22T01:27:40.560

5

Julia, 103 99 97 94 88 bytes

p->(i=j=0;for l=split(p,"\n") i+=1;println("\t"^abs(i-j-1)*l);j+=count(i->i=='\)',l)end)

This defines an unnamed function that accepts a string and prints the indented version. To call it, give it a name, e.g. f=p->.... Note that the input must be a valid Julia string, so dollar signs ($) must be escaped.

Ungolfed + explanation:

function f(p)
    # Set counters for the line number and the number of close parens
    i = j = 0

    # Loop over each line of the program
    for l in split(p, "\n")
        # Increment the line number
        i += 1

        # Print the program line with |i-j-1| tabs
        println("\t"^abs(i-j-1) * l)

        # Count the number of close parens on this line
        j += count(i -> i == '\)', l)
    end
end

Example, pretending each set of four spaces is a tab:

julia> f("(A
(B
(C)
(D))
(E))")

(A
    (B
        (C)
        (D))
    (E))

Any suggestions are more than welcome!

Alex A.

Posted 2015-05-21T15:03:01.080

Reputation: 23 761

4

Haskell, 83 81

unlines.(scanl(\n s->drop(sum[1|')'<-s])$n++['\t'|'('<-s])"">>=zipWith(++)).lines

a very points free solution.

proud haskeller

Posted 2015-05-21T15:03:01.080

Reputation: 5 866

you can just drop h=. – Will Ness – 2015-05-25T22:17:05.533

3

Perl, 41

$_="\t"x($i-$j).$_;$i+=y/(/(/;$j+=y/)/)/

40 characters +1 for -p.

Run with:

cat input.txt | perl -pe'$_="\t"x($i-$j).$_;$i+=y/(/(/;$j+=y/)/)/'

hmatt1

Posted 2015-05-21T15:03:01.080

Reputation: 3 356

3

Python 2 - 88 78 Bytes

Fairly straightforward (and not very short) solution:

l=0
for x in raw_input().split():g=x.count;d=l*'\t'+x;l+=g("(")-g(")");print d

Kade

Posted 2015-05-21T15:03:01.080

Reputation: 7 463

A couple tips: 1) You can use '\t' instead of ' ' and save one byte; 2) no need to assign input.split() to a variable, since it's only used once (same for c, as well as d--just move the print statement); 3) operator precedence means parentheses around l*c aren't needed. Also, it looks like f isn't used for anything--is that a relic of a previous version? – DLosc – 2015-05-23T08:10:36.083

Also, if this is Python 2, you'll need to use raw_input instead of input (and don't forget the parentheses after it!). – DLosc – 2015-05-23T08:13:21.607

2

CJam, 20 bytes

r{_')e=NU)@-:U9c*r}h

Try it online in the CJam interpreter.

How it works

r                    e# Read a whitespace-separated token R from STDIN.
{                 }h e# Do, while R is truthy: 
  _')e=              e#   Push C, the number of right parentheses in R. 
       NU            e#   Push a linefeed and U (initially 0).
         )@-         e#   Compute U + 1 - C.
            :U       e#   Save in U.
              9c*    e#   Push a string of U tabulators.
                 r   e#   Read a whitespace-separated token R from STDIN.

Dennis

Posted 2015-05-21T15:03:01.080

Reputation: 196 637