33
9
Lisp programmers boast that Lisp is a powerful language which can be built up from a very small set of primitive operations. Let's put that idea into practice by golfing an interpreter for a dialect called tinylisp
.
Language specification
In this specification, any condition whose result is described as "undefined" may do anything in your interpreter: crash, fail silently, produce random gobbldegook, or work as expected. A reference implementation in Python 3 is available here.
Syntax
Tokens in tinylisp are (
, )
, or any string of one or more printable ASCII characters except parentheses or space. (I.e. the following regex: [()]|[^() ]+
.) Any token that consists entirely of digits is an integer literal. (Leading zeros are okay.) Any token that contains non-digits is a symbol, even numeric-looking examples like 123abc
, 3.14
, and -10
. All whitespace (including, at a minimum, ASCII characters 32 and 10) is ignored, except insofar as it separates tokens.
A tinylisp program consists of a series of expressions. Each expression is either an integer, a symbol, or an s-expression (list). Lists consist of zero or more expressions wrapped in parentheses. No separator is used between items. Here are examples of expressions:
4
tinylisp!!
()
(c b a)
(q ((1 2)(3 4)))
Expressions that are not well-formed (in particular, that have unmatched parentheses) give undefined behavior. (The reference implementation auto-closes open parens and stops parsing on unmatched close parens.)
Data types
The data types of tinylisp are integers, symbols, and lists. Built-in functions and macros can also be considered a type, though their output format is undefined. A list can contain any number of values of any type and can be nested arbitrarily deeply. Integers must be supported at least from -2^31 to 2^31-1.
The empty list ()
--also referred to as nil--and the integer 0
are the only values that are considered logically false; all other integers, nonempty lists, builtins, and all symbols are logically true.
Evaluation
Expressions in a program are evaluated in order and the results of each sent to stdout (more on output formatting later).
- An integer literal evaluates to itself.
- The empty list
()
evaluates to itself. - A list of one or more items evaluates its first item and treats it as a function or macro, calling it with the remaining items as arguments. If the item is not a function/macro, the behavior is undefined.
- A symbol evaluates as a name, giving the value bound to that name in the current function. If the name is not defined in the current function, it evaluates to the value bound to it at global scope. If the name is not defined at current or global scope, the result is undefined (reference implementation gives an error message and returns nil).
Built-in functions and macros
There are seven built-in functions in tinylisp. A function evaluates each of its arguments before applying some operation to them and returning the result.
c
- cons[truct list]. Takes two arguments, a value and a list, and returns a new list obtained by adding the value at the front of the list.h
- head (car, in Lisp terminology). Takes a list and returns the first item in it, or nil if given nil.t
- tail (cdr, in Lisp terminology). Takes a list and returns a new list containing all but the first item, or nil if given nil.s
- subtract. Takes two integers and returns the first minus the second.l
- less than. Takes two integers; returns 1 if the first is less than the second, 0 otherwise.e
- equal. Takes two values of the same type (both integers, both lists, or both symbols); returns 1 if the two are equal (or identical in every element), 0 otherwise. Testing builtins for equality is undefined (reference implementation works as expected).v
- eval. Takes one list, integer, or symbol, representing an expression, and evaluates it. E.g. doing(v (q (c a b)))
is the same as doing(c a b)
;(v 1)
gives1
.
"Value" here includes any list, integer, symbol, or builtin, unless otherwise specified. If a function is listed as taking specific types, passing it different types is undefined behavior, as is passing the wrong number of arguments (reference implementation generally crashes).
There are three built-in macros in tinylisp. A macro, unlike a function, does not evaluate its arguments before applying operations to them.
q
- quote. Takes one expression and returns it unevaluated. E.g., evaluating(1 2 3)
gives an error because it tries to call1
as a function or macro, but(q (1 2 3))
returns the list(1 2 3)
. Evaluatinga
gives the value bound to the namea
, but(q a)
gives the name itself.i
- if. Takes three expressions: a condition, an iftrue expression, and an iffalse expression. Evaluates the condition first. If the result is falsy (0
or nil), evaluates and returns the iffalse expression. Otherwise, evaluates and returns the iftrue expression. Note that the expression that is not returned is never evaluated.d
- def. Takes a symbol and an expression. Evaluates the expression and binds it to the given symbol treated as a name at global scope, then returns the symbol. Trying to redefine a name should fail (silently, with a message, or by crashing; the reference implementation displays an error message). Note: it is not necessary to quote the name before passing it tod
, though it is necessary to quote the expression if it's a list or symbol you don't want evaluated: e.g.,(d x (q (1 2 3)))
.
Passing the wrong number of arguments to a macro is undefined behavior (reference implementation crashes). Passing something that's not a symbol as the first argument of d
is undefined behavior (reference implementation doesn't give an error, but the value cannot be referenced subsequently).
User-defined functions and macros
Starting from these ten built-ins, the language can be extended by constructing new functions and macros. These have no dedicated data type; they are simply lists with a certain structure:
- A function is a list of two items. The first is either a list of one or more parameter names, or a single name which will receive a list of any arguments passed to the function (thus allowing for variable-arity functions). The second is an expression which is the function body.
- A macro is the same as a function, except that it contains nil before the parameter name(s), thus making it a three-item list. (Trying to call three-item lists that do not start with nil is undefined behavior; the reference implementation ignores the first argument and treats them as macros as well.)
For example, the following expression is a function that adds two integers:
(q List must be quoted to prevent evaluation
(
(x y) Parameter names
(s x (s 0 y)) Expression (in infix, x - (0 - y))
)
)
And a macro that takes any number of arguments and evaluates and returns the first one:
(q
(
()
args
(v (h args))
)
)
Functions and macros can be called directly, bound to names using d
, and passed to other functions or macros.
Since function bodies are not executed at definition time, recursive functions are easily definable:
(d len
(q (
(list)
(i list If list is nonempty
(s 1 (s 0 (len (t list)))) 1 - (0 - len(tail(list)))
0 else 0
)
))
)
Note, though, that the above is not a good way to define a length function because it doesn't use...
Tail-call recursion
Tail-call recursion is an important concept in Lisp. It implements certain kinds of recursion as loops, thus keeping the call stack small. Your tinylisp interpreter must implement proper tail-call recursion!
- If the return expression of a user-defined function or macro is a call to another user-defined function or macro, your interpreter must not use recursion to evaluate that call. Instead, it must replace the current function and arguments with the new function and arguments and loop until the chain of calls is resolved.
- If the return expression of a user-defined function or macro is a call to
i
, do not immediately evaluate the branch that is selected. Instead, check whether it is a call to another user-defined function or macro. If so, swap out the function and arguments as above. This applies to arbitrarily deeply nested occurrences ofi
.
Tail recursion must work both for direct recursion (a function calls itself) and indirect recursion (function a
calls function b
which calls [etc] which calls function a
).
A tail-recursive length function (with a helper function len*
):
(d len*
(q (
(list accum)
(i list
(len*
(t list)
(s 1 (s 0 accum))
)
accum
)
))
)
(d len
(q (
(list)
(len* list 0)
))
)
This implementation works for arbitrarily large lists, limited only by the max integer size.
Scope
Function parameters are local variables (actually constants, since they can't be modified). They are in scope while the body of that call of that function is being executed, and out of scope during any deeper calls and after the function returns. They can "shadow" globally defined names, thereby making the global name temporarily unavailable. For example, the following code returns 5, not 41:
(d x 42)
(d f
(q (
(x)
(s x 1)
))
)
(f 6)
However, the following code returns 41, because x
at call level 1 is not accessible from call level 2:
(d x 42)
(d f
(q (
(x)
(g 15)
))
)
(d g
(q (
(y)
(s x 1)
))
)
(f 6)
The only names in scope at any given time are 1) the local names of the currently executing function, if any, and 2) global names.
Submission requirements
Input and output
Your interpreter may read the program from stdin or from a file specified via stdin or command-line argument. After evaluating each expression, it should output the result of that expression to stdout with a trailing newline.
- Integers should be output in your implementation language's most natural representation. Negative integers can be output, with leading minus signs.
- Symbols should be output as strings, with no surrounding quotes or escapes.
- Lists should be output with all items space-separated and wrapped in parentheses. A space inside the parentheses is optional:
(1 2 3)
and( 1 2 3 )
are both acceptable formats. - Outputting built-in functions and macros is undefined behavior. (The reference interpretation displays them as
<built-in function>
.)
Other
The reference interpreter includes a REPL environment and the ability to load tinylisp modules from other files; these are provided for convenience and are not required for this challenge.
Test cases
The test cases are separated into several groups so that you can test simpler ones before working up to more-complex ones. However, they will also work just fine if you dump them all in one file together. Just don't forget to remove the headings and the expected output before running it.
If you have properly implemented tail-call recursion, the final (multi-part) test case will return without causing a stack overflow. The reference implementation computes it in about six seconds on my laptop.
"Any token that consists entirely of digits is an integer literal. (Leading zeros are okay.) Any token that contains non-digits is a symbol, even numeric-looking examples like 123abc, 3.14, and -10." seems to contradict "Integers must be supported at least from -2^31 to 2^31-1." – msh210 – 2015-11-04T06:24:58.170
3@msh210 Not really, because the former is talking about tokens whereas the latter is talking about values. Even though there's no direct way to enter
-1
, I can still generate the value -1 by doing(s 0 1)
. – DLosc – 2015-11-04T06:51:44.390The spec says it's undefined behavior to compare objects of different types. If I'm recursively comparing lists, and at some level have to compare two objects of different types, is that also undefined? – feersum – 2015-11-04T07:05:42.390
@feersum Correct. This means it's not strictly possible in tinylisp to write a function to compare two arbitrary lists (though the builtin
e
can tell you whether two lists are identical or not). Of course, if your submission is capable of handling this case gracefully, then so much the better. – DLosc – 2015-11-04T07:09:02.357Is the evaluation order of items in a list defined? The case I have in mind is that one of them defines a global variable, which the others then reference. – feersum – 2015-11-04T08:42:44.243
@feersum Good question. I'm going to say that's undefined. I'm pretty sure the reference implementation does LTR, but there's nothing in the test cases that would require one way or the other. – DLosc – 2015-11-04T08:52:13.217
From your explanations, I guess that you want to have lexical binding of symbols. Maybe this should be made explicit. – coredump – 2015-11-04T14:29:53.820
1
@coredump After reading the pertinent Wikipedia article, I've concluded that the implementation is actually closer to dynamic, but with no scope nesting. Variables in function
– DLosc – 2015-11-04T18:37:57.160F
are not available in functionG
ifF
callsG
(as with dynamic scoping), but they are also not available in functionH
ifH
is a nested function defined insideF
(as with lexical scoping)--see test case 5. So calling it "lexical" might be misleading.1
To put it another way: because of the lack of scope nesting, an implementation could use either a dynamic or a lexical scoping strategy and come up with the same results. The only names in scope at any given time are 1) the local names of the currently executing function, if any, and 2) global names. Closures are not supported. (The reference implementation keeps a stack of name bindings corresponding to the call stack--a dynamic-style approach, which I think will be the easiest to implement.)
– DLosc – 2015-11-04T18:54:49.340@Ell I brought this up in chat previously: http://chat.stackexchange.com/transcript/240?m=25126700#25126700
– feersum – 2015-11-05T03:57:34.513What are the valid types for the condition of
i
? – feersum – 2015-11-05T07:22:33.890@feersum Anything. (See second paragraph of "Data types" section.) – DLosc – 2015-11-05T07:30:40.663
Should
v
eval in the caller's scope or the global scope? – feersum – 2015-11-05T08:04:14.150@feersum In the current local scope. It's just as if the code were being executed there directly. – DLosc – 2015-11-05T08:19:40.063
1
Obligatory xkcd.
– mınxomaτ – 2015-11-05T10:36:46.380Can I assume a maximum name length? – feersum – 2015-11-05T12:18:15.040
@feersum Sure, I can live with that. How does 99 characters sound? – DLosc – 2015-11-05T21:00:24.447
@Ell
v
does not have to be tail recursive. (It isn't tail recursive in the reference implementation, incidentally.) But it's really neat that your implementation does it that way! – DLosc – 2015-11-06T23:47:49.250The evaluation rule for a list should make explicit that the first item can be a symbol which is bound to a function, not just a function itself. For example: »A list of one or more items treats its first item as a function or macro (or a symbol which is bound to one) and calls it with the remaining items as arguments.« – Paŭlo Ebermann – 2015-11-07T15:05:53.673
To what should
((q ((#2) (s 6 #2))) 1)
evaluate? As I see it, this takes(q ((#2) (s 6 #2)))
as a function list, whereq
is the (unused) parameters name and((#2) (s 6 #2))
the expression. And this tries to interpret(#2)
as a function/macro, but it isn't (too few elements). Did I understand something wrong, or is there a mistake in your last example of In 5? – Paŭlo Ebermann – 2015-11-07T21:03:25.920@PaŭloEbermann I think this edit will answer both of your points: "A list of one or more items evaluates its first item and treats it as a function or macro." To your first point, the symbol is evaluated to whatever that name is bound to (which should be a function/macro). To the second point, the list
(q ((#2) (s 6 #2)))
is first evaluated as code. That evaluation sends the list((#2) (s 6 #2))
to theq
macro, which returns it unevaluated as a literal list, and that list is then treated as a function. Does that help? – DLosc – 2015-11-07T23:14:16.600@DLosc yes, this helps, thank you. I will post my solution as soon as I solved the last quirks (not related to these issues). – Paŭlo Ebermann – 2015-11-08T20:11:32.453