Write a Clem interpreter

11

2

Clem is a minimal stack-based programming language featuring first-class functions. Your objective is to write an interpreter for the Clem language. It should properly execute all examples included in the reference implementation, which is available here.

The Clem language

Clem is a stack based programming language with first-class functions. The best way to learn Clem is to run the clem interpreter with no arguments. It will start in interactive mode, allowing you to play with the available commands. To run the example programs, type clem example.clm where example is the name of the program. This brief tutorial should be enough to get you started.

There are two main classes of functions. Atomic functions and compound functions. Compound functions are lists composed of other compound functions and atomic functions. Note that a compound function cannot contain itself.

Atomic Functions

The first type of atomic function is the constant. A constant is simply an integer value. For example, -10. When the interpreter encounters a constant, it pushes it to the stack. Run clem now. Type -10 at the prompt. You should see

> -10
001: (-10)
>

The value 001 describes the position of the function in the stack and (-10) is the constant you just entered. Now enter +11 at the prompt. You should see

> +11
002: (-10)
001: (11)
>

Notice that (-10) has moved to the second position in the stack and (11) now occupies the first. This is the nature of a stack! You will notice that - is also the decrement command. Whenever - or + precede a number, they denote the sign of that number and not the corresponding command. All other atomic functions are commands. There are 14 in total:

@  Rotate the top three functions on the stack
#  Pop the function on top of the stack and push it twice
$  Swap the top two functions on top of the stack
%  Pop the function on top of the stack and throw it away
/  Pop a compound function. Split off the first function, push what's left, 
   then push the first function.
.  Pop two functions, concatenate them and push the result
+  Pop a function. If its a constant then increment it. Push it
-  Pop a function. If its a constant then decrement it. Push it
<  Get a character from STDIN and push it to the stack. Pushes -1 on EOF.
>  Pop a function and print its ASCII character if its a constant
c  Pop a function and print its value if its a constant
w  Pop a function from the stack. Peek at the top of the stack. While it is
   a non-zero constant, execute the function.

Typing a command at the prompt will execute the command. Type # at the prompt (the duplicate command). You should see

> #
003: (-10)
002: (11)
001: (11)
> 

Notice that the (11) has been duplicated. Now type % at the prompt (the drop command). You should see

> %
002: (-10)
001: (11)
> 

To push a command to the stack, simply enclose it in parenthesis. Type (-) at the prompt. This will push the decrement operator to the stack. You should see

> (-)
003: (-10)
002: (11)
001: (-)
> 

Compound functions

You may also enclose multiple atomic functions in parenthesis to form a compound function. When you enter a compound function at the prompt, it is pushed to the stack. Type ($+$) at the prompt. You should see

> ($+$)
004: (-10)
003: (11)
002: (-)
001: ($ + $)
>

Technically, everything on the stack is a compound function. However, some of the compound functions on the stack consist of a single atomic function (in which case, we will consider them to be atomic functions for the sake of convenience). When manipulating compound functions on the stack, the . command (concatenation) is frequently useful. Type . now. You should see

> . 
003: (-10)
002: (11)
001: (- $ + $)
> 

Notice that the first and second functions on the stack were concatenated, and that the second function on the stack comes first in the resulting list. To execute a function that is on the stack (whether it is atomic or compound), we must issue the w command (while). The w command will pop the first function on the stack and execute it repeatedly so long as the second function on the stack is a non-zero constant. Try to predict what will happen if we type w. Now, type w. You should see

> w
002: (1)
001: (0)
> 

Is that what you expected? The two numbers sitting on top of the stack were added and their sum remains. Let's try it again. First we'll drop the zero and push a 10 by typing %10. You should see

> %10
002: (1)
001: (10)
> 

Now we'll type the entire function in one shot, but we'll add an extra % at the end to get rid of the zero. Type (-$+$)w% at the prompt. You should see

> (-$+$)w%
001: (11)
> 

(Note this algorithm only works if the first constant on the stack is positive).

Strings

Strings are also present. They are mostly syntactic sugar, but can be quite useful. When the interpreter encounters a string, it pushes each character from last to first onto the stack. Type % to drop the 11 from the previous example. Now, type 0 10 "Hi!" on the prompt. The 0 will insert a NULL terminator and the 10 will insert a new-line character. You should see

> 0 10 "Hi!"
005: (0)
004: (10)
003: (33)
002: (105)
001: (72)
> 

Type (>)w to print characters from the stack until we encounter the NULL terminator. You should see

> (>)w
Hi!
001: (0)
> 

Conclusions

Hopefully this should be enough to get you started with the interpreter. The language design should be relatively straight-forward. Let me know if anything is terribly unclear :) A few things have been left intentionally vague: values must be signed and at least 16 bits, the stack must be large enough to run all reference programs, etc. Many details haven't been carved out here because a full blown language specification would be prohibitively large to post (and I haven't written one yet :P). When in doubt, mimic the reference implementation.

The esolangs.org page for Clem

The reference implementation in C

Orby

Posted 2014-09-11T21:35:49.260

Reputation: 844

It's amazing what someone (@ceilingcat) who is good at golfing can do. The C version is now amazingly short. – Jerry Jeremiah – 2020-02-12T22:12:42.327

You say you haven't written the language specification yet. I take it then that you're the originator of the language? – COTO – 2014-09-12T02:59:51.067

@COTO That is correct. I created the language. – Orby – 2014-09-12T04:10:57.573

5Very important question: do you pronounce it "klem" or "see-lem"? – Martin Ender – 2014-09-12T09:42:25.507

4@MartinBüttner: "klem" :) – Orby – 2014-09-12T10:31:01.913

An "interpreter" to me implies a program that simply runs the various Clem instructions, not a complete shell such as the one you're using to view/manipulate the stack. However, you've included instructions such as q and h which really only have meaning in a shell context. Hence to get to my point: do you want an interpreter or a shell? – COTO – 2014-09-12T14:09:53.583

@COTO I want an interpreter. The shell does not need to be implemented. Specifically, the q and h commands need not be implemented. I'll remove them from the post. – Orby – 2014-09-12T17:23:36.620

I take it that ; is a comment to end of line, but the spec doesn't mention it.. must I implement it or can I ignore it? the examples do use it.. – Claudiu – 2014-09-13T06:04:55.963

@Claudiu Yes, comments must be implemented. Anything that is required to run the examples must be implemented. – Orby – 2014-09-13T06:09:09.770

Almost done! Just dealing with negative number literals. You should perhaps add examples that exercise those (e.g. 0- 1cc should output 1-1, while 0-1cc should output -10) – Claudiu – 2014-09-13T22:13:34.030

If I were you, I change the name... It sounds awfully like phlegm – Beta Decay – 2014-09-14T09:55:36.733

2You might want to specify the direction in which the @ command rotates the 3 top functions. (001-->002-->003-->001, or 003-->002-->001-->003) – kwokkie – 2014-09-17T09:11:51.970

too bad I were too late. – proud haskeller – 2014-09-24T20:26:22.587

@Orby thank you for the accepting my answer. I thought I was not going to get it – proud haskeller – 2014-10-07T04:30:26.647

Answers

1

Haskell, 931 921 875

this isn't fully golfed yet but it will probably never be. Still, it is already shorter than all other solutions. I will golf this more soon. I don't feel like golfing it any more than this.

probably has a few subtle bugs because I didn't play with the C reference implementation.

this solution uses the type StateT [String] IO () to store a "runnable" clem program. most the program is a parser which parses the "runnable program".

in order to run this use r "<insert clem program here>".

import Text.Parsec
import Control.Monad.State
import Control.Monad.Trans.Class
import Data.Char
'#'%(x:y)=x:x:y
'%'%(x:y)=y
'@'%(x:y:z:w)=y:z:x:w
'$'%(x:y:z)=y:x:z
'/'%((a:b):s)=[a]:b:s
'+'%(a:b)=i a(show.succ)a:b
'.'%(a:b:c)=(a++b):c
_%x=x
b=concat&between(s"(")(s")")(many$many1(noneOf"()")<|>('(':)&((++")")&b))
e=choice[s"w">>c(do p<-t;let d=h>>= \x->if x=="0"then a else u p>>d in d),m&k,s"-">>(m&(' ':)&k<|>c(o(\(a:b)->i a(show.pred)a:b))),s"c">>c(do
 d<-t
 i d(j.putStr.show)a),o&(++)&map(show.ord)&between(s"\"")(s"\"")(many$noneOf"\""),(do
 s"<"
 c$j getChar>>=m.show.ord),(do
 s">"
 c$do
 g<-t
 i g(j.putChar.chr)a),m&b,o&(%)&anyChar]
k=many1 digit
i s f g|(reads s::[(Int,String)])>[]=f$(read s::Int)|0<1=g
t=h>>=(o tail>>).c
c n=return n
a=c()
h=head&get
(&)f=fmap f
m=o.(:)
o=modify
u=(\(Right r)->r).parse(sequence_&many e)""
r=(`runStateT`[]).u
s=string
j=lift

proud haskeller

Posted 2014-09-11T21:35:49.260

Reputation: 5 866

5

Python, 1684 1281 characters

Got all the basic golf stuff done. It runs all the example programs and matches the output character-for-character.

import sys,os,copy as C
L=len
S=[]
n=[S]
Q=lambda:S and S.pop()or 0
def P(o):
 if o:n[0].append(o)
def X():x=Q();P(x);P(C.deepcopy(x))
def W():S[-2::]=S[-1:-3:-1]
def R():a,b,c=Q(),Q(),Q();P(a);P(c);P(b)
def A(d):
 a=Q()
 if a and a[0]:a=[1,a[1]+d,lambda:P(a)]
 P(a)
def V():
 a=Q();P(a)
 if a and a[0]-1and L(a[2])>1:r=a[2].pop(0);P(r)
def T():
 b,a=Q(),Q()
 if a!=b:P([0,0,(a[2],[a])[a[0]]+(b[2],[b])[b[0]]])
 else:P(a);P(b)
def r():a=os.read(0,1);F(ord(a)if a else-1)
def q(f):
 a=Q()
 if a and a[0]:os.write(1,(chr(a[1]%256),str(a[1]))[f])
def e(f,x=0):f[2]()if f[0]+f[1]else([e(z)for z in f[2]]if x else P(f))
def w():
 a=Q()
 while a and S and S[-1][0]and S[-1][1]:e(a,1)
def Y():n[:0]=[[]]
def Z():
 x=n.pop(0)
 if x:n[0]+=([[0,0,x]],x)[L(x)+L(n)==2]
D={'%':Q,'#':X,'$':W,'@':R,'+':lambda:A(1),'-':lambda:A(-1),'/':V,'.':T,'<':r,'>':lambda:q(0),'c':lambda:q(1),'w':w,'(':Y,')':Z}
def g(c):D[c]()if L(n)<2or c in'()'else P([0,1,D[c]])
N=['']
def F(x):a=[1,x,lambda:P(a)];a[2]()
def E():
 if'-'==N[0]:g('-')
 elif N[0]:F(int(N[0]))
 N[0]=''
s=j=""
for c in open(sys.argv[1]).read()+' ':
 if j:j=c!="\n"
 elif'"'==c:E();s and map(F,map(ord,s[:0:-1]));s=(c,'')[L(s)>0]
 elif s:s+=c
 elif';'==c:E();j=1
 else:
    if'-'==c:E()
    if c in'-0123456789':N[0]+=c
    else:E();c in D and g(c)

Testing:

Gather clemint.py, clemtest_data.py, clemtest.py, and a compiled clem binary into a directory and run clemtest.py.

Expanation:

The most ungolfed version is this one. Follow along with that one.

S is the main stack. Each item of the stack is a 3-list, one of:

Constant: [1, value, f]
Atomic: [0, 1, f]
Compound: [0, 0, fs]

For the constants, f is a function which pushes the constant onto the stack. For the atmoics, f is a function which executes one of the operations (e.g. -, +). For the compounds, fs is a list of items.

xec executes an item. If it's a constant or an atomic, it just executes the function. If it's a compound, if there has been no recursion yet, it executes each function. So executing (10 20 - 30) will execute each of the functions 10, 20, -, and 30, leaving 10 19 30 on the stack. If there has been recursion, then it just pushes the compound function onto the stack. For example, when executing (10 20 (3 4) 30), the result should be 10 20 (3 4) 30, not 10 20 3 4 30.

Nesting was a bit tricky. What do you do while reading off (1 (2 (3 4)))? The solution is to have a stack of stacks. At each nesting level, a new stack is pushed on the stack of stacks, and all push operations go onto this stack. Further, if there has been nesting, then atomic functions are pushed instead of executed. So if you see 10 20 (- 30) 40, 10 is pushed, then 20, then a new stack is created, - and 30 are pushed onto the new stack, and ) pops off the new stack, turns it into an item, and pushes it onto the stack one level down. endnest() handles ). It was a bit tricky since there's a special case when only one item has been pushed and we're pushing back onto the main stack. That is, (10) should push the constant 10, not a composite with the one constant, because then - and + don't work. I'm not sure whether this is principled but it's the way it works...

My interpreter is a character-by-character processor - it doesn't create tokens - so numbers, strings, and comments were somewhat annoying to deal with. There's a separate stack, N, for a currently-being-processed-number, and anytime a character which isn't a number is processed, I have to call endnum() to see whether I should first complete that number and put it on the stack. Whether we're in a string or a comment is kept track of by boolean variables; when a string is closed it pushes all the innards on the stack. Negative numbers required some special handling as well.

That's about it for the overview. The rest was implementing all the built-ins, and making sure to do deep copies in +, -, and #.

Claudiu

Posted 2014-09-11T21:35:49.260

Reputation: 3 870

Kudos! Did you have fun? :) – Orby – 2014-09-13T23:05:11.120

@Orby: I sure did! It's an interesting language, definitely a strange one. I'm hoping I can get an interpreter that's <1k. Not sure what to expect from other submissions. – Claudiu – 2014-09-13T23:13:36.143

4

C 839

Thanks to @ceilingcat for finding a much better (and shorter) version

This treats everything as simple strings - all stack items are strings, even constants are strings.

#define Q strcpy
#define F(x)bcopy(b,f,p-b);f[p-b-x]=!Q(r,p);
#define C(x,y)Q(S[s-x],S[s-y]);
#define N[9999]
#define A Q(S[s++]
#define D sprintf(S[s++],"%d"
#define G(x)}if(*f==x){
#define H(x)G(x)s--;
#define R return
char S N N;s;c;T(b,f,r)char*b,*f,*r;{char*p;strtol(b+=strspn(b," "),&p,0);if(p>b){F(0)R 1;}if(c=*b==40){for(p=++b;c;)c+=(*p==40)-(*p++==41);F(1)R-1;}p++;F(0)*r*=!!*b;R 0;}*P(char*p){if(*p==34)R++p;char*r=P(p+1);D,*p);R r;}E(char*x){char*p,c N,f N,r N,t N,u N,v N;for(Q(c,x);*c;Q(c,p)){Q(t,S[s-1]);if(T(c,f,p=r))A,f);else{{G(64)C(0,1)C(1,2)C(2,3)C(3,0)G(35)A,t);G(36)C(0,2)C(2,1)C(1,0)H(37)H(47)T(t,u,v);*v&&A,v);A,u);H(46)strcat(strcat(S[s-1]," "),t);H(43)D,atoi(t)+1);H(45)D,atoi(t)-1);G(60)D,getchar());H(62)T(t,u,v)-1||putchar(atoi(u));H(99)T(t,u,v)-1||printf(u);H(119)for(Q(u,t);atoi(S[s-1]);)E(u);G(34)p=P(p);}}}}

Try it online!

A less golfed version of my original (unlike the golfed version this one prints the stack when it ends if it is not empty and takes a -e parameter so you can specify the script on the command line instead of reading from a file):

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define FIRST_REST(x) memcpy(first, b, p - b); first[p - b - x] = '\0'; strcpy(rest, p);
#define COPY(dest,src) strcpy(stack[size + dest], stack[size + src]);
char stack[9999][9999]; int size = 0;
int token(char *b, char *first, char *rest)
{
    while (*b == 32) b++;
    char *p; int x = strtol(b, &p, 0);
    if (p > b) { FIRST_REST(0) return 1; }
    if (*b == '(') { int c = 1; for (p = ++b; c; ++p) c += (*p == '(') - (*p == ')'); FIRST_REST(1) return -1; }
    p++; FIRST_REST(0) if (!*b) *rest = '\0'; return 0;
}
char *push(char *pointer)
{
    if (*pointer == '\"') return pointer+1;
    char *result = push(pointer+1);
    sprintf(stack[size++], "%d", *pointer);
    return result;
}
void eval(char *x)
{
    char program[9999], first[9999], rest[9999], tos[9999], tmp1[9999], tmp2[9999];
    char *pointer;
    for (strcpy(program, x); *program; strcpy(program, pointer))
    {
        *stack[size] = '\0';
        strcpy(tos, stack[size-1]);
        if (token(program, first, rest))
        {
            pointer = rest;
            strcpy(stack[size++], first);
        }
        else
        {
            pointer = rest;
            if (*first == '@'){
                COPY(0, -1) COPY(-1, -2) COPY(-2, -3) COPY(-3, 0) }
            if (*first == '#')
                strcpy(stack[size++], tos);
            if (*first == '$'){
                COPY(0, -2) COPY(-2, -1) COPY(-1, 0) }
            if (*first == '%')
                size--;
            if (*first == '/'){
                size--; token(tos, tmp1, tmp2); if (*tmp2) strcpy(stack[size++], tmp2); strcpy(stack[size++], tmp1); }
            if (*first == '.'){
                size--; strcat(stack[size - 1], " "); strcat(stack[size - 1], tos); }
            if (*first == '+'){
                size--; sprintf(stack[size++], "%d", atoi(tos) + 1); }
            if (*first == '-'){
                size--; sprintf(stack[size++], "%d", atoi(tos) - 1); }
            if (*first == '<')
                sprintf(stack[size++], "%d", getchar());
            if (*first == '>'){
                size--; if (token(tos, tmp1, tmp2) == 1) putchar(atoi(tmp1)); }
            if (*first == 'c'){
                size--; if (token(tos, tmp1, tmp2) == 1) printf("%s", tmp1); }
            if (*first == 'w'){
                size--; strcpy(tmp1, tos); while (atoi(stack[size - 1])) eval(tmp1); }
            if (*first == '\"')
                pointer=push(pointer);
        }
    }
}
int main(int argc, char **argv)
{
    char program[9999] = "";
    int i = 0, comment = 0, quote = 0, space = 0;
    if (!strcmp(argv[1], "-e"))
        strcpy(program, argv[2]);
    else
    {
        FILE* f = fopen(argv[1], "r");
        for (;;) {
            char ch = fgetc(f);
            if (ch < 0) break;
            if (!quote) {
                if (ch == '\n') comment = 0;
                if (ch == ';') comment = 1;
                if (comment) continue;
                if (ch <= ' ') { ch = ' '; if (space++) continue; }
                else space = 0;
            }
            if (ch == '\"') quote = 1 - quote;
            program[i++] = ch;
        }
        fclose(f);
    }
    eval(program);
    for (int i = 0; i < size; i++) printf("%03d: (%s)\r\n",size-i,stack[i]);
    return 0;
}

Jerry Jeremiah

Posted 2014-09-11T21:35:49.260

Reputation: 1 217

837 bytes – ceilingcat – 2020-02-20T22:13:27.577

Nice! Impressive you beat the Python solution in C. I have to upload my shorter version, I managed to shave off 60 bytes or so.. I still wonder if there's a different approach which would yield way less than 1000 characters – Claudiu – 2014-09-17T16:19:58.543

@Claudiu I thought so too - but I couldn't figure out how. – Jerry Jeremiah – 2014-09-17T22:00:44.050