A small language deserves a small interpreter

21

2

Here is a very simple language definition:

A Variable is any string that does not contain ^, <, >, !, or ?
The empty string is a valid variable identifier
The value of every variable starts at 0.
A Statement is one of (var is a Variable, P is a Program):
    var^   -> changes var to be equal to 1 more than itself
    var<P> -> while var > 0, changes var to be equal to 1 less than itself, then runs P
    var! -> output value of var
    var? -> ask for non-negative integer as input, increase var by that value
A Program is a concatenation of Statements, running a Program means running each Statement in order

Example programs (note that the empty string is a variable, but I will use it sparingly for the sake of clarity, and some variables are zeroed out in the program when they are usually 0 by default):

<>: sets the value of the empty string variable to 0
b<>b?b<a^>: asks for b, then adds the value stored in b to a, zeroing b in the process
b<>b?a<>b<a^>: asks for b, then sets a to the value of b, zeroing b in the process
a<>c<>b<a^c^>c<b^> : copies the value in b into a without zeroing it
b<>c<>a<c^c^c<b^>>b! : outputs a multiplied by 2
b^b<a<>a?a!b^> : outputs what you input, forever

Your goal is to write the smallest interpreter for this language.

  1. The value of a variable can be arbitrarily large and should only be limited by the total memory your language has access to, in theory, but you are only required to handle values up to 2^256.

  2. Your program should be able to handle arbitrarily long programs, in theory, but you will only be required to work on programs under 2^32 characters long. You are required to handle nested loops of depth up to 2^32 as well.

  3. You can assume that the program is a valid program, and that you will only ever get non-negative integers when you ask for input. You can also assume that only ASCII printable characters are included in the input string.

  4. The speed of the program you interpret doesn't matter, it will already be painfully slow for things as simple as 5-digit multiplication, without optimization.

  5. If you want to use a language which cannot reasonably accept input or produce output in the way described by the language, use any interpretation you want to make it possible. This applies to any reason your language can't implement some required behavior. I want all languages to be able to compete.

  6. Shortest program wins. Standard loopholes apply.

Fricative Melon

Posted 2016-02-10T18:19:48.103

Reputation: 1 312

As a side challenge I want to see how short a program I can write that outputs the number 2016, but first I need to wait for an interpreter to be written so that I can test my code. – Neil – 2016-02-10T20:28:22.840

1

I have an interpreter in Python 2.7 here.

– Fricative Melon – 2016-02-10T20:40:55.540

2

What is this language called? It deserves a place at http://esolangs.org

– wizzwizz4 – 2016-02-11T19:45:17.670

@Neil I managed to do it in 72 characters – Fricative Melon – 2016-02-11T19:56:00.053

@FricativeMelon 72? I can do it in 43! – Neil – 2016-02-11T20:00:10.443

@Neil I've gotten mine down to 39 – Fricative Melon – 2016-02-11T20:53:19.117

@FricativeMelon Looks like you've got me beat; 726262 took me 45, while 822(824-1) took me 43. – Neil – 2016-02-12T08:46:54.293

@Neil 60415263063373835637355132068513997507264512000000000 is a heck of a lot of characters – undergroundmonorail – 2016-02-12T09:24:55.693

@FricativeMelon 72829 is also 43, but I've now found 73834, which is only 42... – Neil – 2016-02-12T21:16:29.603

@FricativeMelon At last, I've managed 39: ^^^^<a^a^>a<^^^^><a^b^>a<c<b^^>b<c^^>>! – Neil – 2016-02-12T21:31:23.540

I don't think any of the answers actually support 256-bit variables. – 12Me21 – 2018-02-15T18:36:01.927

Answers

4

Ruby, 182 bytes

$h=Hash.new 0
def r(c)c.scan(/(([^!?^<>]*)(<(\g<1>*)>|[!?^]))/){$4?($1=~/(.*?)<(.*)>/
($h[$1]-=1;r$2)while$h[$1]>0):$3<?"?p($h[$2]):$h[$2]+=$3<?@?STDIN.gets.to_i:
1}end
r IO.read *$*

Try it like this:

$ cat code
a?b<>c<>a<c^c^c<b^>>b!

$ ruby lynn.rb code
3                           <-- input
6                           <-- output

How it works

The r function tokenizes an input string and executes each token:

def r(c)
    c.scan(/(([^!?^<>]*)(<(\g<1>*)>|[!?^]))/){
        ...
    }
end

We look for some variable name $2 matching [^!?^<>]*, followed by either

  • <...> where ... matches zero or more programs (\g is recursion), in which case $4 isn't nil
  • A !, ?, or ^ character, captured by $3, in which case $4 is nil.

Then the logic for executing a token is quite simple when indenting it a bit:

$4 ? (                                    # If it's a loop:
    $1 =~ /(.*?)<(.*)>/                   #   Re-match token*
    ($h[$1]-=1; r $2) while $h[$1] > 0    #   Recurse to run loop
) :                                       # Else:
    $3 < ?"                               #   If it's an !:
      ? p($h[$2])                         #     Print the var
      : $h[$2] +=                         #   Else, increment it by:
          $3 < ?@                         #     If it's a ?:
              ? STDIN.gets.to_i           #       User input
              : 1                         #     Else: 1

* There's an oniguruma bug, I think, that keeps me from simply using $3 here.

Lynn

Posted 2016-02-10T18:19:48.103

Reputation: 55 648

I am really curious how this works. – Jerry Jeremiah – 2016-02-11T09:50:49.087

1

JavaScript (ES6) 184 194 209

Edit Simplified (using function parameters for input and output seemed a nice idea, but it was not), 1 more byte saved thx @ӍѲꝆΛҐӍΛПҒЦꝆ

Edit 2 Modified parsing. The logic for increment/input is borrowed from @Lynn's answer

F=(p,i=0,v={},n='')=>eval("for(;c='>?^!<'.indexOf(q=p[i++]||'');n=~c?'':n+q)if(c>3){for(;v[n]--;)F(p,i,v);i=F(p,i,v[n]=0)}else~c&&v?c>2?alert(v[n]|0):v[n]=~~v[n]+(--c||+prompt()):0;i")

Less golfed

F=(p,      // program 
   i = 0,  // initial instruction pointer  
   v = {}, // variables (default to empty) or if 0, flag of dummy execution
   n = ''    // name of current variable (has to be local for recursive calls)
{
  for(; c='>?^!<'.indexOf(q=p[i++]||''); )
  // q = current character
  // c = current command (int 0..4 or -1 id not recognized)
  //     note 0 end of subprogram or end of program
  {
    if(c>3) // 4='<' call subprogram - recursive
    {
      for(;v[n]--;)
        F(p,i,v); // conditional call, repeated - using real environment
      v[n] = 0; // Reset variable at loop end
      i=F(p,i,0) // one more unconditional dummy call, just to advance i
    }
    else
      ~c&&v? // if valid command (1..3) and not dummy
      c>2?
        alert(v[n]|0) // output, undefined becomes 0
        :v[n]=~~v[n]+(--c||+prompt()) // inc with 1 or user input
      :0     // not valid command or dummy, do nothing
    n=~c?'':n+q // reset or update current variable name
  }
  return i // return current istruction pointer (for recursive calls)
}

TEST The snippet start evaluating 2016 using the program posted by @Neil. Be patient...

F=(p,i=0,v={},n='')=>eval("for(;c='>?^!<'.indexOf(q=p[i++]||'');n=~c?'':n+q)if(c>3){for(;v[n]--;)F(p,i,v);i=F(p,i,v[n]=0)}else~c&&v?c>2?alert(v[n]|0):v[n]=~~v[n]+(--c||+prompt()):0;i")

// TEST
function definput(){  I.disabled = KI.checked; }
function defoutput(){  O.disabled = KO.checked; }

function run()
{
  var prog=P.value, irows = I.value.split('\n'), pi=0;
  var fout=x=>O.value+=x+'\n';
  var fin=x=>irows[pi++];
  var saveAlert=alert, savePrompt=prompt
  if (!KO.checked) alert=fout,O.value=''
  if (!KI.checked) prompt=fin
  
  F(prog);
  
  alert=saveAlert
  prompt=savePrompt
}

P.value="^^^^<a^a^>a<^^^^><a^b^>a<c<b^^>b<c^^>>!"

run()
Program <button onclick="run()">RUN</button><br>
<textarea id=P></textarea><br>
Input (or <input type=checkbox id=KI onclick="definput()"> interactive prompt)<br>
<textarea id=I>5</textarea><br>
Output (or <input type=checkbox id=KO onclick="defoutput()"> popup)<br>
<textarea id=O readonly></textarea><br>

edc65

Posted 2016-02-10T18:19:48.103

Reputation: 31 086

Is using eval to avoid return not an option? – Mama Fun Roll – 2016-02-13T04:54:25.203

@ӍѲꝆΛҐӍΛПҒЦꝆ yes, eval saves 1 byte. I'm still looking for something more substantial – edc65 – 2016-02-13T08:39:44.310

0

Perl, 251 bytes

@p=split/([<>!?^])/,<>;for$c(0..$#p){$_=$p[$c];/</&&push@j,$c;if(/>/){$a=pop@j;$p[$c]=">$a";$p[$a]="<$c";}}while($c<$#p){$_=$p[$c];/\^/&&$v{$l}++;/!/&&print$v{$l};/\?/&&($v{$l}=<>);/<(\d+)/&&($v{$l}?$v{$l}--:($c=$1));/>(\d+)/&&($c=$1-2);$l=$_;$c++;} 

Easier to read version:

# treat the first line of input as a program

# split on punctuation keywords; @p will contain the program as a list
# of tokens (including whitespace between adjacent punctuation)
@p = split /([<>!?^])/, <>;

# rewrite jump addresses

# the interpreter could scan backwards to avoid this, but that idea
# makes me feel dirty
for $c (0..$#p) {
    $_ = $p[$c];
    # save loop-start address on stack
    /</ && push @j, $c;
    if (/>/) {
        # if we encounter a loop-end instruction, rewrite it and the
        # corresponding loop-start to include the address (of the
        # instruction---jumps have to offset from this)
        $a = pop @j;
        $p[$c] = ">$a";
        $p[$a] = "<$c";
    }
}

# execute the program

# our program is already in @p

# $c will contain our program counter

# $l will contain the name of the last-referenced variable

while ($c < $#p) {
    # move current instruction into $_ for shorter matching
    $_ = $p[$c];

    # increment instruction
    /\^/ && $v{$l}++;

    # output instruction
    /!/ && print $v{$l};

    # input instruction
    /\?/ && ($v{$l} = <>);

    # loop start, including address
    /<(\d+)/ && ($v{$l} ? $v{$l}-- : ($c = $1));

    # loop end, including address
    />(\d+)/ && ($c = $1-2);

    # copy current instruction into "last variable name"---this will
    # sometimes contain operators, but we have null-string
    # instructions between adjacent operators, so it'll be fine
    $l = $_;

    # advance the program counter
    $c++;
}

This wastes a bunch of bytes fixing up loops to be direct jumps, but scanning backwards for the loop start offended my sense of aesthetics.

David Morris

Posted 2016-02-10T18:19:48.103

Reputation: 241

0

Standard C++, 310 bytes

Thanks to @ceilingcat for some very nice pieces of golfing - now even shorter

#import<bits/stdc++.h>
#define b;break;case
std::map<std::string,long>V;int r(char*s){char*p,*e;for(long c;*s;s=p){p=strpbrk(s,"^<?!");c=*p;*p++=0;switch(c){b'^':V[s]++b'<':for(e=p,c=0;*e-62||c;)c+=(*e==60)-(*e++==62);for(*e++=0;V[s];r(strdup(p)))V[s]--;p=e;b'?':scanf("%llu",&V[s])b'!':printf("%llu",V[s]);}}}

Try it online!

Jerry Jeremiah

Posted 2016-02-10T18:19:48.103

Reputation: 1 217

@ceilingcat You're right - I'll go back and add TIO links to all my other answers... – Jerry Jeremiah – 2020-02-10T03:57:27.827