Golf a Purple Interpreter

13

1

Golf a Purple Interpreter

Purple is an esolang which is designed with two main purposes:

  • To be a minimization of Aubergine, since there just aren't enough self-modifying one-instruction languages around.
  • To admit the possibility of terrifyingly small golfed interpreters. My first pass at a reasonably full-featured Python 2 interpreter is only 702 bytes, and I'm sure a more experienced golfer could shave quite a bit from that.

Your goal is to write an interpreter for this language.

Information on Purple:

A Purple program is a sequence of characters placed into an infinite, addressable memory array such that the first character of the program is placed at address zero. The rest of the array (both before and after where the Purple program is stored) is initialized to zero.

There are three registers in Purple, called a and b and i, each of which can hold a signed integer and is initialized to zero. i is also the instruction pointer, and always points to the currently executing Purple instruction.

Each cycle, the interpreter will read a sequence of three contiguous characters beginning from the memory location indicated by the instruction pointer and attempt to execute this sequence as the Purple instruction. Afterwards, the instruction pointer is always incremented by 3.

Syntactically, the Purple instruction consists of three characters (or encodings thereof) in a row, like "xyz".

The first character x can be any of the following:

abABio

These symbols have the following meaning:

a - Place the result in register a.
b - Place the result in register b.
A - Place the result in the location in memory referred to by register a.
B - Place the result in the location in memory referred to by register b.
i - Set the instruction pointer to the result.
o - Output the result to stdout.

The other two bytes y and z can be any of the following:

abABio1

Each of these symbols has the following meaning:

a - Return the contents of register a.
b - Return the contents of register b.
A - Return the contents of the memory array at the address stored in register a.
B - Return the contents of the memory array at the address stored in register b.
i - Return the contents of register i (the instruction pointer).
o - Return the value of a single character read from stdin.
1 - Return the literal numeric value 1.

After fetching the instruction, the Purple interpreter will evaluate y and then z, subtract the result of z from the result of y, and then perform the action indicated by x on the difference.

If the sequence of three characters (or encodings thereof) is not a valid Purple instruction, the interpreter halts immediately without giving any errors.

Your interpreter must:

  • Be a complete program, not a function.
  • Never output to stderr, unless EOF is read.
  • Behave identically to the reference implementation on all well-formed inputs that do not involve very large numbers, including the test programs given below. (Well, identically up to timing--it can run slower, but not by too much!)

You may provide the program to the interpreter in any form you wish: read it from a file, embed it in the program as a string, or read it from stdin.

Test cases:

The program

ooo

when run with input

z!

should yield

Y

The program

bbboobiii

when run with input

It's a cat program.

(or any other input) should yield

It's a cat program.

(or whatever input it received) and then start over and do the same thing again.


The program

Aoab11bi1bABoAaiba

when run with input

0

should yield

0

and then halt, but when run with input

1

should continue outputting

1

forever.


The program

b1bbb1oAbabaa1ab1Ab1Bi1b

should yield

b1bbb1oAbabaa1ab1Ab1Bi1b

The program

aA1aa1bb1oAbbi1bb1bbAb1Bi1b Purple is the awesomest! Why haven't you tried it yet?
!dlroW ,olleG

should yield

Hello, World!

Scoring:

This is , so shortest source in bytes, as potentially modified by the following bonus, wins.

Bonus:

  • -10% if your interpreter reads a filename from either stdin or from a command line argument and loads the program from the file.

quintopia

Posted 2015-12-01T03:30:07.433

Reputation: 3 899

1What is the size of the memory cells? bytes, characters (unicode ones?), (arbitrary) large integers? It looks like you are using "character" and "byte" with the same meaning. – Paŭlo Ebermann – 2015-12-01T20:07:59.057

@PaŭloEbermann my guess is that's implementation-specific; for example I need to use uint32 for characters and MAXINT for ints – cat – 2015-12-01T20:10:04.313

What should be the value of o when the input is closed (i.e. EOF)? And what would be read after that? – Paŭlo Ebermann – 2015-12-01T20:42:07.523

@PaŭloEbermann do whatever the reference implementation does in this situation. (I'm fairly certain it gives 0 on EOF.) – quintopia – 2015-12-01T22:13:07.003

@quintopia actually, your cat program demonstrates that the reference implementation throws an exception in this case (TypeError: ord() expected a character, but string of length 0 found), at least here (Python 2.7.6 on Ubuntu 14.04) – read(1) from a file at EOF gives an empty string, and ord on that is not defined. I guess finishing the program without an exception would work too, as we are not allowed to print to stderr? – Paŭlo Ebermann – 2015-12-01T23:20:36.550

@PaŭloEbermann Handle it any way you wish then. I'll allow writing to STDERR when reading EOF. – quintopia – 2015-12-02T00:20:16.597

I put quite a few hours and quite a bit of hope into doing this in Go, only to realise the tape needs to be actually negatively addressable. Welp, there goes that project. – cat – 2015-12-02T02:01:19.017

2@sysreq Is that really a blocker? Your implementation could simply have two tapes, one for negative and one for positive indices. (Yes, this will take a bit more code, but not that much, I think.) – Paŭlo Ebermann – 2015-12-02T16:58:41.927

Having to figure out which of two tapes to assign to and such seems rather long, but I'll try it.

Suppose I write an interpreter/runtime that can read from a file: if I make the runtime basically allow Purple to bootstrap and interpret itself (unless someone would like to provide a kernel that ships with Purple instructions builtin), does that actually count as Purple interpreting Purple? – cat – 2015-12-03T12:40:57.367

because there needs to be some sort of chicken-and-egg-bootstrap-resolver, but if that's not written in Purple then does it qualify? – cat – 2015-12-03T12:41:58.680

1@sysreq basically, a Purple self-interpreter would be a program that reads a Purple program from stdin and then does whatever that program would do. The first Purple program (the interpreter) can run in whatever interpreter you like. A program that completely overwrites the lowest memory addresses with the input, then deletes itself before somehow jumping to the read code would qualify (though I don't think this is actually possible). – quintopia – 2015-12-03T19:10:25.107

2I came so close to having a runtime capable of self-interpretation, but I was too late. – cat – 2015-12-10T12:44:51.693

Post it anyway if you get it. I'll throw a love your way. – quintopia – 2015-12-10T14:41:38.163

Answers

7

Pyth, 148 128 121 bytes (or 124 * .9 = 111.6, see bottom)

J,00=kjb.z .eXHkCbz#=b-Fm?=zx"oabABi1"C@H+Zd@s[0Jm.x@Hk0JZ1H)zCh~tkS2 ?hKx"abAB"=YC@HZ?PKXH@JKbXJKb?qY\i=Zb?qY\opCbvN=+Z3

Test suite

Code given on first line of STDIN, input to the Purple program on the rest of STDIN. To use code with newlines, use alternate version at the bottom.

Reasonably golfed. Here's it with linebreaks and indentation for clarity:

J,00
=kjb.z
 .eXHkCbz
#
  =b-Fm
    ?=zx"oabABi1"C@H+Zd
      @
        s[0Jm.x@Hk0JZ1H)
        z
      Ch~tk
    S2
   ?hKx"abAB"=YC@HZ
    ?PK
      XH@JKb
      XJKb
  ?qY\i=Zb
  ?qY\opCb
  vN
  =+Z3

Basically, a # loop does the execution and halting via error-break.

a and b are combined into a single variable, J. Z is the instruction pointer. k is input to the Purple program. H is the tape, represented as a dictionary. b is the current result. Y is the current first byte of the instruction.

Reading from file:

J,00=kjb.z .eXHkCbjb'z#=b-Fm?q\o=zC@H+ZdCh~tk@s[Jm.x@Hk0JZ1H)x"abABi1"zS2 ?hKx"abAB"=YC@HZ?PKXH@JKbXJKb?qY\i=Zb?qY\opCbvN=+Z3

Give file name as first line of STDIN. Test run:

$ cat purple-final.pyth 
J,00=kjb.z .eXHkCbjb'z#=b-Fm?=zx"oabABi1"C@H+Zd@s[0Jm.x@Hk0JZ1H)zCh~tkS2 ?hKx"abAB"=YC@HZ?PKXH@JKbXJKb?qY\i=Zb?qY\opCbvN=+Z3
$ cat purple-code.txt 
aA1aa1bb1oAbbi1bb1bbAb1Bi1b Purple is the awesomest! Why haven't you tried it yet?
!dlroW ,olleG
$ pyth purple-final.pyth <<< 'purple-code.txt' 
Hello, World!

isaacg

Posted 2015-12-01T03:30:07.433

Reputation: 39 268

5

JavaScript (ES6), 292 bytes

eval(`a=b=i=d=0;v=n=>(x=m[i+n])==97?a_98?b_65?m[a]_66?m[b]_105?i_111?p()[c]()_49?1:d=1;for(m=[...(p=prompt)()].map(b=>b[c="charCodeAt"]());!d;i+=3)(y=v(1),d)||(z=v(2),d)?1:(x=m[r=y-z,i])==97?a=r_98?b=r_65?m[a]=r_66?m[b]=r_105?i=r-3_111?alert(String.fromCharCode(r)):d=1`.replace(/_/g,":x=="))

Explanation

JavaScript answers are always weird when STDIN and STDOUT are required...

The first prompt is the input for the program string. Each prompt that results from an o instruction will read only the first character.

eval is used to replace a common phrase which saves a few bytes. Ungolfed and without the eval the program looks like this:

// Initialisation
a=b=i=                            // initialise registers to 0
  d=0;                            // d is set to true when the program should die

// Gets the result of Y or Z
v=n=>                             // n = offset from i
  (x=m[i+n])==97?a:               // x = value of instruction
  x==98?b:
  x==65?m[a]:
  x==66?m[b]:
  x==105?i:
  x==111?p()[c]():
  x==49?1:
  d=1;                            // if it was none of the valid values, die

// Execution loop
for(
  m=                              // m = memory array
    [...(p=prompt)()]             // receive the program
    .map(b=>b[c="charCodeAt"]()); // initialise m to the ASCII values of the program
  !d;                             // finish if an error occured
  i+=3                            // increment i
)
  (y=v(1),d)||                    // get the value of Y and check for errors
  (z=v(2),d)?1:                   // get the value of Z and check for errors

    // Get the result of X
    (x=m[r=y-z,i])==97?a=r:       // r = result of y - z
    x==98?b=r:
    x==65?m[a]=r:
    x==66?m[b]=r:
    x==105?i=r-3:
    x==111?alert(String.fromCharCode(r)):
    d=1

user81655

Posted 2015-12-01T03:30:07.433

Reputation: 10 181

2Can the second c="charCodeAt" be replaced with just c? – Dendrobium – 2015-12-01T19:50:00.487

Does array access with negative indices work in JavaScript? – nimi – 2015-12-01T23:37:38.740

@Dendrobium Wow, I don't know how I missed that haha! Thanks. – user81655 – 2015-12-02T01:00:20.587

2@nimi It works. Arrays themselves don't support negative indices, but this takes advantage of the fact that they also behave as objects. array[-1] = 1 is the same as array = { "-1": 1 }. Both can be accessed with array[-1]. – user81655 – 2015-12-02T01:04:40.800

@user81655: Ah nice, didn't know that. – nimi – 2015-12-02T05:51:11.277

3

Ceylon, 827 792 671 bytes

import ceylon.language{l=variable,I=Integer,x=nothing,p=process,m=map}shared void run(){try{if(exists d=p.arguments[0]){l value t=m{*d*.hash.indexed};l I a=0;l I b=0;l I i=0;I g(I j)=>t[j]else 0;l{I*}c=[];I o{if(c==[]){if(exists e=p.readLine()){c=e*.hash.chain{10};}else{c={-1}.cycled;}}assert(is I r=c.first);c=c.rest;return r;}value f=m{97->{a},98->{b},65->{g(a)},66->{g(b)},105->{i},111->{o},49->{1}};value s=m{97->((I v)=>a=v),98->((I v)=>b=v),65->((I v)=>t=m{a->v,*t}),66->((I v)=>t=m{b->v,*t}),105->((I v)=>i=v),111->((I v)=>p.write("``v.character``"))};I h(I v)=>f[v]?.first else x;while(0<1){(s[g(i)]else x)(h(g(i+1))-h(g(i+2)));i+=3;}}}catch(AssertionError e){}}

It behaves a bit differently than the reference implementation when the program tries to read input at EOF – the reference implementation crashes with a TypeError, which is too costly to reproduce here (and also likely a bug), so this will return -1 instead (repeatedly, if necessary).

(When trying to write this -1 to stdout, the interpreter will finish with an OverflowError, though. Similar will happen if an Integer outside of the Unicode range is output.)

The interpreter takes the program as its first command line argument (make sure to quote it for your shell when it contains whitespace or other interesting stuff).

In Ceylon we can only easily read input line-wise (I guess this will change in one of the next versions), so when o is used for reading, I'm reading a whole line and buffer the parts for future uses. I guess it works similar in the Python implementation when connected to a terminal.


When trying to execute a command (part) which is not one of the valid characters, the nothing will cause an AssertionError to be thrown, which we then catch in the catch block around the main loop.

I think this should be preferably be a custom Exception type (as AssertionError could also occur at other places if I have a bug), but that would take much more space, eating most improvements I made from the first version.

Some tricks used for golfing:

  • The previous versions used a ceylon.collection.HashMap – instead we now use an immutable map as created by the map function, and create a new one each time A or B is used as x.
  • I use alias-imports for all identifiers from ceylon.language which are used more than once (including the variable annotation, which is now l).
  • I got rid of the class E (for environment) and the s (step) method – everything now happens inside the run function.
  • Instead of using .integer for getting the codepoint of a character, .hash gives the same result. Thus string*.hash is the same as string.map(Character.integer) (gives an iterable of the codepoints from a string).
  • When a type is alias-imported, is I ... is shorter than exists ....
  • When converting something (e.g. x) into a string, "``t``" is shorter than t.string (or, what I used for a character, String{t}).
  • functions used just once can often be inlined.

Here is the formatted (and commented) version:

// Purple – a self-modifying, "one-instruction" language.
//
// Question:  http://codegolf.stackexchange.com/q/65411/2338
// My answer: http://codegolf.stackexchange.com/a/65492/2338

import ceylon.language {
    l=variable,
    I=Integer,
    x=nothing,
    p=process,
    m=map
}

shared void run() {
    try {
        // Reading code from file certainly takes more than 73 characters,
        // this isn't worth the 10% bonus.
        if (exists d = p.arguments[0]) {

            // The memory tape, as a Map<Integer, Integer>.
            // We can't modify the map itself, but we
            // can replace it by a new map when update is needed.
            l value t = m {
                // It is initialized with the code converted to Integers.
                // We use `.hash` instead of `.integer` because it is shorter.
                *d*.hash.indexed };

            // three registers
            l I a = 0;
            l I b = 0;
            l I i = 0;

            // get value from memory
            I g(I j) =>
                    t[j] else 0;

            // cached input which is still to be read
            l {I*} c = [];

            // get value from stdin.
            // we can only comfortably access stdin by line, so we read a whole line
            // and cache the rest for later.
            I o {
                if (c == []) {
                    if (exists e = p.readLine()) {
                        c = e*.hash.chain { 10 }; // convert string into ints, append \n
                    } else {
                        // EOF – return just -1 from now on.
                        c = { -1 }.cycled;
                    }
                }
                assert (is I r = c.first);
                c = c.rest;
                return r;
            }


            // Map of "functions" for fetching values.
            // We wrap the values in iterable constructors for lazy evaluation
            //  – this is shorter than using (() => ...).
            // The keys are the (Unicode/ASCII) code points of the mapped
            // source code characters.
            value f = m {
                // a
                97 -> { a },
                // b
                98 -> { b },
                // A
                65 -> { g(a) },
                // B
                66 -> { g(b) },
                // i
                105 -> { i },
                // o
                111 -> { o },
                // 1
                49 -> { 1 }
            };

            // Map of functions for "storing" results.
            // The values are void functions taking an Integer,
            // the keys are the ASCII/Unicode code points of the corresponding
            // source code characters.
            value s = m {
                // a
                97 -> ((I v) => a = v),
                // b
                98 -> ((I v) => b = v),
                // Modification of the memory works by replacing the map with a new one.
                // This is certainly not runtime-efficient, but shorter than importing
                // ceylon.collections.HashMap.
                // A
                65 -> ((I v) => t = m { a->v, *t }),
                // B
                66 -> ((I v) => t = m { b->v, *t }),
                // i
                105 -> ((I v) => i = v),
                // o – output as a character.
                111 -> ((I v) => p.write("``v.character``"))
            };

            // accessor function for the f map
            I h(I v) =>
                    f[v]?.first else x;

            // the main loop, can only be left by exception
            while (0 < 1) {
                (s[g(i)] else x)(h(g(i + 1)) - h(g(i + 2)));
                i += 3;
            }
        }
    } catch (AssertionError e) {
        // abort silently
    }
}

Paŭlo Ebermann

Posted 2015-12-01T03:30:07.433

Reputation: 1 010

I reused part of that code for a "parallel interpreter" trying to find all halting programs. (There are many of them.) (There I used a non-I/O version of Purple, as I/O takes lot of space and is not used in that task.)

– Paŭlo Ebermann – 2015-12-06T18:43:38.873