Self-Interpreting Interpreter

25

8

Based on a comment by George Edison to this question, write the smallest self-interpreting interpreter.

  • You may use the language of your choosing.
  • Empty languages do not count. Your program must be at least two characters long.
  • The program does not need to interpret the entire language, just a Turing-complete subset of language features (that contains the interpreter).
  • Quines don't count.
  • Do not use your language's built-in eval function or equivalent. Same goes for apply, etc.

Hoa Long Tam

Posted 2011-02-06T02:21:12.330

Reputation: 1 902

@ SHiNKiROU: Thanks, I didn't think of that as a test. Updated. – Hoa Long Tam – 2011-02-06T02:57:12.843

Related: Language w/ the smallest interpreter written in itself on Stack Overflow, though are are few (only one?) answers that actually abide by the rules given here.

– dmckee --- ex-moderator kitten – 2011-02-06T03:56:58.113

1Do we have to rewrite that Scheme sexp parser, or can we consider the host language's ok? – J B – 2011-02-06T15:40:23.137

@J B: The language's string processing utilities are fine, including the sexp parser. – Hoa Long Tam – 2011-02-06T20:22:41.493

@HoaLongTam For my- oh, I mean, everyone's sake can you clarify whether a valid answer must use a Turing-complete language or not?

– user8397947 – 2016-06-14T01:13:51.583

See DBFI. The page declares that it is "the shortest known Brainfuck self-interpreter and possibly the shortest self-interpreter amongst all imperative languages".

– Joey Adams – 2011-04-12T01:12:10.147

This could end up being very short if you assume the existence of something similar to eval but not quite the same, for example a function that reads in a string and converts it to program code. – Michael Dickens – 2011-04-23T18:42:29.530

@Michael Dickens: I think that's what my CI answer does. I reluctantly omitted an operator that takes a character and converts it to its corresponding code fragment, but I did add support for defining and joining code fragments. – Joey Adams – 2011-04-23T22:16:21.603

1(Hmm.. I should do something with /usr/bin/cat) what about Turing-completeness? – Ming-Tang – 2011-02-06T02:36:27.277

Answers

19

CI - 260

,(1p0(2d())(41(2d())('#((1p0()(10()(1d,1p$)=)<)$2d,1p$)(40(1d,1c$^)(''(1d,^)('0
'9('0-(,'0'9('0-2p10*+1p$)(!1d)~)$^)(0($)'$(^)'^(&)'&(c)'c(p)'p(d)'d(=)'=(<)'<(
>)'>(~)'~(.)'.(,)',(!)'!(+)'+(-)'-(*)'*(/)'/(%)'%()38p(1p3p0(3d)((2)(3)=p1d1p$)
=)$)~)=)=,2p$&)=)=)<)$$

320 → 260: Push simple character-to-instruction mappings, then fold over them. This halves the code size per case (there are 18 cases), but costs 30 characters to do the folding.

This is another one of my constructed languages, (base interpreter hosted on Gist). It's unique in that the language reifies code fragments. That is, strings of instructions in this stack-based language are used to the same effect as data structures or closures in other languages:

1^      # Push a 1, and "lift" it to be a code literal.
(5 +)   # Define a code literal.
&       # Join the two code literals, forming (1 5 +)
$       # Execute the code literal.

The interpreter builds a code fragment of the entire program before running it, so it can also be considered a compiler. Because of this, stacking the interpreter does not result in exponential run-time overhead.

The interpreter derives all of its operators directly from the host interpreter. However, it does the parsing by itself, so the majority of the code is just sequences that translate characters into their respective code literals. This isn't the same as using eval, but it reveals how dependent any programming language implementation is on the semantics of its host language/architecture.


Language reference:

Get the interpreter here

Blocks

  • ( ... )

    Create a "block", which is effectively a list of instructions with no context. Internally, it could even be machine code.

  • block $

    Call a block. The callee is handed the global stack, which includes the block being called.

  • value ^

    Lift a value. Namely, turn it into a block that pushes that value.

    Example:

       1 ^
    == (1)
    
  • block1 block2 &

    Join two blocks, forming one that runs both in sequence.

    Example:

       (1) (2) &
    == (1 2)
    

Stack manipulation

  • n c

    Copy the nth value of the stack.

    Example:

       5 4 3 2 1 0 3c
    == 5 4 3 2 1 0 3
    
  • n p

    Pluck the nth value of the stack (remove it, and bring it to front).

    Example:

       5 4 3 2 1 0 3p
    == 5 4 2 1 0 3
    
  • n d

    Drop n values from the stack. 0d is a no-op.

    Example:

       5 4 3 2 1 0 3d
    == 5 4 3
    

Relational operators

  • a b (on_true) (on_false) =

    Test if a equals b. Consume all but the first argument, and call on_true or on_false. If one argument is zero and the other is any other type, the result will be false. Otherwise, a and b must be integers.

    Example:

       3 3 ('0+ .) (1d) =
    == 3 '0+ .
    
  • a b (on_true) (on_false) <

    Test if a is less than b. a and b must be integers.

    Example:

       3 5 (1d 5) () <
    == 3 1d 5
    == 5
    
  • a b (on_true) (on_false) >

    Test if a is greater than b. a and b must be integers.

    Example:

       3 5 (1d 5) () >
    == 3
    
  • a lo hi (on_true) (on_false) ~

    Test if lo <= a <= hi. a, lo, and hi must be integers.

    Example:

       3 0 10 ('0+ .) (1d) ~
    == 3 '0+ .
    

I/O

  • c .

    Put the character c (consuming it from the stack).

  • ,

    Get a character and push it to the stack. If the end of file has been reached, -1 is pushed.

  • c !

    Unget a character. Just like ungetc in C, only one pushback is allowed.

Integer literals

  • 'c

    Push the character c.

  • [0-9]+

    Push a decimal integer.

Arithmetic

  • a b +
  • a b -
  • a b *

    Add/subtract/multiply two numbers.

    Example:

       3 5 + 7 3 + *
    == 80
    
  • a b /

  • a b %

    Division and modulus. Unlike in C, these round toward negative infinity.

Miscellaneous

  • code # comment

    The # character comments out everything to the end of the line.

  • )

    Used to end blocks. Can be used to end the whole program, too.

  • All other characters are ignored.

Joey Adams

Posted 2011-02-06T02:21:12.330

Reputation: 9 929

24

Binary Lambda Calculus, 232 bits (29 bytes)

0101000110100000000101011000000000011110000101111110011110000101110011110000001111000010110110111001111100001111100001011110100111010010110011100001101100001011111000011111000011100110111101111100111101110110000110010001101000011010

See http://en.wikipedia.org/wiki/Binary_lambda_calculus#Lambda_encoding for details

John Tromp

Posted 2011-02-06T02:21:12.330

Reputation: 957

2Why isn't this the accepted answer D: BLC is amazing! – cat – 2016-03-18T22:06:28.640

Can you explain it at all? – PyRulez – 2017-06-13T00:48:23.947

1

Unfortunately, Wikipedia removed the Binary Lambda Calculus page. My http://tromp.github.io/cl/cl.html page links to a preserved copy and to a paper I wrote explaining the workings of the interpreter.

– John Tromp – 2017-06-14T07:49:50.270

13

I can't take credit for this one, but I thought I would share this amazing one:

Brainf*** (423)

>>>+[[-]>>[-]++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++++++<<-]+>>>,<++[[>[
->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<<
]<]<[[<]>[[>]>>[>>]+[<<]<[<]<+>>-]>[>]+[->>]<<<<[[<<]<[<]+<<[+>+<<-[>-->+<<-[>
+<[>>+<<-]]]>[<+>-]<]++>>-->[>]>>[>>]]<<[>>+<[[<]<]>[[<<]<[<]+[-<+>>-[<<+>++>-
[<->[<<+>>-]]]<[>+<-]>]>[>]>]>[>>]>>]<<[>>+>>+>>]<<[->>>>>>>>]<<[>.>>>>>>>]<<[
>->>>>>]<<[>,>>>]<<[>+>]<<[+<<]<]

Peter Olson

Posted 2011-02-06T02:21:12.330

Reputation: 7 412

10

BlockScript - 535

{[B':=?0:B';=?0:B'}=?0:B'{=?,A!,A!d1c&:B'?=?,A!,A!2e&:B''=?,,A!d3c&:B{[B'0<?0:B
'9>?0:1}!?B'0-{[,g!?c'0-B10*d+A!:Bd]A!d3c&}!:B'#=?{[,10=?,]A!:A!}!:,A!Bb&}{[AC[
B]DB?[AB{[Bh&hbhn!}{[B[AB]C?1-eA!:b}&[C1=?E[C]FHc&B!:C2=?{G?D:E[C}!FHcI!:C3=?E[
C]B!:C'!=?G[ABC]Hc&dbh&D?b@I!B!:b@I!:C'&=?HB!:C'@=?FGDI!:C'[=?GF&HDI!:C']=?F[A]
HDI!:C',=?,B!:C'.=?G.FHDI!:C'a'z{[DC<?0:DB>?0:1}!?Ce-HA!B!:C'A'Ze!?F[B]Cg-dA!B!
:{C'+=?{[CB+}:C'-=?{[CB-}:C'*=?{[CB*}:C'/=?{[CB/}:C'%=?{[CB%}:C'<=?{[CB<}:C'>=?
{[CB>}:C'==?{[CB=}:0}!?H[A][B]Ge!B!:FHDI!:c},c!0ac&0&0&0bho!;

BlockScript is a trivial spaghetti stack-based language I created specifically for this challenge. The base interpreter is blockscript.c .

Sample program (prints the first 15 Fibonacci numbers):

{[B?B10/A!B10%d&:0}
{[B0<?'-.0B-A!:{B?Bh!{[B?B[A]A!B[B]'0+.:}!:'0.}!10.}
{[B?Dd!DC+B1-CecA!:}
0 1 15d!
;

The interpreter reads both source code and program input from standard input, in that order. This means that to run an interpreter within an interpreter within an interpreter, simply copy and paste:

# Level 1
{[B':=?0:B';=?0:B'}=?0:B'{=?,A!,A!d1c&:B'?=?,A!,A!2e&:B''=?,,A!d3c&:B{[B'0<?0:B
'9>?0:1}!?B'0-{[,g!?c'0-B10*d+A!:Bd]A!d3c&}!:B'#=?{[,10=?,]A!:A!}!:,A!Bb&}{[AC[
B]DB?[AB{[Bh&hbhn!}{[B[AB]C?1-eA!:b}&[C1=?E[C]FHc&B!:C2=?{G?D:E[C}!FHcI!:C3=?E[
C]B!:C'!=?G[ABC]Hc&dbh&D?b@I!B!:b@I!:C'&=?HB!:C'@=?FGDI!:C'[=?GF&HDI!:C']=?F[A]
HDI!:C',=?,B!:C'.=?G.FHDI!:C'a'z{[DC<?0:DB>?0:1}!?Ce-HA!B!:C'A'Ze!?F[B]Cg-dA!B!
:{C'+=?{[CB+}:C'-=?{[CB-}:C'*=?{[CB*}:C'/=?{[CB/}:C'%=?{[CB%}:C'<=?{[CB<}:C'>=?
{[CB>}:C'==?{[CB=}:0}!?H[A][B]Ge!B!:FHDI!:c},c!0ac&0&0&0bho!;

# Level 2
{[B':=?0:B';=?0:B'}=?0:B'{=?,A!,A!d1c&:B'?=?,A!,A!2e&:B''=?,,A!d3c&:B{[B'0<?0:B
'9>?0:1}!?B'0-{[,g!?c'0-B10*d+A!:Bd]A!d3c&}!:B'#=?{[,10=?,]A!:A!}!:,A!Bb&}{[AC[
B]DB?[AB{[Bh&hbhn!}{[B[AB]C?1-eA!:b}&[C1=?E[C]FHc&B!:C2=?{G?D:E[C}!FHcI!:C3=?E[
C]B!:C'!=?G[ABC]Hc&dbh&D?b@I!B!:b@I!:C'&=?HB!:C'@=?FGDI!:C'[=?GF&HDI!:C']=?F[A]
HDI!:C',=?,B!:C'.=?G.FHDI!:C'a'z{[DC<?0:DB>?0:1}!?Ce-HA!B!:C'A'Ze!?F[B]Cg-dA!B!
:{C'+=?{[CB+}:C'-=?{[CB-}:C'*=?{[CB*}:C'/=?{[CB/}:C'%=?{[CB%}:C'<=?{[CB<}:C'>=?
{[CB>}:C'==?{[CB=}:0}!?H[A][B]Ge!B!:FHDI!:c},c!0ac&0&0&0bho!;

# Level 3
{[B?B10/A!B10%d&:0}
{[B0<?'-.0B-A!:{B?Bh!{[B?B[A]A!B[B]'0+.:}!:'0.}!10.}
{[B?Dd!DC+B1-CecA!:}
0 1 15d!
;

Like the movie Inception, you pretty much can't go any deeper than three levels. It's not a matter of time, but space. BlockScript leaks memory profusely, and this has to do with how the language itself is designed.


Language reference:

Get the interpreter here

In BlockScript, the "stack" is not an array that is overwritten by subsequent operations like you may be used to. It is actually implemented as an immutable linked list, and a stack persists for the duration of the program. Also, no operator (except @) removes values from the stack. However, stack modifications only affect the block in which they occur.

Value selection

  • a through z

    Fetch the 0-25th item from the stack, and push it to the stack. a refers to the head, or most recently pushed item, of the stack.

  • A through Z

    Fetch the 0-25th item of the current frame, and push it to the stack.

  • [

    Open a "frame" to select items from the stack reference (see below) on the head of the stack. [ doesn't require a matching ], but frames are lexically scoped. In BlockScript, "scope" is determined by braces ({ ... }) that form blocks. Thus, opening a frame inside of a block will have no effect on code outside of the block.

  • ]

    Close the current frame, returning to the previous frame (if any).

Blocks

  • { ... }

    Create a "block", and push it to the stack. Inside a block, the stack will start at what it was before the block, except the stack of the caller will be pushed on top. Stacks are persistent and immutable in BlockScript, so blocks are closures. The idiom {[ means open a block, then open a frame to start selecting arguments (using A through Z). The return value of a block is the head of the stack when } is reached.

    Example:

    '3 '2 '1 {[ b. d. f. B. C. D. A! } 'D 'C 'B d!;
    

    This prints 123BCD123DCB123BCD123DCB… . The lowercase letters refer to stack values, while the uppercase letters refer to arguments (because the frame is set to the stack of the caller). A! takes the head of the caller (which is guaranteed to be the block being called) and calls it. If you're wondering why it reverses BCD every other time, it's because B. C. D. pushes those arguments in reverse order right before the block calls itself.

  • !

    Call a block. Push the return value to the stack.

Stack references

  • &

    Create a stack reference, and push it to the stack. Think of this as "super-cons", as it effectively takes every item on the stack and forms a "tuple" out of it. The idiom &[ means that whatever a, b, c referred to before can now be accessed with A, B, C (for the remainder of the block or until ] is encountered).

    In part because & captures more values than it usually needs, BlockScript leaks memory by design.

  • @

    Switch to the stack pointed to by the stack reference a. This operator is rather weird, but the BlockScript self-interpreter uses it a couple times to avoid having to push the same arguments twice. The effects of @ (or any stack operation, for that matter) are confined to the block in which it is invoked. Also, the frame is unaffected by @, so the frame can be used to grab values you need after switching stacks.

Conditional expression

  • ? <on true> : <on false>

    Conditional expression, just like the ternary operator in C. That is, if a is "true" (that is, not equal to the integer zero), then do <on true>, otherwise do <on false>.

I/O

Note: Input and output are done in UTF-8. A "character" is an integer corresponding to a Unicode index.

  • ,

    Get the next character of input, and push it to the stack. If the end of input is reached, push -1 instead.

  • .

    Output the character on the head of the stack.

Integer/character literals

Note: Integers and characters are the same thing in BlockScript.

  • 'c

    Push the character c.

  • [0-9]+

    Push a decimal integer.

Arithmetic

These operators only work on integer values.

  • + Compute b + a (pushing the result, but not discarding either value).
  • - Compute b - a.
  • * Compute b * a.
  • / Compute b / a (integer division; rounds toward negative infinity).
  • % Compute b % a (integer modulus; rounds toward negative infinity).

Relational operators

These operators only work on integer values.

  • < If b is less than a, push 1, else push 0.
  • >
  • =

Miscellaneous

  • # Comment to end of line
  • The program must end with ;
  • All other characters are ignored.

Joey Adams

Posted 2011-02-06T02:21:12.330

Reputation: 9 929

2Christ. Care to share the spec like you did for CI? – Casey – 2011-04-28T01:19:37.213

@Casey: Added a reference. – Joey Adams – 2011-04-28T04:03:14.060

1Would you be interested in knowing that you're dreaming? ...At level 4. – Mateen Ulhaq – 2011-04-30T05:49:34.130

3

Zozotez LISP: 414

linefeeds added to get a nice block is not needed and not counted.

((\(E V A L)(E(r)'(())))(\(x e)(?(s x)(V x e)((\(b j)(?(= b ")(a j)(?(= b \)x
(?(= b ?)(?(E(a j)e)(E(a(d j))e)(E(a(d(d j)))e))(?(s b)(A b(E(a j)e)(E(a(d j)
)e))(E(a(d(d b)))(L(a(d b))j e)))))))(E(a x)e)(d x))))(\(x g)(? g(?(= x(a(a
g)))(d(a g))(V x(d g)))x))(\(f h v)(?(= f r)(r)(?(= f p)(p h)(?(= f s)(s h)(?
(= f a)(a h)(?(= f d)(d h)(?(= f =)(= h v)(c h v))))))))(\(k v i)(? k(L(d k)(
d v)(c(c(a k)(E(a v)i))i))i)))

In theory it should be able to run itself, but since the original interpreter is a BrainFuck binary and itself an interpreter I have only been able to test each part. When given itself and a simple expression (p p) I think it needs more time than the 40 minutes I have waited so far and I'm using my fast jitbf to run it which (mis)uses Perl Inline-C to run C code on the fly.

It's impossible to implement the whole Zozotez in Zozotez since it does not have means of mutating cons and : (setq/define) need that to update bindings. I also didn't implement explicit begin/progn or &rest argument, macros and special printing arguments since I didn't use it in the interpreter. I did include p (print) even though I don't use it so programs need to explicit print their calculations just like the original interpreter.

The same ungolfed:

;; A stand alone Zozotez script need to be
;; contained in one expression, here with
;; all functions provided as arguments to
;; get them bound in the dynamic environment
((\ (E V A L)
  (E(r)'(())))
 ;; E (EVAL)
 (\ (x e)
   (? (s x)
      (V x e)
      ((\ (b j)
         (? (= b ") (a j)
         (? (= b \) x
         (? (= b ?) (? (E(a j)e) (E(a(d j))e) (E(a(d(d j)))e))
         (? (s b)
            (A b (E(a j)e) (E (a(d j))e))
            (E (a(d(d b))) (L(a(d b))j e)))))))
       (E (a x) e)(d x))))
 ;; V (VALUE / symbol->value)
 (\ (x g)
   (? g
      (? (= x (a (a g)))
         (d (a g))
         (V x (d g)))
      x))
 ;; A (APPLY) but just for primitives
 (\ (f h v)
   (? (= f r) (r)
   (? (= f p) (p h)
   (? (= f s) (s h)
   (? (= f a) (a h)
   (? (= f d) (d h)
   (? (= f =)
      (= h v)
      (c h v))))))))
 ;; L ( joint evLis / extend-env)
 (\ (k v i)
   (? k
      (L (d k) 
         (d v)
     (c (c (a k) 
           (E (a v) i)) 
        i))
      i)))

Sylwester

Posted 2011-02-06T02:21:12.330

Reputation: 3 678

0

CHIQRSX9+ (probably non-competing), 2 bytes

+I

There's no way to write a self-interpreting interpreter in this HQ9+-based language without using I, which runs a built-in interpreter that processes STDIN.

user8397947

Posted 2011-02-06T02:21:12.330

Reputation: 1 242

1Nowhere in the rules does it say that built-in self interpreters are disallowed. It says for eval, which is for expressions, not programs. – Erik the Outgolfer – 2016-06-07T16:45:32.330

How does one compute primes in this language? – pppery – 2016-06-13T13:49:23.443

@ppperry X is supposed to make the language Turing-complete (thus able to compute primes) in an implementation-dependent manner. – user8397947 – 2016-06-13T19:50:47.080

Acording to the Esolang page, the Perl interpreter's X command randomly perturbs the program and the executes it, meaning that one cannot use than command to deterministically compute primes. Can you give me of an example interpreter that lets you use X in the way you specified? – pppery – 2016-06-13T20:03:27.417

@ppperry Apparently the interpreter written in Perl is the only interpreter, so no. Also, what if there's a program that computes primes when "randomized" with a specific value? – user8397947 – 2016-06-13T21:28:44.613

The Perl interpreter does not provide any method of customizing the seed ti the RNG, so any such program would only work 1\256 of the time. – pppery – 2016-06-14T00:19:00.890

0

Concurrent Filesystem Befunge 98 - 53\18 bytes (almost certainly cheating)

Full 53 byte interpreter with no restrictions (although I haven't tested complicated timing interactions involving IP splitting and wrapping):

v ;;;;;;;;
>]390'ai@
 t;;;;;;;;
;>zzzzz#;

Reads input from a file named a and executes it. It isn't specifed in the rules that we can't use self-modifying code.

18 byte interpreter that does not allow wrapping (the IP moving of one edge of the code and starting at the opposite edge):

]210'ai@
t
><

pppery

Posted 2011-02-06T02:21:12.330

Reputation: 3 987