Leaderboard - JIT Compiled (Lower is better)

es1024 - 81.2 points (including a working compiler!)

Kieth Randall - 116 points

Ell - 121 points

Leaderboard - Interpreted (Lower is better)

Martin Büttner - 706654 points (somewhere around 2 hours).

criptych - 30379 points (97 seconds)

Your mission, should you choose to accept it, is to write the smallest possible bytecode interpreter/VM. The VM/interpreter uses a small CISC architecture (operations can vary in size), with the language specified below. Upon completion, you must print the value of the 3 CPU registers to prove that the correct output is printed (3,126,900,366).

Compiler

If you'd like to make your own tests, a compiler is posted below. Feel free to post your tests with your answer.

window.compile=function(){var e=$("#text").val().toLowerCase().match(/[^\r\n]+/g);var t=[];for(var n=0;n<e.length;n++)compileLine(e[n],t);var r="";for(var n=0;n<t.length;n++)if(typeof t[n]=="string")r+="\n";else r+="0x"+t[n].toString(16)+" ";$("#compiledArray").val(r)};window.compileLine=function(e,t){var n=e.split(" ");if(n[0]=="load"){t.push(0);t.push(getInt(n[1]));t.pushArr(getInt(n[2]))}if(n[0]=="rload"){t.push(1);t.push(getInt(n[1]));t.push(getInt(n[1]))}if(n[0]=="push"){t.push(2);t.push(getInt(n[1]))}if(n[0]=="pop"){t.push(3);t.push(getInt(n[1]))}if(n[0]=="add"){t.push(4);t.push(getInt(n[1]));t.push(getInt(n[2]))}if(n[0]=="sub"){t.push(5);t.push(getInt(n[1]));t.push(getInt(n[2]))}if(n[0]=="mul"){t.push(6);t.push(getInt(n[1]));t.push(getInt(n[2]))}if(n[0]=="div"){t.push(7);t.push(getInt(n[1]));t.push(getInt(n[2]))}if(n[0]=="jmp"){t.push(8);t.pushArr(getInt(n[1]))}if(n[0]=="cmp"){t.push(9);t.push(getInt(n[1]));t.push(getInt(n[2]))}if(n[0]=="branchlt"){t.push(10);t.pushArr(getInt(n[1]))}if(n[0]=="brancheq"){t.push(11);t.pushArr(getInt(n[1]))}if(n[0]=="branchgt"){t.push(12);t.pushArr(getInt(n[1]))}if(n[0]=="branchne"){t.push(13);t.pushArr(getInt(n[1]))}t.push("NEW LINE")};window.getInt=function(e){if(e.trim().startsWith("<--"))return"COMMENT";if(e=="r0")return 0;if(e=="r1")return 1;if(e=="r2")return 2;if(e.startsWith("0x"))return parseInt(e,16);if(isNaN(parseInt(e)))alert(e);return getIntBytes(parseInt(e))};if(typeof String.prototype.startsWith!="function"){String.prototype.startsWith=function(e){return this.slice(0,e.length)==e}}Array.prototype.pushArr=function(e){this.push.apply(this,e)};window.getIntBytes=function(e){var t=[];var n=4;do{t[--n]=e&255;e=e>>8}while(n);return t}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>
<textarea id="text" cols="40" rows="10"></textarea>
<br/>
<button onclick="compile()">Compile</button>
<br/>
<textarea id="compiledArray"  cols="40" rows="10" readonly></textarea>

"VM" Specifications

The VM has 3 32 bit unsigned integral registers: R0, R1, R2. They are represented in hex as 0x00, 0x01, and 0x02.

The following operations must be supported:

The format is [name] [...operands...], [hexadecimal op-code] [...operands repeated...]

LOAD [register] [4 byte value], 0x00 [register] [4 byte value]
PUSH [register], 0x02 [register]
POP [register], 0x03 [register]
ADD [register, 1 byte] [register, 1 byte], 0x04 [register] [register]
SUB [register, 1 byte] [register, 1 byte], 0x05 [register] [register]
MUL [register, 1 byte] [register, 1 byte], 0x06 [register] [register]
DIV [register, 1 byte] [register, 1 byte], 0x07 [register] [register]
JMP [code line, 4 bytes], 0x08 [4 byte code line number]
CMP [register, 1 byte] [register, 1 byte], 0x09 [register] [register]
BRANCHLT [code line, 4 bytes], 0x0a [4 byte code line number]

Some notes:

The above math operations add the values of 2 registers together, placing the output in the first register.
CMP, the comparison operator, should compare the values of 2 registers and store the output in some internal flag (this can be implementation specific) for future use on branch instructions.
If BRANCH is called before CMP, unless BRANCHEQ is called, the "VM" should not branch.
PUSH/POP unsurprisingly push or pop numbers from the stack.
Jump and Branch operators jump to a specific operation (line of code), not a binary address.
Branch operations do not do the comparison. Rather, they take the output from the last comparison to execute.
Branch and Jump operators use a zero based line number indexing system. (E.g. JMP 0 jumps to the first line)
All operations are to be performed on unsigned numbers which overflow to zero and do not throw an exception on an integer overflow.
Division by zero is not allowed and as such, the behavior of the program is not defined. You can (for example)...
- Crash the program.
- End execution of the VM and return it's current state.
- Show a "ERR: Division by 0" message.
Termination of the program is defined as when the instruction pointer reaches the end of the program (a non empty program can be assumed).

Output The output must be exactly this (newlines included)

R0 3126900366
R1 0
R2 10000

Points Points are calculated based on the following formula: Number Of Characters * (Seconds Needed To Run / 2)

To avoid hardware differences causing different times, each test will be run on my computer (i5-4210u, 8GB ram) in either ubuntu server or Windows 8, so try not to use some insane-exotic-runtime that only compiles on a Dual G5 Mac Pro with exactly 762.66 mb of free RAM.

If you are using a specialized runtime/language, please post a link to it.

For interested parties, I have posted the testing code (written in C#) here: http://pastebin.com/WYCG5Uqu

Test Program

The idea came from here, so we'll use a somewhat modified version of their program.

The correct output for the program is: 3,126,900,366

In C:

int s, i, j;
for (s = 0, i = 0; i < 10000; i++) {
    for (j = 0; j < 10000; j++)
        s += (i * j) / 3;
}

In code: [R0 is representative of s, R1 of j, R2 of i]

LOAD R0 0
LOAD R2 0 <--outer loop value
LOAD R1 0 <--inner loop value
     --Begin inner loop--
PUSH R1 <--push inner loop value to the stack
MUL R1 R2 <--(i*j)
PUSH R2
LOAD R2 3
DIV R1 R2 <-- / 3
POP R2
ADD R0 R1 <-- s+=
POP R1
PUSH R2 
LOAD R2 1
ADD R1 R2 <--j++
POP R2
PUSH R2
LOAD R2 10000
CMP R1 R2 <-- j < 10000
POP R2
BRANCHLT 3 <--Go back to beginning inner loop
--Drop To outer loop--
LOAD R1 1
ADD R2 R1 <--i++
LOAD R1 10000
CMP R2 R1 <-- i < 10000
LOAD R1 0 <--Reset inner loop
BRANCHLT 2

In binary/hex:

0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x02 0x00 0x00 0x00 0x00
0x00 0x01 0x00 0x00 0x00 0x00
0x02 0x01
0x06 0x01 0x02
0x02 0x02
0x00 0x02 0x00 0x00 0x00 0x03
0x07 0x01 0x02
0x03 0x02
0x04 0x00 0x01
0x03 0x01
0x02 0x02
0x00 0x02 0x00 0x00 0x00 0x01
0x04 0x01 0x02
0x03 0x02
0x02 0x02
0x00 0x02 0x00 0x00 0x27 0x10
0x09 0x01 0x02
0x03 0x02
0x0a 0x00 0x00 0x00 0x03
0x00 0x01 0x00 0x00 0x00 0x01
0x04 0x02 0x01
0x00 0x01 0x00 0x00 0x27 0x10
0x09 0x02 0x01
0x00 0x01 0x00 0x00 0x00 0x00
0x0a 0x00 0x00 0x00 0x02

Bonus Points (Effects are applied multiplicatively) E.g. if you qualify for all three, it would be ((characters * 0.50) * 0.75) * 0.90

50% decrease if the interpreter is actually a JIT compiler
25% decrease if it applies any sort of loop unrolling/meaningful optimization.
10% decrease if you extend the VM with
- BRANCHEQ [code line, 4 bytes] (Branch if equal - opcode 0x0b)
- BRANCHGT [code line, 4 bytes] (Branch if greater than - opcode 0x0c)
- BRANCHNE [code line, 4 bytes] (Branch if not equal - opcode 0x0d)
- RLOAD [register 1] [register 2] (move the value of register 2 to register 1 - opcode 0x01).

Disallowed

Precompiling the test case into the program is prohibited. You must either accept the bytecode from STDIN or from a file (Doesn't matter which).
Returning the output without running the program.
Any other way you can think of to cheat the VM requirement.

Colorfully Monochrome

Posted 2014-11-23T04:18:02.753

Reputation: 273

Why not include a few more test programs to discourage the things you said are disallowed? If it's a VM, it should be able to run anything written in the bytecode specification, right? – Kasran – 2014-11-23T04:23:15.807

I will try to do that tonight. I am writing the compiler right now. – Colorfully Monochrome – 2014-11-23T04:31:14.523

The compiler is here http://jsfiddle.net/aL9y19bd/23/

– Colorfully Monochrome – 2014-11-23T05:35:46.903

1Does CMP check for less than or equality? And what happens to its result? – es1024 – 2014-11-23T05:38:13.397

1MUL and DIV are also underspecified. Should they be signed or unsigned? What happens on multiplication overflow? – feersum – 2014-11-23T05:44:45.070

As for the BRANCH** opcodes, what is compared (value from a register, value from stack, something else)? – es1024 – 2014-11-23T05:54:32.063

Actually that's called multiplicative stacking for the bonuses, not linear (or additive) – Claudiu – 2014-11-23T06:18:25.130

@Claudiu There - fixed. Thanks for pointing that out; I thought linear wasn't the right word. – Colorfully Monochrome – 2014-11-23T06:20:25.610

@es1024 CMP compares the value of 2 registers and stores the result internally so it can do BRANCH** operations. BRANCH** opcodes do not actually do a comparison. They use the result of the previous CMP operation to decide whether to jump or not. - I somewhat modeled this after x86 assembly. – Colorfully Monochrome – 2014-11-23T06:20:49.030

@feersum Math is unsigned and on an integer overflow, overflow to zero rather than crashing. – Colorfully Monochrome – 2014-11-23T06:21:12.547

A few notes: (a) The second and third lines in the binary test program are swapped. (b) The final value of R0 is 3126900366 (which is -1168066930 as a signed value.) – Ell – 2014-11-24T00:03:10.160

@Ell I just spent an hour or two on finding (a). Now my short but slow solution will look really bad next to yours. :D There's also 3 redundant operations in the bytecode. There's POP R2 PUSH R2 and the LOAD R1 0 at the end. – Martin Ender – 2014-11-24T00:04:54.773

What would be considered as a meaningful optimization? – es1024 – 2014-11-24T02:25:30.493

@es1024 Sorry. That was also a bit vague. A "meaningful optimization" is defined as any change from the code provided to the code executed that measurably improves performance. For example, detecting that 2 of the operations, in a row, are "POP R2" and "PUSH R2" and removing them because they do not change the behavior of the program would be an optimization. So, optimization could be dead code detection, loop unrolling, etc. – Colorfully Monochrome – 2014-11-24T03:01:04.417

@Ell - Thanks, I hadn't realized. I have updated the post to reflect this. – Colorfully Monochrome – 2014-11-24T03:01:27.197

@MartinBüttner That was on purpose. Have to provide some room for optimization...and your answer is still awesome. I don't think I could write a hello world program in less than 250 characters, much less an entire VM. – Colorfully Monochrome – 2014-11-24T03:02:37.207

Answers

C, 752 (589+163 for define flags) * 0.5 (JIT) * 0.9 (extensions) * (0.75 optimization) * (0.64 seconds/2) = 81.216

C[S],J[S],i,j,k,y,c,X,Y,Z;char*R,a,b[9];main(x){R=mmap(0,S,6,34,-1,0);N=85;while(scanf("%c%s%*[ R]%d%*[ R]%d",&a,b,&x,&y)&&~getchar())a-=65,a-1?a-15?a-9?a?a-2?a-3?a-11?a-12?a-17?(N=41,v):(N=137,v):(N=137,u,N=247,g(H,4),N=139,u):(y?N=189+x,s(y):(N=51,g(G,G))):(N=137,u,N=247,g(H,6),N=139,u):(N=57,v,s(0xFC8A9F),--j):(N=1,v):(N=233,J[k++]=i,s(x)):b[1]-80?N=85+x:(N=93+x):(c=b[5],s(0x0F9EE78A),N=(c-69?c-71?c-76?1:8:11:0)+132,J[k++]=i,s(x)),C[++i]=j;U(E8,X)U(F0,Y)U(F8,Z)s(50013);i=j;while(k--)j=C[J[k]]+1,R[j-1]-233&&(j+=4),s(C[*(int*)(R+j)]-j-4);((int(*)())R)();printf("%u %u %u\n",X,Y,Z);}

Takes code (LOAD R0, etc), no trailing characters, single space, no blank lines in the middle, no comments, etc. Trailing newline required.

This is then converted to 80386 bytecode and executed.

Loading 0 into a register is replaced by xoring the register with itself instead of moving 0 into the register, which is three bytes shorter in the generated bytecode, and may be very marginally faster.

Compile with:

gcc -m32 -D"g(a,b)=(N=192|b<<3|a)"-D"s(b)=(*(int*)(R+j)=b,j+=4)"-DN=R[j++]-D"G=((x+1)|4)"
-D"H=((y+1)|4)"-DS=9999-D"u=g(0,G)"-D"v=g(G,H)"-D"U(b,c)=s(0xA3##b##89),--j,s(&c);"
bytecode.c -o bytecode

POSIX-compliant OS required.

Input is read from STDIN (use ./bytecode < file to pipe from a file).

Resulting bytecode for the test program:

; start
 0:   55                      push   %ebp
; LOAD R0 0
 1:   33 ed                   xor    %ebp,%ebp
; LOAD R2 0
 3:   33 ff                   xor    %edi,%edi
; LOAD R1 0
 5:   33 f6                   xor    %esi,%esi
; PUSH $1
 7:   56                      push   %esi
; MUL R1 R2
 8:   89 f0                   mov    %esi,%eax
 a:   f7 e7                   mul    %edi
 c:   8b f0                   mov    %eax,%esi
; PUSH R2
 e:   57                      push   %edi
; LOAD R2 3
 f:   bf 03 00 00 00          mov    $0x3,%edi
; DIV R1 R2
14:   89 f0                   mov    %esi,%eax
16:   f7 f7                   div    %edi
18:   8b f0                   mov    %eax,%esi
; POP R2
1a:   5f                      pop    %edi
; ADD R0 R1
1b:   01 f5                   add    %esi,%ebp
; POP R1
1d:   5e                      pop    %esi
; PUSH R2
1e:   57                      push   %edi
; LOAD R2 1
1f:   bf 01 00 00 00          mov    $0x1,%edi
; ADD R1 R2
24:   01 fe                   add    %edi,%esi
; POP R2
26:   5f                      pop    %edi
; PUSH R2
27:   57                      push   %edi
; LOAD R2 10000
28:   bf 10 27 00 00          mov    $0x2710,%ed
; CMP R1 R2
2d:   39 fe                   cmp    %edi,%esi
2f:   9f                      lahf
30:   8a fc                   mov    %ah,%bh
; POP R2
32:   5f                      pop    %edi
; BRANCHLT 3
33:   8a e7                   mov    %bh,%ah
35:   9e                      sahf
36:   0f 8c cb ff ff ff       jl     0x7
; LOAD R1 1
3c:   be 01 00 00 00          mov    $0x1,%esi
; ADD R2 R1
41:   01 f7                   add    %esi,%edi
; LOAD R1 10000
43:   be 10 27 00 00          mov    $0x2710,%es
; CMP R2 R1
48:   39 f7                   cmp    %esi,%edi
4a:   9f                      lahf
4b:   8a fc                   mov    %ah,%bh
; LOAD R1 0
4d:   33 f6                   xor    %esi,%esi
; BRANCHLT 2
4f:   8a e7                   mov    %bh,%ah
51:   9e                      sahf
52:   0f 8c ad ff ff ff       jl     0x5
; copy R0 to X
58:   89 e8                   mov    %ebp,%eax
5a:   a3 28 5b 42 00          mov    %eax,0x425b
; copy R1 to Y
5f:   89 f0                   mov    %esi,%eax
61:   a3 38 55 44 00          mov    %eax,0x4455
; copy R2 to Z
66:   89 f8                   mov    %edi,%eax
68:   a3 40 55 44 00          mov    %eax,0x4455
; exit
6d:   5d                      pop    %ebp
6e:   c3                      ret

Ungolfed:

C[9999],J[9999],i,j,k,y,c,X,Y,Z;
char *R,a,b[9];
main(x){
    // 6 is PROC_WRITE|PROC_EXEC
    // 34 is MAP_ANON|MAP_PRIVATE
    R=mmap(0,'~~',6,34,-1,0);

    N=0x55;
    while(scanf("%c%s%*[ R]%d%*[ R]%d",&a,b,&x,&y)&&~getchar())
        a-=65,
        a-1? // B[RANCH**]
            a-15? // P[USH/OP]
                a-9? // J[MP]
                    a? // A[DD]
                        a-2? // C[MP]
                            a-3? // D[IV]
                                a-11? // L[OAD]
                                    a-12? // M[UL]
                                        a-17? // R[LOAD]
                                            // SUB
                                            (N=0x29,g(G,H))
                                        :(N=0x89,g(G,H))
                                    :(N=0x89,g(0,G),N=0xF7,g(H,4),N=0x8B,g(0,G))
                                :(y?N=0xBD+x,s(y):(N=0x33,g(G,G)))
                            :(N=0x89,g(0,G),N=0xF7,g(H,6),N=0x8B,g(0,G))
                        :(N=0x39,g(G,H),s(0xfc8a9f),--j)
                    :(N=0x1,g(G,H))
                :(N=0xE9,J[k++]=i,s(x))
            :b[1]-80? 
                N=0x55+x // PUSH
            :(N=0x5D+x) // POP
        :(c=b[5],s(0x0f9ee78a),N=(
        c-69? // EQ
            c-71? // GT
                c-76? // LT
                    1 // NE
                :8
            :11
        :0
        )+0x84,J[k++]=i,s(x)),
        C[++i]=j
        ;
    // transfer registers to X,Y,Z
    s(0xA3E889),--j,s(&X);
    s(0xA3F089),--j,s(&Y);
    s(0xA3F889),--j,s(&Z);

    // pop and ret
    s(0xC35D);

    i=j;
    // fix distances for jmp/branch**
    while(k--)
        j=C[J[k]]+1,R[j-1]-0xE9&&(j+=4),
        s(C[*(int*)(R+j)]-j-4);

    // call
    ((int(*)())R)();

    // output
    printf("%u %u %u\n",X,Y,Z);
}

es1024

Posted 2014-11-23T04:18:02.753

Reputation: 8 953

Wow. I wish I would have added a bonus for including the compiler in the VM. – Colorfully Monochrome – 2014-11-24T03:56:19.747

Average of .67 seconds per run over 15 runs. – Colorfully Monochrome – 2014-11-24T04:00:43.267

I have to disagree that the xoring is an optimization. While clever code size wise, xoring does not change the performance characteristics of VM (correct me if I'm wrong). What I meant by optimization was changing or removing instructions from the input code (e.g. removing the redundant POP...PUSH) or realizing 2 instructions in a row load the register, so one can be removed, etc. – Colorfully Monochrome – 2014-11-24T04:06:27.223

EDIT: Actually, that is an optimization: it dropped to 0.64 seconds per run over 15 runs. I guess it prevents cache thrashing or something by shortening the code (or removes redundant memory accesses)? – Colorfully Monochrome – 2014-11-24T04:08:28.993

@ColorfullyMonochrome Some architectures, when presented with the xoring of a register to itself, will not actually execute the instruction, but simply zero the register itself. – es1024 – 2014-11-24T04:34:49.810

Oh, that makes sense. And, as the load 0's are being called very often, it provides a meaningful difference. – Colorfully Monochrome – 2014-11-24T04:59:26.933

It's actually 4 times faster: http://stackoverflow.com/a/18027854/995714. It doesn't actually execute an opcode at all, it seems.

– Colorfully Monochrome – 2014-11-24T05:15:31.697

C, Score = 854_byte × (~0.8_sec / 2) × 0.5 [JIT] × 0.9 [Extensions] = ~154_{byte sec}

#define G getchar()
#define L for(i=0;i<3;++i)
#define N*(int*)
#define M(x)"P\x8a\xe7\x9e\xf"#x"    KL"
*T[1<<20],**t=T,*F[1<<20],**f=F,R[3],r[]={1,6,7};char*I[]={"L\xb8    GGJH","I\x8b\xc0HHGH","H\x50GG","H\x58GG","I\3\xc0HHGH","I\53\xc0HHGH","M\x8b\xc0\xf7\xe0\x8b\xc0IHLGJ","O\63\xd2\x8b\xc0\xf7\xf0\x8b\xc0IJNGL","L\xe9    KH","L\73\xc0\x9f\x8a\xfcHHGH",M(\x82),M(\x84),M(\x87),M(\x85)},C[1<<24],*c=C;main(i,o,l,g){N c=0xb7ec8b60;c[4]=70;c+=5;while((o=G)>=0){char*s=I[o];l=*s-'G';memcpy(c,s+1,l);for(s+=l+1;o=*s++;){o-='G';if(o<3){g=r[G];c[*s++-'G']|=g<<3*(o&1);if(o>1)c[*s++-'G']|=g<<3;}else{if(o>3)*f++=c+*s-'G';for(i=4;i;--i)c[*s-'G'+i-1]=G;++s;}}*t++=c;c+=l;}*t=c;while(f>F)--f,**f=(int)T[**f]-(int)*f-4;L N&c[7*i]=0x5893e|r[i]<<19,N&c[3+7*i]=R+i;N&c[21]=0xc361e58b;mprotect((int)C>>12<<12,1<<24,7);((void(*)())C)();L printf("R%d %u\n",i,R[i]);}

Compile with gcc vm.c -ovm -m32 -w on a x86 POSIX compliant OS.
Run with ./vm < program, where program is a binary program file.

Going for the speed. The program performs a pretty straight-forward translation of the input program to x86 machine code and lets the CPU do the rest.

For instance, here's the translation of the test program. ecx, esi and edi correspond to R0, R1 and R2, respectively; bh holds the status flags; eax and edx are scratch registers; the call-stack corresponds to the VM's stack:

# Prologue
     0:   60                      pusha
     1:   8b ec                   mov    ebp,esp
     3:   b7 46                   mov    bh,0x46
# LOAD R0 0
     5:   b9 00 00 00 00          mov    ecx,0x0
# LOAD R2 0 <--outer loop value
     a:   bf 00 00 00 00          mov    edi,0x0
# LOAD R1 0 <--inner loop value
     f:   be 00 00 00 00          mov    esi,0x0
#      --Begin inner loop--
# PUSH R1 <--push inner loop value to the stack
    14:   56                      push   esi
# MUL R1 R2 <--(i*j)
    15:   8b c6                   mov    eax,esi
    15:   f7 e7                   mul    edi
    19:   8b f0                   mov    esi,eax
# PUSH R2
    1b:   57                      push   edi
# LOAD R2 3
    1c:   bf 03 00 00 00          mov    edi,0x3
# DIV R1 R2 <-- / 3
    21:   33 d2                   xor    edx,edx
    23:   8b c6                   mov    eax,esi
    25:   f7 f7                   div    edi
    27:   8b f0                   mov    esi,eax
# POP R2
    29:   5f                      pop    edi
# ADD R0 R1 <-- s+=
    2a:   03 ce                   add    ecx,esi
# POP R1
    2c:   5e                      pop    esi
# PUSH R2
    2d:   57                      push   edi
# LOAD R2 1
    2e:   bf 01 00 00 00          mov    edi,0x1
# ADD R1 R2 <--j++
    33:   03 f7                   add    esi,edi
# POP R2
    35:   5f                      pop    edi
# PUSH R2
    36:   57                      push   edi
# LOAD R2 10000
    37:   bf 10 27 00 00          mov    edi,0x2710
# CMP R1 R2 <-- j < 10000
    3c:   3b f7                   cmp    esi,edi
    3e:   9f                      lahf
    3f:   8a fc                   mov    bh,ah
# POP R2
    41:   5f                      pop    edi
# BRANCHLT 4 <--Go back to beginning inner loop
    42:   8a e7                   mov    ah,bh
    44:   9e                      sahf
    45:   0f 82 c9 ff ff ff       jb     0x14
# --Drop To outer loop--
# LOAD R1 1
    4b:   be 01 00 00 00          mov    esi,0x1
# ADD R2 R1 <--i++
    50:   03 fe                   add    edi,esi
# LOAD R1 10000
    52:   be 10 27 00 00          mov    esi,0x2710
# CMP R2 R1 <-- i < 10000
    57:   3b fe                   cmp    edi,esi
    59:   9f                      lahf
    5a:   8a fc                   mov    bh,ah
# LOAD R1 0 <--Reset inner loop
    5c:   be 00 00 00 00          mov    esi,0x0
# BRANCHLT 3
    61:   8a e7                   mov    ah,bh
    63:   9e                      sahf
    64:   0f 82 a5 ff ff ff       jb     0xf
# Epilogue
    6a:   3e 89 0d 60 ac 04 09    mov    DWORD PTR ds:0x904ac60,ecx
    71:   3e 89 35 64 ac 04 09    mov    DWORD PTR ds:0x904ac64,esi
    78:   3e 89 3d 68 ac 04 09    mov    DWORD PTR ds:0x904ac68,edi
    7f:   8b e5                   mov    esp,ebp
    81:   61                      popa
    82:   c3                      ret

Ungolfed

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define MAX_INST_COUNT (1 << 20)
#define MAX_CODE_SIZE (8 * MAX_INST_COUNT)
#define PAGE_SIZE (1 << 12)

uintptr_t *targets[MAX_INST_COUNT], **targets_ptr = targets;
int32_t *fixups[MAX_INST_COUNT], **fixups_ptr = fixups;
char code[MAX_CODE_SIZE], *code_ptr = code;
uint32_t R[3];

const char* inst_defs[] = {
    /* LOAD     */ "\5\xb8    \1\0\4\1",
    /* RLOAD    */ "\2\x8b\xc0\2\1\1\1",
    /* PUSH     */ "\1\x50\1\0",
    /* POP      */ "\1\x58\1\0",
    /* ADD      */ "\2\x03\xc0\2\1\1\1",
    /* SUB      */ "\2\x2b\xc0\2\1\1\1",
    /* MUL      */ "\6\x8b\xc0\xf7\xe0\x8b\xc0\3\1\5\1\3",
    /* DIV      */ "\x8\x33\xd2\x8b\xc0\xf7\xf0\x8b\xc0\3\3\7\1\5",
    /* JMP      */ "\5\xe9    \5\1",
    /* CMP      */ "\5\x3b\xc0\x9f\x8a\xfc\2\1\1\1",
    /* BRANCHLT */ "\x9\x8a\xe7\x9e\x0f\x82    \5\5",
    /* BRANCHEQ */ "\x9\x8a\xe7\x9e\x0f\x84    \5\5",
    /* BRANCHGT */ "\x9\x8a\xe7\x9e\x0f\x87    \5\5",
    /* BRANCHNE */ "\x9\x8a\xe7\x9e\x0f\x85    \5\5"
};

int main() {
    int i;

    {
        const char prologue[] =
            /* PUSHAD       */ "\x60"
            /* MOV EBP, ESP */ "\x8b\xec"
            /* MOV BH, 46h  */ "\xb7\x46"
        ;
        memcpy(code_ptr, prologue, sizeof(prologue) - 1);
        code_ptr += sizeof(prologue) - 1;
    }

    {
        const char reg[] = {1, 6, 7};
        char r;
        int op;
        while ((op = getchar()) != EOF) {
            const char* def = inst_defs[op];
            int len = def[0];
            memcpy(code_ptr, def + 1, len);
            for (def += len + 1; *def; ) {
                switch (*def++) {
                    case 1: code_ptr[*def++] |= reg[getchar()]; break;
                    case 2: code_ptr[*def++] |= reg[getchar()] << 3; break;
                    case 3:
                        r = reg[getchar()];
                        code_ptr[*def++] |= r; code_ptr[*def++] |= r << 3;
                        break;
                    case 5: *fixups_ptr = code_ptr + *def; ++fixups_ptr;
                    case 4:
                        for (i = 4; i; --i) code_ptr[*def + i - 1] = getchar();
                        ++def;
                        break;
                }
            }
            *targets_ptr = code_ptr; ++targets_ptr;
            code_ptr += len;
        }
        *targets_ptr = code_ptr; ++targets_ptr;

        while (fixups_ptr != fixups) {
            --fixups_ptr;
            **fixups_ptr = (char*)targets[**fixups_ptr] - (char*)*fixups_ptr - 4;
        }
    }

    {
        const char epilogue[] =
            /* MOV R[0], ECX */ "\x3e\x89\x0d    "
            /* MOV R[1], ESI */ "\x3e\x89\x35    "
            /* MOV R[2], EDI */ "\x3e\x89\x3d    "
            /* MOV ESP, EBP  */ "\x8b\xe5"
            /* POPAD         */ "\x61"
            /* RET           */ "\xc3"
        ;
        memcpy(code_ptr, epilogue, sizeof(epilogue) - 1);
        for (i = 0; i < 3; ++i) *(uintptr_t*)&code_ptr[3 + 7 * i] = &R[i];
        code_ptr += sizeof(epilogue) - 1;
    }

    mprotect(
        (uintptr_t)code / PAGE_SIZE * PAGE_SIZE, MAX_CODE_SIZE,
        PROT_READ | PROT_WRITE | PROT_EXEC
    );
    { void (*program)(void) = code; program(); }
    for (i = 0; i < 3; ++i) printf("R%d %u\n", i, R[i]);

    return 0;
}

Ell

Posted 2014-11-23T04:18:02.753

Reputation: 7 317

Wow...my JIT was ~900 lines of code (written in c++)... – Colorfully Monochrome – 2014-11-24T02:47:46.913

Average of 0.63 seconds per run for 15 runs. – Colorfully Monochrome – 2014-11-24T03:08:45.953

CJam, 222 187 185 bytes * (too slow / 2)

I just wanted to see how short I can get a bytecode VM by writing it in CJam. Less than 200 bytes seems pretty decent. It's damn slow though, because CJam itself is interpreted. It takes ages to run the test program.

304402480 6b:P;q:iD-);{(_P=@/(\L*@@+\}h;]:P;TTT]:R;{_Rf=~}:Q;{4G#%R@0=@t:R;}:O;{TP=("R\(\GG*bt:R;  ~R= R\~@t:R; Q+O Q4G#+-O Q*O Q/O ~(:T; Rf=~-:U; GG*bU0<{(:T}*;"S/=~T):TP,<}g3,{'R\_S\R=N}/

To run it, download the Java interpreter at this sourceforge link, save the code in vm.cjam and run it with

java -jar cjam-0.6.2.jar vm.cjam

The program expects the bytecode on STDIN. I haven't found a way yet to pipe binary data into a program, without PowerShell adding a trailing line break and converting 0x0a to 0x0d 0x0a, which is really annoying. The code includes 4 bytes to fix that (D-);), which I haven't included in the total count, because it's not something the program should need to do if it actually received the bytecode itself on STDIN, instead of some weirdly encoded version of it. If someone knows a fix for that, please let me know.

Slightly ungolfed:

304402480 6b:P; "Create lookup table for instruction sizes. Store in P.";
q:i             "Read program and convert bytes to integers.";
D-);            "Remove spurious carriage returns. This shouldn't be necessary.";
{(_P=@/(\L*@@+\}h;]:P; "Split into instructions. Store in P.";
"We'll use T for the instruction pointer as it's initialised to 0.";
"Likewise, we'll use U for the CMP flag.";
TTT]:R; "Store [0 0 0] in R for the registers.";
{_Rf=~}:Q; "Register lookup block.";
{4G#%R@0=@t:R;}:O; "Save in register block.";
{TP=("R\(\GG*bt:R;

~R=
R\~@t:R;
Q+O
Q4G#+-O
Q*O
Q/O
~(:T;
Rf=~-:U;
GG*bU0<{(:T}*;"N/=~T):TP,<}g "Run program.";
3,{'R\_S\R=N}/

I'll add a proper explanation tomorrow.

In short, I'm storing all the registers, the instruction pointer and the comparison flag in variables, so that I can keep CJam's stack free to use as the VM's stack.

Martin Ender

Posted 2014-11-23T04:18:02.753

Reputation: 184 808

Let us continue this discussion in chat.

– Martin Ender – 2014-11-25T00:49:25.800

115.279 seconds average for 20 iterations. - 15 tests. That means 2.12208333 hours per test. – Colorfully Monochrome – 2014-11-25T01:26:09.310

python/c++, score=56.66

1435 chars * .234/2 seconds * .5 [JIT] * .75 [Optimization] * .90 [Extra instructions]

Compiles the input program to c++, runs gcc on it, then runs the result. Most of the time is spent inside gcc.

The one optimization I do is reduce the stack operations to explicit variables if it is allowed semantically. It helps a lot, about 10x better runtime of the compiled code (about .056 sec to actually run the resulting binary). I'm not sure what gcc is doing that gets you that improvement, but it's good.

import sys,os
x=map(ord,sys.stdin.read())
w=lambda x:(x[0]<<24)+(x[1]<<16)+(x[2]<<8)+x[3]
I=[]
while x:
 if x[0]==0:f='r%d=%d'%(x[1],w(x[2:]));n=6
 if x[0]==1:f='r%d=r%d'%(x[1],x[2]);n=3
 if x[0]==2:f='P%d'%x[1];n=2
 if x[0]==3:f='O%d'%x[1];n=2
 if x[0]==4:f='r%d=r%d+r%d'%(x[1],x[1],x[2]);n=3
 if x[0]==5:f='r%d=r%d-r%d'%(x[1],x[1],x[2]);n=3
 if x[0]==6:f='r%d=r%d*r%d'%(x[1],x[1],x[2]);n=3
 if x[0]==7:f='r%d=r%d/r%d'%(x[1],x[1],x[2]);n=3
 if x[0]==8:f='goto L%d'%w(x[1:]);n=5
 if x[0]==9:f='a=r%d;b=r%d'%(x[1],x[2]);n=3
 if x[0]==10:f='if(a<b)goto L%d'%w(x[1:]);n=5
 if x[0]==11:f='if(a==b)goto L%d'%w(x[1:]);n=5
 if x[0]==12:f='if(a>b)goto L%d'%w(x[1:]);n=5
 if x[0]==13:f='if(a!=b)goto L%d'%w(x[1:]);n=5
 I+=[f];x=x[n:]
D=[]
d=0
for f in I:D+=[d];d+='P'==f[0];d-='O'==f[0]
J=[]
if all(d==D[int(f[f.find('L')+1:])]for f,d in zip(I,D)if f[0]in'gi'):
 H='uint32_t '+','.join('s%d'%i for i in range(max(D)))+';'
 for f,d in zip(I,D):
  if f[0]=='P':f='s%d=r'%d+f[1:]
  if f[0]=='O':f='r'+f[1:]+'=s%d'%(d-1)
  J+=[f]
else:
 H='std::vector<uint32_t>s;'
 for f,d in zip(I,D):
  if f[0]=='P':f='s.push_back(r'+f[1:]+')'
  if f[0]=='O':f='r'+f[1:]+'=s.back();s.pop_back()'
  J+=[f]
P='#include<vector>\n#include<cstdint>\nuint32_t r0,r1,r2,a,b;'+H+'int main(){'
for i,f in enumerate(J):P+='L%d:'%i+f+';'
P+=r'printf("R0 %u\nR1 %u\nR2 %u\n",r0,r1,r2);}'
c=open("t.cc", "w")
c.write(P)
c.close()
os.system("g++ -O1 t.cc")
os.system("./a.out")

Could certainly be golfed some more.

Keith Randall

Posted 2014-11-23T04:18:02.753

Reputation: 19 865

Average of .477 seconds per run over 15 runs. – Colorfully Monochrome – 2014-11-24T23:25:42.153

Lua 5.2 (or LuaJIT), 740 bytes

First try, only minimally golfed. This version works (at least on the test program), and implements the extra opcodes, but doesn't uphold the unsigned math requirement and isn't particularly fast. As a bonus, though, it's a VM running in a VM, and is written such that it can be either interpreted (run with PUC-Lua) or sort-of-JIT (run with LuaJIT; still interpreted, but the interpreter is now JITted).

EDIT: Golfed better, still big.

EDIT: Fixed a major error, and now restricts arithmetic to unsigned long range. Somehow managed to keep the size from getting out of hand, though, but it's still giving the wrong answer.

EDIT: Turns out, the result was correct but the output wasn't. Switched to printing with %u instead of %d and all is well. Also switched out table-based registers for variables to improve the size and speed somewhat.

EDIT: Using Lua 5.2's goto statement (also available in LuaJIT) I've replaced the interpreter with "JIT-to-Lua," generating code which is run directly by the Lua VM itself. Not sure if this really counts as JIT, but it does improve the speed.

U,S,P,F=table.unpack,table.insert,table.remove,math.floor X,r0,r1,r2,p,m,s=2^32,0,0,0,1,0,{}C={{'r%u=%u',1,4},{'r%u=r%u',1,1},{'S(s,r%u)',1},{'r%u=P(s)',1},{'r%u=(r%u+r%u)%%X',1,0,1},{'r%u=(r%u-r%u)%%X',1,0,1},{'r%u=(r%u*r%u)%%X',1,0,1},{'r%u=F(r%u/r%u)%%X',1,0,1},{'goto L%u',4},{'m=r%u-r%u',1,1},{'if m<0 then goto L%u end',4},{'if m==0 then goto L%u end',4},{'if m>0 then goto L%u end',4},{'if m~=0 then goto L%u end',4}}t={io.open(arg[1],'rb'):read('*a'):byte(1,-1)}i,n,r=1,0,{}while i<=#t do c,i,x,a=C[t[i]+1],i+1,0,{}for j=2,#c do y=c[j]if y>0 then x=0 for k=1,y do i,x=i+1,x*256+t[i]end end S(a,x)end S(r,('::L%d::'):format(n))n=n+1 S(r,c[1]:format(U(a)))end load(table.concat(r,' '))()print(('R0 %u\nR1 %u\nR2 %u'):format(r0,r1,r2))

Here's the original, readable version.

U,S,P,F=table.unpack,table.insert,table.remove,math.floor

X,r0,r1,r2,p,m,s=2^32,0,0,0,1,0,{}

C={
    {'r%u=%u',1,4},
    {'r%u=r%u',1,1},
    {'S(s,r%u)',1},
    {'r%u=P(s)',1},
    {'r%u=(r%u+r%u)%%X',1,0,1},
    {'r%u=(r%u-r%u)%%X',1,0,1},
    {'r%u=(r%u*r%u)%%X',1,0,1},
    {'r%u=F(r%u/r%u)%%X',1,0,1},
    {'goto L%u',4},
    {'m=r%u-r%u',1,1},
    {'if m<0 then goto L%u end',4},
    {'if m==0 then goto L%u end',4},
    {'if m>0 then goto L%u end',4},
    {'if m~=0 then goto L%u end',4},
}

t={io.open(arg[1],'rb'):read('*a'):byte(1,-1)}
i,n,r=1,0,{}
while i<=#t do
    c,i,x,a=C[t[i]+1],i+1,0,{}
    for j=2,#c do
        y=c[j]
        if y>0 then
            x=0 
            for k=1,y do 
                i,x=i+1,x*256+t[i]
            end 
        end
        S(a,x)
    end
    S(r,('::L%d::'):format(n)) 
    n=n+1
    S(r,c[1]:format(U(a)))
end
load(table.concat(r,' '))()
print(('R0 %u\nR1 %u\nR2 %u'):format(r0,r1,r2))

criptych stands with Monica

Posted 2014-11-23T04:18:02.753

Reputation: 181

When I ran your program, I got the following error: http://pastebin.com/qQBD7Rs8 . Do you expect the bytecode over stdin or as a file?

– Colorfully Monochrome – 2014-11-24T23:29:11.217

Sorry. My binary for windows was corrupted. Thus, all the gcc/linux versions worked but the windows tests all crashed. However, it is still reporting that R0 and R1 are 0, while R2 is 1. – Colorfully Monochrome – 2014-11-24T23:44:07.237

I suspect it isn't actually executing: it took 33.8 ms to run on average (GCC takes ~.25 seconds). – Colorfully Monochrome – 2014-11-24T23:53:56.897

The script expects bytecode as a file, with the filename passed on the command line. You're right though, I traced it and it looks like it's only doing the first outer loop. Back to the drawing board... – criptych stands with Monica – 2014-11-25T13:03:13.827

That's what I get for thinking in C and writing in Lua: I used < in my loops instead of <=, so the final branch instruction was left off. It still gets the wrong answer, but now takes a few minutes to do it. :) – criptych stands with Monica – 2014-11-25T13:09:28.140

Worked fine on my end - Took an average of 96.718 seconds over 15 tests. – Colorfully Monochrome – 2014-11-25T18:01:14.847

I've updated it since then -- yes, it finally gives the correct output. Notes above. – criptych stands with Monica – 2014-11-25T18:39:37.030

C#

~~1505~~ 1475 bytes

This is my version of the interpreter, written in C# could be optimized / golfed more I think, but I don't really know where ;)

golfed version:

using System;using System.Collections.Generic;using System.IO;using System.Linq;class M{static void Main(string[]a){if(a.Length==1&&File.Exists(a[0])){B.E(B.P(File.ReadAllLines(a[0])));Console.WriteLine(B.O);}}}class B{public enum I{L=0x00,P=0x02,Q=0x03,A=0x04,S=0x05,M=0x06,D=0x07,J=0x08,C=0x09,BL=0x0a,BE=0x0b,BG=0x0c,BN=0x0d}public enum R{A,B,C}enum C{N,L,E,G}public static Dictionary<R,uint>r=new Dictionary<R,uint>{{R.A,0},{R.B,0},{R.C,0}};public static Stack<uint>s=new Stack<uint>();static C c=C.N;public static string O{get{return string.Format("R0 {0}\nR1 {1}\nR2 {2}",r[R.A],r[R.B],r[R.C]);}}public static void E(byte[][]l){for(uint i=0;i<l.Length;i++){var q=l[i];switch((I)q[0]){case I.L:r[(R)q[1]]=U(q,2);break;case I.P:r[(R)q[1]]=s.Pop();break;case I.Q:s.Push(r[(R)q[1]]);r[(R)q[1]]=0;break;case I.A:s.Push(r[(R)q[1]]+r[(R)q[2]]);break;case I.S:s.Push(r[(R)q[1]]-r[(R)q[2]]);break;case I.M:s.Push(r[(R)q[1]]*r[(R)q[2]]);break;case I.D:s.Push(r[(R)q[1]]/r[(R)q[2]]);break;case I.J:i=U(q,1)-1;break;case I.C:{uint x=r[(R)q[1]],y=r[(R)q[2]];c=x<y?C.L:x>y?C.G:C.E;}break;case I.BL:if(c==C.L)i=U(q,1)-1;break;case I.BG:if(c==C.G)i=U(q,1)-1;break;case I.BE:if(c==C.E)i=U(q,1)-1;break;case I.BN:if(c!=C.E)i=U(q,1)-1;break;}}}public static byte[][]P(string[]c){return c.Where(l=>!l.StartsWith("#")).Select(r=>r.Split(' ').Where(b=>b.Length>0).Select(b=>Convert.ToByte(b,16)).ToArray()).Where(l=>l.Length>0).ToArray();}static uint U(byte[]b,int i){return(uint)(b[i]<<24|b[i+1]<<16|b[i+2]<<8|b[i+3]);}}

edit

removed some unnecessary public and private modifiers:

using System;using System.Collections.Generic;using System.IO;using System.Linq;class M{static void Main(string[]a){if(a.Length==1&&File.Exists(a[0])){B.E(B.P(File.ReadAllLines(a[0])));Console.Write(B.O);}}}class B{enum I{L=0x00,P=0x02,Q=0x03,A=0x04,S=0x05,M=0x06,D=0x07,J=0x08,C=0x09,BL=0x0a,BE=0x0b,BG=0x0c,BN=0x0d}enum R{A,B,C}enum C{N,L,E,G}static Dictionary<R,uint>r=new Dictionary<R,uint>{{R.A,0},{R.B,0},{R.C,0}};static Stack<uint>s=new Stack<uint>();static C c=C.N;public static string O{get{return string.Format("R0 {0}\nR1 {1}\nR2 {2}\n",r[R.A],r[R.B],r[R.C]);}}public static void E(byte[][]l){for(uint i=0;i<l.Length;i++){var q=l[i];switch((I)q[0]){case I.L:r[(R)q[1]]=U(q,2);break;case I.P:r[(R)q[1]]=s.Pop();break;case I.Q:s.Push(r[(R)q[1]]);r[(R)q[1]]=0;break;case I.A:s.Push(r[(R)q[1]]+r[(R)q[2]]);break;case I.S:s.Push(r[(R)q[1]]-r[(R)q[2]]);break;case I.M:s.Push(r[(R)q[1]]*r[(R)q[2]]);break;case I.D:s.Push(r[(R)q[1]]/r[(R)q[2]]);break;case I.J:i=U(q,1)-1;break;case I.C:{uint x=r[(R)q[1]],y=r[(R)q[2]];c=x<y?C.L:x>y?C.G:C.E;}break;case I.BL:if(c==C.L)i=U(q,1)-1;break;case I.BG:if(c==C.G)i=U(q,1)-1;break;case I.BE:if(c==C.E)i=U(q,1)-1;break;case I.BN:if(c!=C.E)i=U(q,1)-1;break;}}}public static byte[][]P(string[]c){return c.Where(l=>!l.StartsWith("#")).Select(r=>r.Split(' ').Where(b=>b.Length>0).Select(b=>Convert.ToByte(b,16)).ToArray()).Where(l=>l.Length>0).ToArray();}static uint U(byte[]b,int i){return(uint)(b[i]<<24|b[i+1]<<16|b[i+2]<<8|b[i+3]);}}

call it with executable.exe filename where filename is the file containing the code to be interpreted

My "test program":

# LOAD R0 5
# CMP R0 R1
# BRANCHEQ 13
# LOAD R1 1
# LOAD R2 1
# CMP R0 R2
# MUL R1 R2
# LOAD R1 1
# ADD R2 R1
# PUSH R2
# PUSH R1 
# BRANCHEQ 13
# JMP 5
# POP R2
# POP R0
# POP R1
# PUSH R0

0x0 0x0 0x0 0x0 0x0 0x5
0x9 0x0 0x1 
0xb 0x0 0x0 0x0 0xd 
0x0 0x1 0x0 0x0 0x0 0x1 
0x0 0x2 0x0 0x0 0x0 0x1 
0x9 0x0 0x2 
0x6 0x1 0x2 
0x0 0x1 0x0 0x0 0x0 0x1 
0x4 0x2 0x1 
0x2 0x2 
0x2 0x1 
0xb 0x0 0x0 0x0 0xd 
0x8 0x0 0x0 0x0 0x5 
0x3 0x2 
0x3 0x0 
0x3 0x1 
0x2 0x0

Interpreter ungolfed with longer names variables, classes, ...

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 1 && File.Exists(args[0]))
        {
            var code = ByteCodeInterpreter.ParseCode(File.ReadAllLines(args[0]));
            ByteCodeInterpreter.Execute(code);
            Console.WriteLine(ByteCodeInterpreter.Output);
        }
    }
}

public static class ByteCodeInterpreter
{
    public enum Instruction : byte
    {
        LOAD = 0x00,
        PUSH = 0x02,
        POP = 0x03,
        ADD = 0x04,
        SUB = 0x05,
        MUL = 0x06,
        DIV = 0x07,
        JMP = 0x08,
        CMP = 0x09,
        BRANCHLT = 0x0a,
        BRANCHEQ = 0x0b,
        BRANCHGT = 0x0c,
        BRANCHNE = 0x0d
    }

    public enum Register : byte
    {
        R0 = 0x00,
        R1 = 0x01,
        R2 = 0x02
    }

    private enum CompareFlag : byte
    {
        NONE = 0x00,
        LT = 0x01,
        EQ = 0x02,
        GT = 0x03,
    }

    public static readonly Dictionary<Register, uint> register = new Dictionary<Register, uint>
    {
        {Register.R0, 0},
        {Register.R1, 0},
        {Register.R2, 0}
    };

    public static readonly Stack<uint> stack = new Stack<uint>();
    private static CompareFlag compareFlag = CompareFlag.NONE;

    public static string Output
    {
        get
        {
            return string.Format("R0 {0}\nR1 {1}\nR2 {2}", register[Register.R0], register[Register.R1],
                register[Register.R2]);
        }
    }

    public static void Execute(byte[][] lines)
    {
        for (uint i = 0; i < lines.Length; i++)
        {
            var line = lines[i];
            switch ((Instruction)line[0])
            {
                case Instruction.LOAD:
                    register[(Register)line[1]] = GetUint(line, 2);
                    break;
                case Instruction.PUSH:
                    register[(Register)line[1]] = stack.Pop();
                    break;
                case Instruction.POP:
                    stack.Push(register[(Register)line[1]]);
                    register[(Register)line[1]] = 0;
                    break;
                case Instruction.ADD:
                    stack.Push(register[(Register)line[1]] + register[(Register)line[2]]);
                    break;
                case Instruction.SUB:
                    stack.Push(register[(Register)line[1]] - register[(Register)line[2]]);
                    break;
                case Instruction.MUL:
                    stack.Push(register[(Register)line[1]] * register[(Register)line[2]]);
                    break;
                case Instruction.DIV:
                    stack.Push(register[(Register)line[1]] / register[(Register)line[2]]);
                    break;
                case Instruction.JMP:
                    i = GetUint(line, 1) - 1;
                    break;
                case Instruction.CMP:
                    {
                        uint v0 = register[(Register)line[1]], v1 = register[(Register)line[2]];
                        if (v0 < v1)
                            compareFlag = CompareFlag.LT;
                        else if (v0 > v1)
                            compareFlag = CompareFlag.GT;
                        else
                            compareFlag = CompareFlag.EQ;
                    }
                    break;
                case Instruction.BRANCHLT:
                    if (compareFlag == CompareFlag.LT)
                        i = GetUint(line, 1) - 1;
                    break;
                case Instruction.BRANCHGT:
                    if (compareFlag == CompareFlag.GT)
                        i = GetUint(line, 1) - 1;
                    break;
                case Instruction.BRANCHEQ:
                    if (compareFlag == CompareFlag.EQ)
                        i = GetUint(line, 1) - 1;
                    break;
                case Instruction.BRANCHNE:
                    if (compareFlag != CompareFlag.EQ)
                        i = GetUint(line, 1) - 1;
                    break;
            }
        }
    }

    public static byte[][] ParseCode(string[] code)
    {
        return
            code.Where(line => !line.StartsWith("#"))
                .Select(line => line.Split(' ').Where(b => b.Length > 0).Select(b => Convert.ToByte(b, 16)).ToArray())
                .Where(line => line.Length > 0)
                .ToArray();
    }

    private static uint GetUint(byte[] bytes, int index)
    {
        return (uint)(bytes[index] << 24 | bytes[index + 1] << 16 | bytes[index + 2] << 8 | bytes[index + 3]);
    }
}

Stefan

Posted 2014-11-23T04:18:02.753

Reputation: 261

Smallest Bytecode Interpreter/VM

Answers

C, 752 (589+163 for define flags) * 0.5 (JIT) * 0.9 (extensions) * (0.75 optimization) * (0.64 seconds/2) = 81.216

C, Score = 854byte × (~0.8sec / 2) × 0.5 [JIT] × 0.9 [Extensions] = ~154byte sec

CJam, 222 187 185 bytes * (too slow / 2)

python/c++, score=56.66

Lua 5.2 (or LuaJIT), 740 bytes

C#

edit

C, Score = 854_byte × (~0.8_sec / 2) × 0.5 [JIT] × 0.9 [Extensions] = ~154_{byte sec}