8bit virtual machine

31

10

Background

I like my old 8-bit 6502 chip. It's even fun to solve some of the challenges here on PPCG in 6502 machine code. But some things that should be simple (like, read in data or output to stdout) are unnecessarily cumbersome to do in machine code. So there's a rough idea in my mind: Invent my own 8-bit virtual machine that's inspired by the 6502, but with the design modified to be more usable for challenges. Starting to implement something, I realized this might be a nice challenge itself if the design of the VM is reduced to a bare minimum :)

Task

Implement an 8-bit virtual machine conforming to the following specification. This is , so the implementation with the fewest bytes wins.

Input

Your implementation should take the following inputs:

  • A single unsigned byte pc, this is the initial program counter (the address in memory where the VM starts execution, 0-based)

  • A list of bytes with a maximum length of 256 entries, this is the RAM for the virtual machine (with its initial contents)

You may take this input in any sensible format.

Output

A list of bytes which are the final contents of the RAM after the VM terminates (see below). You can assume you only get input that leads to terminating eventually. Any sensible format is allowed.

Virtual CPU

The virtual CPU has

  • an 8-bit program counter,
  • an 8-bit accumulator register called A and
  • an 8-bit index register called X.

There are three status flags:

  • Z - the zero flag is set after some operation results in 0
  • N - the negative flag is set after some operation results in a negative number (iow bit 7 of the result is set)
  • C - the carry flag is set by additions and shifts for the "missing" bit of the result

Upon start, the flags are all cleared, the program counter is set to a given value and the contents of A and X are indeterminate.

The 8-bit values represent either

  • an unsigned integer in the range [0..255]
  • a signed integer, 2's complement, in the range [-128..127]

depending on context. If an operation over- or underflows, the value wraps around (and in case of an addition, the carry flag is affected).

Termination

The virtual machine terminates when

  • A HLT instruction is reached
  • A non-existing memory address is accessed
  • The program counter runs outside the memory (note it doesn't wrap around even if the VM is given the full 256 bytes of memory)

Adressing modes

  • implicit -- the instruction has no argument, the operand is implied
  • immediate -- the operand is the byte directly after the instruction
  • relative -- (for branching only) the byte after the instruction is signed (2's complement) and determines the offset to add to the program counter if the branch is taken. 0 is the location of the following instruction
  • absolute -- the byte after the instruction is the address of the operand
  • indexed -- the byte after the instruction plus X (the register) is the address of the operand

Instructions

Each instruction consists of an opcode (one byte) and, in the addressing modes immediate, relative, absolute and indexed a second argument byte. When the virtual CPU executes an instruction, it increments the program counter accordingly (by 1 or 2).

All opcodes shown here are in hex.

  • LDA -- load operand into A

    • Opcodes: immediate: 00, absolute: 02, indexed: 04
    • Flags: Z, N
  • STA -- store A into operand

    • Opcodes: immediate: 08, absolute: 0a, indexed: 0c
  • LDX -- load operand into X

    • Opcodes: immediate: 10, absolute: 12, indexed: 14
    • Flags: Z, N
  • STX -- store X into operand

    • Opcodes: immediate: 18, absolute: 1a, indexed: 1c
  • AND -- bitwise and of A and operand into A

    • Opcodes: immediate: 30, absolute: 32, indexed: 34
    • Flags: Z, N
  • ORA -- bitwise or of A and operand into A

    • Opcodes: immediate: 38, absolute: 3a, indexed: 3c
    • Flags: Z, N
  • EOR -- bitwise xor (exclusive or) of A and operand into A

    • Opcodes: immediate: 40, absolute: 42, indexed: 44
    • Flags: Z, N
  • LSR -- logical shift right, shift all bits of operand one place to the right, bit 0 goes to carry

    • Opcodes: immediate: 48, absolute: 4a, indexed: 4c
    • Flags: Z, N, C
  • ASL -- arithmetic shift left, shift all bits of the operand one place to the left, bit 7 goes to carry

    • Opcodes: immediate: 50, absolute: 52, indexed: 54
    • Flags: Z, N, C
  • ROR -- rotate right, shift all bits of operand one place to the right, carry goes to bit 7, bit 0 goes to carry

    • Opcodes: immediate: 58, absolute: 5a, indexed: 5c
    • Flags: Z, N, C
  • ROL -- rotate left, shift all bits of the operand one place to the left, carry goes to bit 0, bit 7 goes to carry

    • Opcodes: immediate: 60, absolute: 62, indexed: 64
    • Flags: Z, N, C
  • ADC -- add with carry, operand plus carry is added to A, carry is set on overflow

    • Opcodes: immediate: 68, absolute: 6a, indexed: 6c
    • Flags: Z, N, C
  • INC -- increment operand by one

    • Opcodes: immediate: 78, absolute: 7a, indexed: 7c
    • Flags: Z, N
  • DEC -- decrement operand by one

    • Opcodes: immediate: 80, absolute: 82, indexed: 84
    • Flags: Z, N
  • CMP -- compare A with operand by subtracting operand from A, forget result. Carry is cleared on underflow, set otherwise

    • Opcodes: immediate: 88, absolute: 8a, indexed: 8c
    • Flags: Z, N, C
  • CPX -- compare X -- same as CMP for X

    • Opcodes: immediate: 90, absolute: 92, indexed: 94
    • Flags: Z, N, C
  • HLT -- terminate

    • Opcodes: implicit: c0
  • INX -- increment X by one

    • Opcodes: implicit: c8
    • Flags: Z, N
  • DEX -- decrement X by one

    • Opcodes: implicit: c9
    • Flags: Z, N
  • SEC -- set carry flag

    • Opcodes: implicit: d0
    • Flags: C
  • CLC -- clear carry flag

    • Opcodes: implicit: d1
    • Flags: C
  • BRA -- branch always

    • Opcodes: relative: f2
  • BNE -- branch if Z flag cleared

    • Opcodes: relative: f4
  • BEQ -- branch if Z flag set

    • Opcodes: relative: f6
  • BPL -- branch if N flag cleared

    • Opcodes: relative: f8
  • BMI -- branch if N flag set

    • Opcodes: relative: fa
  • BCC -- branch if C flag cleared

    • Opcodes: relative: fc
  • BCS -- branch if C flag set

    • Opcodes: relative: fe

Opcodes

The behavior of the VM is undefined if any opcode is found that doesn't map to a valid instruction from the above list.

As per Jonathan Allan's request, you may choose your own set of opcodes instead of the opcodes shown in the Instructions section. If you do so, you must add a full mapping to the opcodes used above in your answer.

The mapping should be a hex file with pairs <official opcode> <your opcode>, e.g. if you replaced two opcodes:

f4 f5
10 11

Newlines don't matter here.

Test cases (official opcodes)

// some increments and decrements
pc:     0
ram:    10 10 7a 01 c9 f4 fb
output: 10 20 7a 01 c9 f4 fb

// a 16bit addition
pc:     4
ram:    e0 08 2a 02 02 00 6a 02 0a 00 02 01 6a 03 0a 01
output: 0a 0b 2a 02 02 00 6a 02 0a 00 02 01 6a 03 0a 01

// a 16bit multiplication
pc:     4
ram:    5e 01 28 00 10 10 4a 01 5a 00 fc 0d 02 02 d1 6a 21 0a 21 02 03 6a 22 0a 22 52
        02 62 03 c9 f8 e6 c0 00 00
output: 00 00 00 00 10 10 4a 01 5a 00 fc 0d 02 02 d1 6a 21 0a 21 02 03 6a 22 0a 22 52
        02 62 03 c9 f8 e6 c0 b0 36

I might add more testcases later.

Reference and testing

To help with own experiments, here's some (totally not golfed) reference implementation -- it can output tracing information (including disassembled instructions) to stderr and convert opcodes while running.

Recommended way to get the source:

git clone https://github.com/zirias/gvm --branch challenge --single-branch --recurse-submodules

Or checkout branch challenge and do a git submodule update --init --recursive after cloning, to get my custom build system.

Build the tool with GNU make (just type make, or gmake if on your system, the default make isn't GNU make).

Usage: gvm [-s startpc] [-h] [-t] [-c convfile] [-d] [-x] <initial_ram

  • -s startpc -- the initial program counter, defaults to 0
  • -h -- input is in hex (otherwise binary)
  • -t -- trace execution to stderr
  • -c convfile -- convert opcodes according to a mapping given in convfile
  • -d -- dump resulting memory as binary data
  • -x -- dump resulting memory as hex
  • initial_ram -- the initial RAM contents, either hex or binary

Note the conversion feature will fail on programs that modify opcodes while running.

Disclaimer: The rules and specs above are authorative for the challenge, not this tool. This especially applies to the opcode conversion feature. If you think the tool presented here has a bug wrt the specs, please report in a comment :)

Felix Palmen

Posted 2018-08-23T12:48:15.007

Reputation: 3 866

1I'd imagine that there would probably be numerous golfing opportunities to be had by choosing different opcodes for the instructions but it seems the opcodes have been fixed (even though the instruction set is what defines the machine). Maybe it's worth considering allowing implementations to have their own code-page? – Jonathan Allan – 2018-08-23T13:42:35.167

1@JonathanAllan thought twice about it, I'm allowing it now and might add a "conversion" tool to make solutions using other sets of opcodes easily testable. – Felix Palmen – 2018-08-23T14:33:28.613

Related. – Peter Taylor – 2018-08-23T15:37:33.023

@PeterTaylor only very loosely -- I've seen this one ;) my instruction set is inspired by the 6502, but the architecture here is greatly simplified, so golfed solutions are (hopefully) realistic ;) – Felix Palmen – 2018-08-23T15:42:55.287

Adding the link in comments causes both questions to show up in the other's "Linked" sidebar, so in particular people who find the other question by some means will be slightly more likely to then find this one and answer it. – Peter Taylor – 2018-08-23T15:50:37.877

@PeterTaylor I didn't oppose mentioning it ;) just wanted to emphasize the relation is "weak" ... it's there because the instruction set here is inspired by the famous 6502 :) – Felix Palmen – 2018-08-23T19:24:52.353

What's the expected behavior of STA, STX, DEC, etc. with immediate addressing? Self-modifying code? – Arnauld – 2018-08-24T08:21:55.280

@Arnauld yes, straight forward -- they modify their operand, so if the operand is immediate, that byte is modified :) (this would allow for very strange code, but with the liberty to choose your own opcodes, I can't use it in testcases ....) – Felix Palmen – 2018-08-24T08:25:36.257

1@Arnauld btw my reasoning for allowing this was to reduce the amount of special cases, so it should be better "golfable" -- each opcode is either implicit, a relative branch or allows all three other addressing modes :) – Felix Palmen – 2018-08-24T08:41:45.070

1if BRA ("branch always") doesn't introduce a branch in the control flow, shouldn't it be called JMP? – ngn – 2018-08-25T10:57:56.543

1@ngn well, BRA exists in later chip designs (the 6502 doesn't have such an instruction) like the 65C02 and the MC 68000. JMP exists as well. The difference is that BRA uses relative addressing and JMP uses absolute addressing. So, I just followed these designs -- indeed, it doesn't sound all that logical ;) – Felix Palmen – 2018-08-25T11:00:33.153

Since A and X are "indeterminate" at the start, does this mean we can assume the PC will not reach a command which reads either of them before it reaches one that has set them properly? – Erik the Outgolfer – 2018-08-27T09:44:17.403

Also, does the length of the inputted RAM represent the size of the whole RAM, or should we somehow pad it to 256 bytes? – Erik the Outgolfer – 2018-08-27T09:45:59.893

@EriktheOutgolfer a) not for a program with a defined result (iow not for any testcase I might add), yes ;) b) the input is the RAM, so yes, with an input of size 64, 64 would be a non-existent address. – Felix Palmen – 2018-08-27T09:47:21.030

@FelixPalmen Alright, so I won't have to handle such edge-cases or pad the RAM, just making sure I've understood the challenge correctly, thanks. – Erik the Outgolfer – 2018-08-27T09:49:13.680

a) How can ADC, INC, INX affect the N flag? Is that for simplicity of the VM? b) Do the increments and decrements wrap or what else will happen on overflow/underflow? c) Why is there no SBC? – Titus – 2018-09-05T01:15:28.440

1@Titus a) what's special about those? as described, N is set whenever a result is "negative" (has bit 7 set). b) yes they do, there's not much else that would make sense, but maybe should be made explicit in the specs c) simplicity for the golfing challenge. subtraction can be done by eor #$ff, adc anyways (with inverted meaning of the carry flag). – Felix Palmen – 2018-09-07T06:14:08.017

1@FelixPalmen a) I think I´ve been working with 32 and 64 bit integers for too long. Memories of the 6510 and 8bit values are a bit blurred. c) how about a IVC (invert carry) instruction then? (j/k) – Titus – 2018-09-07T10:07:49.147

@Titus the funny thing is sbc on the 6502 indeed works with an inverted carry flag. I guess this is exactly because with 2's complement, it allows to reuse the adc logic by inverting the operand :) – Felix Palmen – 2018-09-15T11:46:42.597

Also, can the C flag be set when the operand over/underflows after an INC, INX, DEC or DEX? – None – 2018-09-21T12:04:56.033

@Rogem no, these don't affect C. – Felix Palmen – 2018-09-22T12:29:05.383

What does LSR IMM 8 mean? – l4m2 – 2018-10-20T08:49:11.877

@l4m2: If I understand you correctly, see Arnauld's comment and my following comments

– Felix Palmen – 2018-10-22T08:15:05.120

Answers

16

C (gcc), 1381 1338 1255 1073 bytes

Huge improvement thanks to ceilingcat and Rogem.

#include<stdio.h>
C*F="%02hhx ";m[256],p,a,x,z,n,c,e;g;*I(){R++p+m;}*A(){R*I()+m;}*X(){R*I()+m+x;}C*Q(){W(printf,m[g],1)exit(a);}C*(*L[])()={I,Q,A,Q,X,Q,Q,Q};l(C*i){R 254/p?*i=*L[m[p]&7]():*Q();}s(i){R 254/p?*L[m[p]&7]()=i:*Q();}q(){p>254?Q():++p;}C*y(){p+=e;}B(U,z)B(V,!z)B(_,n)B(BM,!n)B(BC,c)B(BS,!c)C*(*b[])()={Q,Q,y,Q,U,Q,V,Q,_,Q,BM,Q,BC,Q,BS,Q};j(){z=!l(&a);v}o(){s(a);}t(){z=!l(&x);n=x&H;}D(){s(x);}f(K(&=)h(K(|=)i(K(^=)J(E;c=e&1;z=!(e/=2);s(e);w}k(E;c=w;z=!e;s(e*=2);}T(E;g=e&1;z=!(e=e/2|H*!!c);c=g;s(e);w}M(E;g=w z=!(e=e*2|H*!!c);c=g;s(e);}N(E;z=!(a=g=a+e+!!c);c=g>>8%2;G}P(E;z=!~e;--p;s(g=e+1);G}u(E;g=e-1;z=!g;--p;s(g);G}r(E;g=a-e;z=!g;c=G}S(E;g=x-e;z=!g;c=G}Y(){z=!(x=g=1-m[p]%2*2);n=x&H;}Z(){c=~m[p]&1;}d(){p<255||Q();e=m[++p];b[m[p-1]&15]();}(*O[])()={j,o,t,D,Q,Q,f,h,i,J,k,T,M,N,Q,P,u,r,S,Q,Q,Q,Q,Q,Q,Y,Z,Q,Q,Q,d,d};main(){scanf(F,&p);W(scanf,&m[g],0)for(;;q())O[m[p]/8]();}

Try it online!

A lot of defines moved to compiler flags.

Explanation (VERY ungolfed):

#include<stdio.h>

// useful defines
#define C unsigned char
#define H 128 // highest bit
#define R return

// code generator for I/O
#define W(o,p,q)for(g=-1;++g<256&&((q)||!feof(stdin));)(o)(F,(p));

// code generator for branching instruction handlers
#define BB(q)(){(q)||Y();}

// frequent pieces of code
#define NA n=a&H;
#define NE n=e&H;
#define NG n=g&H;
#define E l(&e)

// printf/scanf template
C * F = "%02hhx ";

// global state: m=memory, pax=registers, znc=flags
// e and g are for temporaries and type conversions
C m[256],p,a,x,z,n,c,e;g;

// get the pointer to a memory location:
C * I() {R &m[++p];} // immediate
C * A() {R &m[m[++p]];} // absolute
C * X() {R &m[m[++p]+x];} // indexed

// terminate the VM (and dump memory contents)
C * Q() { W(printf,m[g],1) exit(a);}

// an array of functions for accessing the memory
// They either return the pointer to memory location
// or terminate the program.
C * (*L[])()={I,Q,A,Q,X,Q,Q,Q};

// load a byte from the memory into the variable pointed by i
// terminate the program if we cannot access the address byte
l (C * i) {return 254 / p ? *i = *L[m[p]&7] () : *Q ();}

// save a byte i to the memory
// terminate the program if we cannot access the address byte
s (C i) {return 254 / p ? *L[m[p]&7]() = i : *Q ();}

// advance the instruction pointer (or fail if we fall outside the memory)
q () {p > 254 ? Q () : ++p;}

// branch
C * Y() {p += e;}

// generated functions for conditional branches
C * BN BB(z)
C * BZ BB(!z)
C * BP BB(n)
C * BM BB(!n)
C * BC BB(c)
C * BS BB(!c)

// a list of branch functions
C * (*B[])() = {Q,Q,Y,Q,BN,Q,BZ,Q,BP,Q,BM,Q,BC,Q,BS,Q};

// Instruction handling functions

OA () {z = !l (&a); NA} // lda
OB () {s (a);} // sta
OC () {z = !l (&x); n = x & H;} // ldx
OD () {s (x);} // stx
OG () {E; z = !(a &= e); NA} // and
OH () {E; z = !(a |= e); NA} // ora
OI () {E; z = !(a ^= e); NA} // eor
OJ () {E; c = e & 1; z = !(e /= 2); s (e); NE} // lsr
OK () {E; c = NE; z = !e; s (e *= 2);} // asl
OL () {E; g = e & 1; z = !(e = e / 2 | H * !!c); c = g; s (e); NE} // ror
OM () {E; g = e & H; z = !(e = e * 2 | H * !!c); c = g; s (e); NE} // rol
ON () {E; z = !(a = g = a + e + !!c); c = !!(g & 256); NG} // adc
OP () {E; z = !~e; --p; s (g = e + 1); NG} // inc
OQ () {E; g = e - 1; z = !g; --p; s (g); NG} // dec
OR () {E; g = a - e; z = !g; c = NG} // cmp
OS () {E; g = x - e; z = !g; c = NG} // cpx
OY () {z = !(x = g = ~m[p] & 1 * 2 - 1); n = x & H;} // inx/dex
OZ () {c = ~m[p] & 1;} // sec/clc
Od () {p < 255 || Q (); e = m[++p]; B[m[p-1]&15] ();} // branching

// list of opcode handlers
(*O[]) () = {OA,OB,OC,OD,Q,Q,OG,OH,OI,OJ,OK,OL,OM,ON,Q,OP,OQ,OR,OS,Q,Q,Q,Q,Q,Q,OY,OZ,Q,Q,Q,Od,Od};

// main function
main ()
{
    // read the instruction pointer
    scanf (F, &p);

    // read memory contents
    W(scanf, &m[g], 0)

    // repeatedly invoke instruction handlers until we eventually terminate
    for (;; q())
        O[m[p]/8] ();
}

Max Yekhlakov

Posted 2018-08-23T12:48:15.007

Reputation: 601

Great work, instant +1, really didn't expect a C solution first :) your appending of 00 bytes might be a bit bending the rules though ... I admit I didn't try to analyze this code ... could you maybe save bytes doing I/O in binary instead of hex? Would be allowed by the rules :) – Felix Palmen – 2018-08-24T14:52:06.123

I'd like to replace the rule that illegal opcodes lead to termination by just saying the behavior of illegal opcodes is undefined ... would this hurt your answer or are you fine with that? – Felix Palmen – 2018-08-25T16:00:37.580

@FelixPalmen well, as long as termination is quite valid "undefined" behavior, it won't hurt (it opens a new possibility for golfing it down instead!) – Max Yekhlakov – 2018-08-27T05:00:39.107

@MaxYekhlakov by "hurt" I meant being unfair towards your solution because you maybe "spent bytes" for making sure an illegal opcode terminates the vm. I'm glad you welcome the rule change as an opportunity :) And again, congrats, I just love to see a solution in C, which is my all-time favorite programming language. It's a pity you'll rarely win a code-golf challenge in C -- nevertheless, "golfed" C is just cool :) – Felix Palmen – 2018-08-27T23:12:40.323

Could you add the flags part to post? – l4m2 – 2018-10-20T08:43:21.147

http://zto.me/c0a8zs some optimization – l4m2 – 2018-10-20T09:13:25.943

8

APL (Dyalog Classic), 397 332 330 bytes

thanks @Adám for -8 bytes

f←{q←2⍴a x c←≡B←256⋄{0::m
⍺←(∇q∘←)0∘=,B≤+⍨
30≤u←⌊8÷⍨b←p⊃m:∇p+←129-B|127-1⊃p↓m×⊃b⌽2/(,~,⍪)1,q,c
p+←1
u=25:⍺x⊢←B|x+¯1*b
u=26:∇c⊢←~2|b
p+←≢o←m⊃⍨i←⍎'p⊃m+x'↑⍨1+8|b
⍎u⊃(,⍉'⍺ax⊢←o' '∇m[i]←ax'∘.~'xa'),5 4 2 3 2/'⍺⌽⊃'∘,¨'a⊢←2⊥(⍕⊃u⌽''∧∨≠'')/o a⊤⍨8⍴2' 'c(i⊃m)←u⌽d⊤(⌽d←u⌽2B)⊥u⌽o,c×u>10' 'c a⊢←2B⊤a+o+c' 'm[i]←B|o-¯1*u' 'c⊢←⊃2B⊤o-⍨⊃u⌽x a'}p m←⍵}

Try it online!

p  program counter
m  memory
a  accumulator register
x  index register
q  flags z (zero) and n (negative) as a length-2 vector
c  flag for carry
⍺  function to update z and n
b  current instruction
u  highest 5 bits of b
o  operand
i  target address in memory

ngn

Posted 2018-08-23T12:48:15.007

Reputation: 11 449

Let us continue this discussion in chat.

– ngn – 2018-08-26T10:45:05.763

Does this solution have unintended opcodes and if not, did you spend bytes avoiding them? See this comment for the reason I'm asking ...

– Felix Palmen – 2018-08-26T12:25:43.297

@FelixPalmen Now that you mention it, yes :( Initially I observed that rule, but as I was golfing I accidentally made 4, 5, and possibly others, valid opcodes. So a decision to make their behaviour undefined would be most welcome :) – ngn – 2018-08-26T12:39:08.163

2done now, I realize it wasn't the best decision in the first place and @MaxYekhlakov unfortunately didn't have to say anything about the rule change. – Felix Palmen – 2018-08-26T12:44:27.380

Do you need f←? – Erik the Outgolfer – 2018-08-26T20:35:09.773

@EriktheOutgolfer sometimes I forget to remove it – ngn – 2018-08-26T21:52:07.300

-2 by renaming s to so s⊢ becomes . – Adám – 2019-01-21T22:39:16.537

@Adám i like that :) thanks – ngn – 2019-01-21T23:17:37.313

8

C (gcc), 487, 480, 463, 452, 447, 438 bytes

Uses this instruction mapping. The update to the instructions shaved off 9 bytes, and potentially more in the future. Returns by modifying the memory pointed to by the first argument (M). Thanks to @ceilingcat for shaving off some bytes.

Must be compiled with flags -DO=*o -DD=*d -DI(e,a,b)=if(e){a;}else{b;} -Du="unsigned char" (already included in bytes).

e(M,Z,C)u*M,C;{for(u r[2],S=0,D,O,c,t;o=C<Z?M+C++:0;){I(c=O,d=r+!(c&4),break)I(o=c&3?C<Z&&C?M+C++:0:d,o=c&2?O+c%2**r+M:o,break)t=(c/=8)&7;I(c<24&c>4&&t,t&=3;I(c&8,I(c&4,c&=S&1;S=O>>7*!(t/=2);O=t=O<<!t>>t|c<<7*t,t=O+=t%2*2-1),I(c&4,D=t=t?t&2?t&1?O^D:O|D:O&D:O,I(c&1,S=D>(t=D+=O+S%2),t=D-O;S=t>D)))S=S&1|t>>6&2|4*!t,I(c&8,C+=!(t&~-t?~t&S:t&~S)*O,I(t,S=S&6|c%2,O=D)))I(C,,Z=0)}}

Try it online!

Preprocessor

-DO=*o -DD=*d

These two provide a shorter way to dereference the pointers.

-DI(e,a,b)=if(e){a;}else{b;} -Du="unsigned char"

Reduce the number of bytes needed for if-elses and type declarations.

Code

The below is a human-readable version of the code, with the preprocessor directives expanded, and variables renamed for readability.

exec_8bit(unsigned char *ram, int ramSize, unsigned char PC)
{  
  for(unsigned char reg[2], SR=0, // The registers. 
                                  // reg[0] is X, reg[1] is A. 
                                  // SR contains the flags.
      *dst, *op, opCode, tmp;
      // Load the next instruction as long as we haven't gone out of ram.
      op = PC < ramSize ? ram + PC++ : 0;)
  { // Check for HLT.
    if(opCode=*op)
    { // Take a pointer to the register selected by complement of bit 3.
      dst = reg+!(opCode&4);
    } else break;
    // Load operand as indicated by bits 0 and 1. Also check that we don't
    // go out of bounds and that the PC doesn't overflow.
    if(op = opCode&3 ? PC<ramSize && PC ? ram + PC++ : 0 : dst)
    {
      op = opCode&2 ? *op + opCode%2 * *reg + ram: op
    } else break;

    // Store the bits 3-5 in tmp.
    tmp = (opCode/=8) & 7;
    if(opCode<24 & opCode>4 && tmp)
    { // If not HLT, CLC, SEC or ST, enter this block.
      tmp &= 3; // We only care about bits 3&4 here.
      if(opCode&8) // Determine whether the operation is binary or unary.
      { // Unary
        if(opCode&4)
        { // Bitshift
          opCode &= SR&1; // Extract carry flag and AND it with bit 3 in opCode.
          SR=*op >> 7*!(tmp/=2);// Update carry flag.
          // Shift to left if bit 4 unset, to right if set. Inclusive-OR 
          // the carry/bit 3 onto the correct end.
          *op = tmp = *op << !tmp >> tmp | opCode << 7*tmp;
        } else tmp=*o+=tmp%2*2-1;
      } else if(opCode&4) {
        // Bitwise operations and LD.
        // Ternary conditional to determine which operation.
        *dst = tmp = tmp? tmp&2? tmp&1? *op^*dst: *op|*dst: *op&*dst: *op
      } else if(opCode&1) {
        // ADC. Abuse precedence to do this in fewer bytes.
        // Updates carry flag.
        SR = *dst > (tmp = *dst += *op + SR%2);
      } else tmp=*dst-*op; SR = tmp > *dst; // Comparison.
      SR = SR&1 | tmp >> 6&2 | 4*!tmp; // Update Z and N flags, leaving C as it is.
    } else if(opCode&8) {
      // Branch.
      // tmp&~-tmp returns a truthy value when tmp has more than one bit set
      // We use this to detect the "unset" and "always" conditions.
      // Then, we bitwise-AND either ~tmp&SR or tmp&~SR to get a falsy value
      // when the condition is fulfilled. Finally, we take logical complement,
      // and multiply the resulting value (`1` or `0`) with the operand,
      // and add the result to program counter to perform the jump.
      PC += !(tmp & ~-tmp? ~tmp&SR : tmp&~SR) * *op;
    } else if (tmp) { // SEC, CLC
      SR = SR&6 | opCode % 2;
    } else {
      *op = *dst; // ST
    }
    if(!PC){ // If program counter looped around, null out ramSize to stop.
           // There's likely a bug here that will kill the program when it
           // branches back to address 0x00
      ramSize=0;
    }
  }
}

Instructions

The instructions are structured as follows:

  • Bits 6-7 indicate the arity of the instruction (00 Nullary, 01 Unary, 10 Binary, 11 Binary)

  • Bits 0-2 determine the operand(s): R=0 selects A and R=1 selects X. OP=00 uses the register as an operand, OP=01 selects immediate operand, OP=10 selects absolute operand and OP=11 selects indexed operand.

    • As you may have noticed, this allows for any operations to be performed on either register (although you can still only index from X) even when they normally couldn't be used per specification. E.g. INC A, ADC X, 10 and ASL X all work.
  • Bits 3-5 determine the condition for branching: having one of the bits indicates which flag to test (bit 3->C, bit 4->N, bit 5->Z). If only one bit is set, the instruction tests for a set flag. To test for an unset flag, take the complement of the bits. For example 110 tests for unset carry and 001 for set carry. 111 and 000 branch always.

  • You can also branch to an address offset stored in a register, allowing you to write functions, or you can use the standard indexing modes. OP=01 behaves like specification branch.

+-----+----------+-------+-----------------------------------------------+
| OP  | BINARY   | FLAGS | INFO                                          |
+-----+----------+-------+-----------------------------------------------+
| ST  | 10000ROP |       | Register -> Operand                           |
| LD  | 10100ROP | Z N   | Operand -> Register                           |
| AND | 10101ROP | Z N   | Register &= Operand                           |
| XOR | 10111ROP | Z N   | Register ^= Operand                           |
| IOR | 10110ROP | Z N   | Register |= Operand                           |
| ADC | 10011ROP | Z N C | Register += Operand + Carry                   |
| INC | 01011ROP | Z N   | Operand += 1                                  |
| DEC | 01010ROP | Z N   | Operand -= 1                                  |
| ASL | 01100ROP | Z N C | Operand <<= 1                                 |
| LSR | 01110ROP | Z N C | Operand >>= 1                                 |
| ROL | 01101ROP | Z N C | Operand = Operand << 1 | Carry                |
| ROR | 01111ROP | Z N C | Operand = Operand >> 1 | Carry << 7           |
| CMP | 10010ROP | Z N C | Update ZNC based on Register - Operand        |
| BR  | 11CNDROP |       | PC += Condition ? Operand : 0      |
| SEC | 00011000 |     C | Set carry                                     |
| CLC | 00010000 |     C | Clear carry                                   |
| HLT | 00000000 |       | Halt execution.                               |
+-----+----------+-------+-----------------------------------------------+

user77406

Posted 2018-08-23T12:48:15.007

Reputation:

7

JavaScript (ES6), 361 bytes

Takes input as (memory)(program_counter), where memory is an Uint8Array. Outputs by modifying this array.

M=>p=>{for(_='M[\u0011\u0011A]\u0010\u0010=\u000fc=\u000e,\u0011p]\f(n=\u000b128)\t=\u0010\b&=255\u0007,z=!(n\u0007),n&=\t;\u0006\u0006\u000b\u0005-\u0010,\u000en>=0\u0005\u0004\u0011c\b>>7,A]*2\u0005\u0003\u0011c\b&1,A]/2\u0005\u000f\u0002&&(p+=(\u0010^\t-\t;\u0001for(a=n=z=\u000ex=0;a\u0007,x\u0007,A=[i=\u0011p++],p\f\f+x][i&3],i&3&&p++,i&&A<256;)eval(`\u000ba\b\u0006\u000fa;\u000bx\b\u0006\u000fx;\u000ba&\b\u0005a|\b\u0005a^\b\u0005\u000f\u0002\u0003\u000fc*128|\u0002c|\u0003a+\b+c,\u000ea>>8\u0005++\u0010\u0005--\u0010\u0005a\u0004x\u0004++x\u0005--x\u0006\u000e1;\u000e0;1\u0001!z\u0001z\u0001!n\u0001n\u0001!c\u0001c\u0001`.split`;`[i>>2])';G=/[\u0001-\u0011]/.exec(_);)with(_.split(G))_=join(shift());eval(_)}

NB: The code is compressed with RegPack and contains a lot of unprintable characters, which are all escaped in the above representation of the source.

Try it online!

Opcode mapping and test cases

The virtual machine uses this opcode mapping.

Below are the translated test cases, along with the expected outputs.

Test case #1

00 - LDX #$10  09 10
02 - INC $01   32 01
04 - DEX       44
05 - BNE $fb   55 fb

Expected output:

$$\text{09 }\color{red}{\text{20}}\text{ 32 01 44 55 fb}$$

Test case #2

00 - (DATA)    e0 08 2a 02
04 - LDA $00   02 00
06 - ADC $02   2e 02
08 - STA $00   06 00
0a - LDA $01   02 01
0c - ADC $03   2e 03
0e - STA $01   06 01

Expected output:

$$\color{red}{\text{0a 0b}}\text{ 2a 02 02 00 2e 02 06 00 02 01 2e 03 06 01}$$

Test case #3

00 - (DATA)    5e 01 28 00
04 - LDX #$10  09 10
06 - LSR $01   1e 01
08 - ROR $00   26 00
0a - BCC $0d   65 0d
0c - LDA $02   02 02
0e - CLC       4c
0f - ADC $21   2e 21
11 - STA $21   06 21
13 - LDA $03   02 03
15 - ADC $22   2e 22
17 - STA $22   06 22
19 - ASL $02   22 02
1b - ROL $03   2a 03
1d - DEX       44
1e - BPL $e6   5d e6
20 - HLT       00
21 - (DATA)    00 00

Expected output:

$$\color{red}{\text{00 00 00 00}}\text{ 09 10 1e 01 }\dots\text{ 44 5d e6 00 }\color{red}{\text{b0 36}}$$

Unpacked and formatted

Because the code is compressed with an algorithm that replaces frequently repeated strings with a single character, it is more efficient to use the same code blocks over and over than defining and invoking helper functions, or storing intermediate results (such as M[A]) in additional variables.

M => p => {
  for(
    a = n = z = c = x = 0;
    a &= 255, x &= 255,
    A = [i = M[p++], p, M[p], M[p] + x][i & 3],
    i & 3 && p++,
    i && A < 256;
  ) eval((
    '(n = a = M[A], z = !(n &= 255), n &= 128);'                                + // LDA
    'M[A] = a;'                                                                 + // STA
    '(n = x = M[A], z = !(n &= 255), n &= 128);'                                + // LDX
    'M[A] = x;'                                                                 + // STX
    '(n = a &= M[A], z = !(n &= 255), n &= 128);'                               + // AND
    '(n = a |= M[A], z = !(n &= 255), n &= 128);'                               + // ORA
    '(n = a ^= M[A], z = !(n &= 255), n &= 128);'                               + // EOR
    '(n = M[A] = M[c = M[A] & 1, A] / 2, z = !(n &= 255), n &= 128);'           + // LSR
    '(n = M[A] = M[c = M[A] >> 7, A] * 2, z = !(n &= 255), n &= 128);'          + // ASL
    '(n = M[A] = c * 128 | M[c = M[A] & 1, A] / 2, z = !(n &= 255), n &= 128);' + // ROR
    '(n = M[A] = c | M[c = M[A] >> 7, A] * 2, z = !(n &= 255), n &= 128);'      + // ROL
    '(n = a += M[A] + c, c = a >> 8, z = !(n &= 255), n &= 128);'               + // ADC
    '(n = ++M[A], z = !(n &= 255), n &= 128);'                                  + // INC
    '(n = --M[A], z = !(n &= 255), n &= 128);'                                  + // DEC
    '(n = a - M[A], c = n >= 0, z = !(n &= 255), n &= 128);'                    + // CMP
    '(n = x - M[A], c = n >= 0, z = !(n &= 255), n &= 128);'                    + // CPX
    '(n = ++x, z = !(n &= 255), n &= 128);'                                     + // INX
    '(n = --x, z = !(n &= 255), n &= 128);'                                     + // DEX
    'c = 1;'                                                                    + // SEC
    'c = 0;'                                                                    + // CLC
    ' 1 && (p += (M[A] ^ 128) - 128);'                                          + // BRA
    '!z && (p += (M[A] ^ 128) - 128);'                                          + // BNE
    ' z && (p += (M[A] ^ 128) - 128);'                                          + // BEQ
    '!n && (p += (M[A] ^ 128) - 128);'                                          + // BPL
    ' n && (p += (M[A] ^ 128) - 128);'                                          + // BMI
    '!c && (p += (M[A] ^ 128) - 128);'                                          + // BCC
    ' c && (p += (M[A] ^ 128) - 128);')                                           // BCS
    .split`;`[i >> 2]
  )
}

Arnauld

Posted 2018-08-23T12:48:15.007

Reputation: 111 334

Impressive :) Not a JS pro, so -- is this indexing into some "code array" by the value of the opcode? looks nice. But if this o = ... line is executed for every instruction, this might have "unintended opcodes"? – Felix Palmen – 2018-08-25T14:59:07.487

2I should probably add a test case :o Now, I think it would have been better to allow unintended opcodes ... validity checks just waste bytes here, but probably too late to change the rules :( – Felix Palmen – 2018-08-25T15:07:28.243

Well, I was about to suggest exactly that, as it doesn't add much to the challenge and is now hard to check with custom mappings. But you probably should ask/warn @MaxYekhlakov first, as they may have implemented the rule correctly. – Arnauld – 2018-08-25T15:11:16.300

c = M[A] >> 7 & 1 <- is the &1 really needed here? – Felix Palmen – 2018-08-25T16:08:06.377

It's not. Good catch! Unfortunately, the updated source is compressed to the exact same size. – Arnauld – 2018-08-25T16:16:48.643

@FelixPalmen Taking directly a Uint8Array as input would save 25 bytes. Do you think that would be acceptable? I didn't find a relevant topic about that in Meta.

– Arnauld – 2018-08-26T12:13:16.567

2I'm pretty sure it would as your submission is a function anyways, my wording was "a list of bytes [...] any sensible format" and an Uint8Array indeed just encapsulates such a list of bytes. So if putting the bytes in a hex string is an acceptable way of representing the input, why should putting them in a container object be forbidden ... – Felix Palmen – 2018-08-26T12:17:41.320

How many bytes unpacked but unformatted? – Titus – 2018-09-05T02:35:23.140

@Titus 857 bytes

– Arnauld – 2018-09-05T06:50:10.350

2

PHP, 581 585 555 532 bytes (not -yet- competing)

function t($v){global$f,$c;$f=8*$c|4&$v>>5|2*!$v;}$m=array_slice($argv,2);for($f=0;$p>-1&&$p<$argc-1&&$i=$m[$p=&$argv[1]];$p++)$i&3?eval(explode(_,'$a=$y_$a&=$y_$a|=$y_$a^=$y_$a+=$y+$c;$c=$a>>8_$x=$y_$c=$y&1;$y>>=1_$c=($y*=2)>>8_$y+=$y+$c;$c=$y>>8_$y+=$c<<8;$c=$y&1;$y>>=1_$y--_$y++_$z=$a-$y,$c=$a<$y_$z=$x-$y,$c=$x<$y_$y=$a_$y=$x_'.$y=&$m[[0,++$p,$g=$m[$p],$g+$x][$i&3]])[$i>>=2].'$i<14&&t(${[aaaaaxyyyyyyzz][$i]}&=255);'):($i&32?$p+=($f>>$i/8-4^$i)&1?($y=$m[++$p])-($y>>7<<8):1:($i&8?$f=$f&7|8*$c=$i/4:t($x+=$i/2-9)));print_r($m);

takes PC and and OP codes as base 10 integers from command line arguments,
prints memory as a list of [base 10 address] => base 10 value.

This is not tested thoroughly yet; but there is a breakdown.

There´s the code map and here´s an overview for my mapping:

3-mode instructions:
00: LDA     04: AND     08: ORA     0C: EOR
10: ADC     14: LDX     18: LSR     1C: ASL
20: ROL     24: ROR     28: DEC     2C: INC
30: CMP     34: CPX     38: STA     3C: STX

+1: immediate
+2: absolute
+3: relative

implicit:
00: HLT
10: DEX 14: INX
18: CLC 1C: SEC

relative:
20: BRA         (0)
28: BNE 2C: BEQ (Z)
30: BPL 34: BMI (N)
38: BCC 3C: BCS (C)

side note:
code 24 results in a BNV (branch never = 2 byte NOP);
04, 08, 0C are aliases for INX, CLC and SEC
and anything above 3F is either a two byte NOP or an alias for the single mode instructions.

Titus

Posted 2018-08-23T12:48:15.007

Reputation: 13 814