11

1

## Challenge:

Your challenge is to write an interpreter for Whitespace. Given a string consisting of spaces, tabs, newlines, and potential other characters, as well as possible inputs for the Whitespace program itself, output the result of the given Whitespace program.

## Here an overview of the Whitespace language and its builtins:

Whitespace is a stack-based language which uses only three characters: spaces (ASCII codepoint 32); tabs (ASCII codepoint 9); and newlines (ASCII codepoint 10); all other characters are ignored.
It only has a couple of basic builtins, which I will go over below. Whitespace has both a stack, which can only consists of integers. As well as a heap, which is a map of integers (both the key and value).

From here on out I will be using S for spaces, T for tabs, and N for newlines to make the text more compact.

Stack Manipulation (always starts with leading S):

• Push a number to the stack: SS, followed by either an S/T for positive/negative respectively, followed by some S and/or T which is the binary representation of the number (S=0; T=1), followed by a trailing newline N. Some examples:
• SSSTN pushes the number 1; a positive integer with binary 1.
• SSTTSN pushes the number -2; a negative integer with binary 10.
• SSSTSTSN pushes the number 10; a positive integer with binary 1010.
• SSTTTSSTSSN pushes the number -100; a negative integer with binary 1100100.
• Pushing number 0 is an edge case, since it can be done in multiple ways. Some examples:
• SSSN: push a positive integer without any binary digits.
• SSTN: push a negative integer without any binary digits.
• SSSSN: push a positive integer with binary 0.
• SSTSSSN: push a negative integer with binary 000.
• Duplicate the top of the stack: SNS.
• Copy the 0-based $$\n\$$th item from the top of the stack to the top of the stack: STS followed by a number similar as mentioned earlier (excluding the leading SS). I.e. let's say the stack currently contains the integers [47,12,0,55], then we could use STSSTSN to copy the 0-based 2nd item (which is the 12 in this case) to the top. So the stack becomes: [47,12,0,55,12].
• NOTE: This index may not be negative. On TIO this would result in a negative index error, but that same program would push a 0 in the vii5ard interpreter, and it could even be different in yet another interpreter. For the sake of this challenge, you can therefore assume a given copy will never be negative. So the copy will always start with STSS, followed by the binary of the top-to-bottom index, followed by a trailing N.
• Swap the top two items on the stack: SNT.
• Discard the top item of the stack: SNN.
• Discard/slice $$\n\$$ items from the top of the stack, but keep the top item: STN, followed by a number similar as mentioned earlier (excluding the leading SS). I.e. let's say the stack currently contains the integers [1,2,3,4,5,6,7], then we could use STNSTTN to discard 3 items from the stack (except the top one). So the stack becomes: [1,2,3,7].
• You can again assume no negative slice values will be used, so this will always start with SNNS, followed by the amount to slice, followed by a trailing N.

Arithmetic (always starts with leading TS):

• Addition; add the top two items on the stack together: TSSS.
• Subtraction; subtract the top item of the stack from the (top-1)th item of the stack: TSST.
• Multiplication; multiply the top two items on the stack: TSSN.
• Integer division; integer divide the (top-1)th item of the stack by the top item of the stack: TSTS. (NOTE: Since Whitespace only has integers, this will always be integer division and never result in decimal values.)
• Modulo; take the modulo of the (top-1)th item of the stack with the top item of the stack: TSTT.
• You can assume no negative values will be used for the modulo.

For each of these two argument builtins the same applies: if none or only a single integer is on the stack, this will result in an error. How you implement this error in your interpreter is up to you, as long as the program stops when this occurs. I thought about not allowing it for the sake of this challenge, but decided not to because it's a commonly used strategy to stop a program with an error when printing hardcoded texts, using the approach explained in this Whitespace codegolfing tip.

Heap Access (always starts with leading TT):

• Pop the top two items of the stack, and store the top item in the (top-1)th address of the heap: TTS. I.e. let's say the stack contains the integers [1,2,3,4,5] and the heap already contains [{2:10}]. When you use this store builtin twice in a row, the stack would contain [1] and the heap will contain [{2:3},{4:5}] (note how the {2:10} has been replaced with the {2:3}).
• NOTE: Just like with the Arithmetic builtins, if no or a single argument is given, it will cause an error. But for the sake of this challenge you can assume this will never be given for this builtin.
• Pop the top item of the stack, and push the item corresponding with that heap address to the top of the stack: TTT. I.e. let's say the stack contains the integers [1,4] and the heap contains [{2:3},{4:5}]. If you now use the retrieve builtin once, the stack would become [1,5] (and the heap will remain the same).
• NOTE: If you use an address that isn't in the heap yet (or the heap is empty), it will push a 0 to the stack instead. But for the sake of this challenge you can also ignore this.

Flow Control (always starts with leading N):

• Mark a location in the program with a label: NSS, followed by some (optional) S/T which aren't used by other labels/subroutines, followed by an N. I.e. if you're only using a single label in your full program, NSSN would be what to use when code-golfing. If you need two or three labels, you can add NSSSN and/or NSSTN.
• Although it is possible to have multiple of the same labels in the TIO and vii5args interpreters, it will cause issues, so we assume the input will always only create a label/subroutine once.
• Also, although NSSN would be a logical first label to use, it's completely valid to use a label NSSTSTSTTTSN instead as only label in the program.
• Call a subroutine with the given label: NST, followed by some (optional) S/T which aren't used by other labels/subroutines, followed by an N. I.e. NSTTSTSTTTSN would jump to the label TSTSTTTS as subroutine.
• Jump unconditionally to a label: NSN, followed by some (optional) S/T, followed by an N. I.e. NSNN would jump to the (empty) label N and continue the program flow from there.
• Jump to a label if the top of the stack is exactly 0: NTS, followed by some (optional) S/T, followed by an N. I.e. if the stack is currently [4,1,0] and we'd use NTSSN, it would jump to the label SN and continue the program flow from there (with stack [4,1]). If instead the stack is currently [4,1] and we'd use the NTSSN, it would jump past it to the next builtin below it (with stack [4]).
• Jump to a label is the top of the stack is negative: NTT, followed by some (optional) S/T, followed by an N. I.e. if the stack is currently [4,1,-10] and we'd use NTTTN, it would jump to the label TN and continue the program flow from there (with stack [4,1]). If instead the stack is currently [4,1] and we'd use the NTTTN, it would jump past it to the next builtin below it (with stack [4]).
• Minor note: There is no Jump to label if positive builtin available in Whitespace.
• End a subroutine, and go back to the caller (a.k.a. return): NTN.
• End the entire program: NNN (everything after that becomes no-ops).

I/O (always starts with leading TN):

• Pop the top integer, and print as character with that codepoint to STDOUT: TNSS. I.e. if the stack is currently [10,101] and we'd call the TNSS twice, it will output a lowercase e followed by a newline to STDOUT.
• Pop the top integer, and print as integer to STDOUT: TNST.
• Pop the top integer, and read a character from STDIN, for which its codepoint-integer will be stored in the heap with the popped integer as address: TNTS. I.e. if the stack is [0,0], the heap is empty, and STDIN contains the capital letter I, and we'd use TNTS. The stack will become [0] and the heap [{0:73}]. (After which we could use the retrieve builtin TTT to put this input on the stack.)
• Pop the top integer, and read an integer from STDIN, which will be stored in the heap with the popped integer as address: TNTT.

## Challenge rules:

• You can assume the Whitespace input is always valid with the builtins above. So the compilation phase would always succeed.
• You can assume executing the Whitespace input will never result in any errors, except when the Arithmetic builtins will only get 0 or 1 stack-arguments instead of the required 2. Although there are loads of other possible errors when executing a Whitespace program, like negative indices, jumps to labels that doesn't exist, read from STDIN when it's empty, etc. For the sake of this challenge you can assume none of those kind of errors will occur, and the input-program is error-free (except for the Arithmetic builtins).
• You can assume the additional inputs given will be valid as well. So an integer when we want to read an integer or a character / string of multiple characters if we want to read those.
• The Whitespace program-input can be taken in any reasonable format. Could be a string, list of characters, list of codepoint-integers, etc. Same applies to the other inputs.
• Any non-whitespace character in the Whitespace-program input is ignored. You can assume the no-op characters will only be printable ASCII. So the Whitespace input will only contain the UTF-8/ASCII characters with the codepoints [9, 10, 32..126].
• Whitespace numbers can in theory be as large as you want, but for the sake of this challenge you can assume the integers used will be in the range [-9223372036854775808, 9223372036854775807] (signed 64 bit integer).

## General rules:

• This is , so shortest answer in bytes wins.
Don't let code-golf languages discourage you from posting answers with non-codegolfing languages. Try to come up with an as short as possible answer for 'any' programming language.
• Standard rules apply for your answer with default I/O rules, so you are allowed to use STDIN/STDOUT, functions/method with the proper parameters and return-type, full programs. Your call.
• Default Loopholes are forbidden.
• If possible, please add a link with a test for your code (i.e. TIO).
• Also, adding an explanation for your answer is highly recommended.

## Test cases:

You should copy the actual test cases from the Try it online links to verify your interpreter.

Input(s):
SSTTSTTTTNSSSTSNSSSTNSTSSTNSSTTSTNSSTTSNSNSSSSTNSTSSTNSSTTSSTTTSNSSTTTSSTTNSSSNSSTTSTNSSTTTNSSSTSNSSTTNSSSTTTNSSSTSNSSTTSSTTTSNSSSTSNSSTTNSSSTTTNSSTTSNSSSTSNSSTTSSTTTSNSSTTTSSTTNSSTTTNSSTTSNSSTTSTNSSTTNSSTTSSTTTSNSSTTNSSSTTTNSSTTSTNSSSNSSTTSTNSSTTSNSNSSSSTNSSTTTTTSNNSSNSSSTTSTTTSNTSSSTNSSNSNN
Builtins used:
Push num; duplicate; copy; add; create label; jump to label; print as char
Output:
Pollinium milk; plump pumpkin; lollipop?

Input(s):
SSTTSNSNSTSSNTNST
Builtins used:
Push num; duplicate; multiply; print as int
Output:
4


Try it online as actual program using spaces, tabs, newlines, and comments.

Input(s):
SSTTNTNSTNNNTSNTNTSSS
Builtins used:
Push num; print as int; stop program; (no-ops)
Output:
-1

Input(s):
SSSNSNSSNSTNTTTTTNSSNSNSTNSTSSSTSTSNTNSSSNTSSSTNTSSSSTSSTNSTSSTNSNSSSSTSSNTSTTSNSNTSSNSSSTNTSSTSNSNTSTNSSSTNTSSTNTSSSNSNTSTSSTNTSSTNSNNNSSSNSNNTSTSSTSSTSNSTSSTSNTSTTNTSNNNNNSSTNSNNTSSNNSNNNSSSSNTSSSNSNN
21
Builtins used:
Push num; duplicate; swap; copy; discard; add; subtract; multiply; divide; modulo; retrieve; create label; jump to label; jump to label if 0; stop program; read as int; print as char; print as int
Output:
21
21
23
20
5
25
31
24
3
27
37
26

Input(s):
NSSNSSSNSNSTNTSTTTSNSSSSTSTSNTSSTNTSSNSSSNSNSTNTSTTTTSSTNTSNSSSNTNSTNSSSN
TThhiiss  iiss  ddoouubblee  ssppeeaakk!!\n
Builtins used:
Push num; duplicate; subtract; retrieve; create label; jump to label if 0; read as char; read as int
Output:
0

Inputs(s):
SSSNSNSTNTTTTTSSSNSNSTNTTTTTTSSSTNST
-3
5
Builtins used:
Push num; duplicate; add; retrieve; read as int; print as int
Output:
2

Input(s):
SSSTSSSTTTTTSSSSSSSTTSSSTSTTTTTSSTTSNTNST
Builtins used:
Push num; print as int
Output:
4815162342

Input(s):
SSSTTTNTNTSSSSTSSSSTNSSSTSTSNNSSTTNSSSTNTSSTSNSNTTSNSTSSTNSNTNSNTTNNSSSNSSSTTTNTTTSTNSSSTSTNTNSTSSSTSNNSSTNSNTSNNSNTTNSSSSSTSNTSTSSNSNTSSSNNSNTNNSSSSNNNN
〹
Builtins used:
Push num; duplicate; swap; discard; copy; slice; retrieve; create label; jump to label; jump to label if 0; jump to label if negative; exit program; read as char; print as char; print as int
Output:
12345!!


Try it online as actual program using spaces, tabs, newlines, and comments.

Inputs(s):
SSSTTNSNSSNSSNSTTSSSSTTTNTTSNSTNTNSSNNNSSTNNSSNTTTNTNSSTN
Builtins used:
Push num; duplicate; store; retrieve; call subroutine; return; exit program; print as char; (no-ops)
Output:
(character with unicode value 7)


Try it online as actual program using spaces, tabs, newlines, and comments.

@Arnauld Here is the relevant part in my challenge description at the arithmetic operations: "For each of these two argument builtins the same applies: if none or only a single integer is on the stack, this will result in an error. How you implement this error in your interpreter is up to you, as long as the program stops when this occurs. I thought about not allowing it for the sake of this challenge, but decided not to because it's a commonly used strategy to stop a program with an error when printing hardcoded texts, using the approach explained in this Whitespace codegolfing tip." – Kevin Cruijssen – 2020-01-31T13:37:32.230

@Arnauld So instead of an infinite loop, it should stop the program (with or without an error is up to you) when the stack contains just a single item when the add builtin is called. As I mentioned, although I thought about always assuming the input is valid so no need to do error handling, this one is the only exception, since it's commonly used with this Whitespace golfing tip. I will also add it to the rules to emphasize this.

– Kevin Cruijssen – 2020-01-31T13:38:08.523

Oh, I see. I -- obviously -- missed that part of the spec. Thank you. – Arnauld – 2020-01-31T13:39:18.010

@Arnauld Ah, it was already part of the rules. :) I've made it bold now. And I can definitely understand, it's a huge wall of text to consume. – Kevin Cruijssen – 2020-01-31T13:40:33.280

Doesn't the "Double speak!" test case rely on another infinite loop without any arithmetic operation and therefore no way to break the loop on an error? – Arnauld – 2020-01-31T16:00:34.660

Also, I'm probably misunderstanding how the penultimate test case is supposed to work. It seems like we put a couple of 33 on the stack followed by a -1, retrieve 12345, do a slice (I have [ 33, 33, 33, 33, 33, 33, 33, 12345 ] at this point) and then immediately discard 12345. So I don't understand how it can be printed. – Arnauld – 2020-01-31T17:14:31.983

As a follow-up about the penultimate test case: you say that print as int is used. But TNST doesn't appear anywhere in the source code. – Arnauld – 2020-01-31T17:22:06.823

In the control flow section, you write "Call a subroutine with the given label: NST, followed by some (optional) S/T which weren't used by previous labels/subroutines yet, followed by an N. I.e. NSTTSTSTTTSN would jump to the label TSTSTTTSN as subroutine.". Wouldn't this jump be to the subroutine named TSTSTTTS? – RGS – 2020-01-31T18:35:02.450

Also, by "call" here do you mean "give it a name" or do you mean "execute/evoke" a subroutine that has previously received a name? If you mean the latter, then why do you talk about "some optional S/T which weren't used by previous labels/subroutines"? – RGS – 2020-01-31T18:37:23.770

@RGS You're right. I've fixed the TSTSTTTSN to TSTSTTTS, as well as changed the working to not use previous, since a Label is usually later on in the program instead of earlier. It was meant in the way that there won't be any duplicated creation of labels. – Kevin Cruijssen – 2020-02-01T11:20:49.367

@Arnauld Removed the Double Speak! test case, since it indeed relied on a missing input to stop the infinite loop, so it was outside of the spec I described. As for the penultimate test case, I had a typo in a comment of my TIO (although the whitespaces themselves were correct). I stated [SSSTN_Push_-1] instead of [SSSTN_Push_1]. Your stack [33, 33, 33, 33, 33, 33, 33, 12345] is correct, but the slide builtin "Discard/slice $n$ items from the top of the stack, but keep the top item". As given as example: [1,2,3,4,5,6,7] with STNSTTN to slice 3 items would result in [1,2,3,7]. – Kevin Cruijssen – 2020-02-01T11:25:54.387

@Arnauld You were also right about the pentultimate test case missing a print_int (although it was there in the TIO code). Sorry about that. Also, I realize this might come little too late in the day (or "mustard after the meal" as we Dutchman would say), but perhaps it helps pasting the TIO code to the vii5ard interpreter, since it has a built in debugger (with step-by-step execution, breakpoints and allowing you to see the stack/heap values). Sorry about the error in test cases..

– Kevin Cruijssen – 2020-02-01T11:32:40.127

1Thank you for the update. The penultimate test case still ends with an infinite loop, though, so we can't return the result with a function. I think you should either update the rules about using either a program or a function, or update the spec about the way to escape an infinite loop. (It seems like DDoouubbllee ssppeeaakk!! does the same, but I haven't debugged this one yet.) – Arnauld – 2020-02-01T13:08:05.683

1@Arnauld Ah, you're right. Ugh.. maybe I should have just added the same error handling as TIO, since I've been using at least some in my Whitespace answers. Like exit with error when we use Add with just a single value on the stack; when there's nothing more to read; when there's nothing more to print; when I jump to a label that doesn't exist; etc. I left most of those out to make the challenge -every so slightly- easier, but I fwcked up my test cases because of it.. I've modified the pentultimate test case. Let me know if you spot other errors in the other test cases. – Kevin Cruijssen – 2020-02-01T16:21:30.593

1That looks OK now. I've retracted my close vote. (The irony here is that it may actually be easier to just exit whenever the program tries to pop from an empty stack rather than just for arithmetic operations.) – Arnauld – 2020-02-01T16:40:20.937

@Arnauld Good point.. Then again, if I'd introduce error handling in all cases where it tries to do something with a stack that's empty, I'd probably also include reading a non-existing STDIN or jumping to a label that doesn't exist resulting in an error as rule, which make it (I think) harder again. And nice answer btw! Looking forward seeing the explanation (after you've golfed it down some more I guess). – Kevin Cruijssen – 2020-02-01T18:22:04.767

7

# JavaScript (Node.js),  608 597 586  551 bytes

Takes input as (source)(input), where input is an array containing single characters and/or integers.

s=>I=>eval([..."@CDEJKLMNQTUWYZ^_bkmqw"].reduce((s,c)=>(a=s.split(c)).join(a.pop()),"(R=X=>(H={},o=P=[],S=[],z=x=p=i=0,gUs[p]?~(j= 	\n.indexOf(s[p++]N?j:gK:3,hUgK<2?Mn*2+j):V=n,GUx=z--&&S.pop(y=x),FUg(x=mbQQ$mS[z+~(b])TSCce((z-=V=b-1,V)$mkJLmyJkTTLH[x]=y$mHD)qR[M1)]=_?R[P@p),V]:_Y_&!kY_&k<0Yp$E?P.popK:pT$^QQQQQqW+yw-yw*yw/y|0w%yN:^QTTo+=Buffer(D)$o+=k$Z.codePointAtKZCt$[n-9],x&&eval(x,n=1N<3&&F(n*3+jN(1N(0)||R(1)||owN:^$WqTT$mz=S@kGKbgK?-M0):M0N_p$M1),E^E?g:pZ$HD=I[i++]Y?R[V]:Wz>1?(LmxU=n=>TQqqN))Mh(Lk,k,K()J),mx)\$Ep=XD[k]C.spli@.pusM"))


Try it online! (all test cases without comments)

Try it online! (an example with a commented source code as input)

### How?

Because the code is packed with RegPack, it is more efficient to repeat the same pieces of code over and over rather than defining too many helper functions.

The sequences of spaces, tabs and linefeeds are converted to an integer in base 3 with a leading $$\1\$$ until they match a known value:

 seq. | mnemonic | base 3 | decimal
------+----------+--------+---------
SS   |    PSH   |    100 |     9
STS  |    CPY   |   1010 |    30
STN  |    SLI   |   1012 |    32
SNS  |    DUP   |   1020 |    33
SNT  |    SWP   |   1021 |    34
SNN  |    DIS   |   1022 |    35
------+----------+--------+---------
TTS  |    PUT   |   1110 |    39
TTT  |    GET   |   1111 |    40
------+----------+--------+---------
NSS  |    LBL   |   1200 |    45
NST  |    JSR   |   1201 |    46
NSN  |    JMP   |   1202 |    47
NTS  |    JZE   |   1210 |    48
NTT  |    JMI   |   1211 |    49
NTN  |    RTN   |   1212 |    50
NNN  |    END   |   1222 |    53
------+----------+--------+---------
TSSS |    ADD   |  11000 |   108
TSST |    SUB   |  11001 |   109
TSSN |    MUL   |  11002 |   110
TSTS |    DIV   |  11010 |   111
TSTT |    MOD   |  11011 |   112
------+----------+--------+---------
TNSS |    CHR   |  11200 |   126
TNST |    INT   |  11201 |   127
TNTS |    RCH   |  11210 |   129
TNTT |    RNU   |  11211 |   130


The commands are simply stored in a lookup table and executed with eval().

Everything is wrapped within the function $$\R\$$ which is called twice:

• First pass ($$\X=0\$$): the jumps and the stack errors are ignored so that the whole code is parsed linearly and all labels are stored
• Second pass ($$\X=1\$$): the code is executed normally

### Unpacked and formatted

s => I => (
R = X => (
H = {},
o = P = [],
S = [],
z = x = p = i = 0,
g = n => s[p] ? ~(j =  \t\n.indexOf(s[p++])) ? j : g() : 3,
h = n => g() < 2 ? h(n * 2 + j) : V = n,
G = n => x = z-- && S.pop(y = x),
F = n => g(
x = [
/* PSH */ "z=S.push(g()?-h(0):h(0))",,,,,,,,,,,,,,,,,,,,,
/* CPY */ "z=S.push(S[z+~(g()?-h(0):h(0))])",,
/* SLI */ "S.splice((z-=V=g()?-h(0):h(0))-1,V)",
/* DUP */ "z=S.push(G()),z=S.push(x)",
/* SWP */ "G(),G(),z=S.push(y),z=S.push(x)",
/* DIS */ "G()",,,,

/* PUT */ "G(),G(),H[x]=y",
/* GET */ "z=S.push(H[G()])",,,,,

/* LBL */ "R[h(1)]=p",
/* JSR */ "h(1),p=X?R[P.push(p),V]:p",
/* JMP */ "h(1),p=X?R[V]:p",
/* JZE */ "h(1),p=X&!G()?R[V]:p",
/* JMI */ "h(1),p=X&G()<0?R[V]:p",
/* RTN */ "p=X?P.pop():p",,,
/* END */ "p=X?g:p",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

/* ADD */ "z>1?(G(),G(),z=S.push(x+y)):p=X?g:p",
/* SUB */ "z>1?(G(),G(),z=S.push(x-y)):p=X?g:p",
/* MUL */ "z>1?(G(),G(),z=S.push(x*y)):p=X?g:p",
/* DIV */ "z>1?(G(),G(),z=S.push(x/y|0)):p=X?g:p",
/* MOD */ "z>1?(G(),G(),z=S.push(x%y)):p=X?g:p",,,,,,,,,,,,,,

/* CHR */ "o+=Buffer([G()])",
/* INT */ "o+=G()",,
/* RCH */ "H[G()]=I[i++].codePointAt()",
/* RNU */ "H[G()]=I[i++]"
][n - 9],
x && eval(x, n = 1)
) < 3 && F(n * 3 + j)
)(1)
)(0) || R(1) || o


This is awesome :D – RGS – 2020-02-02T23:27:07.623