80186 machine code + DOS, 91 bytes
Text version:
hm j j PPjzjzjgaAAA JSJJ RU Sq ReAA JdJJJ RfiJElK JEiS GtI And she said But that s his
Text version, with tabs (code 9) replaced by 9 and spaces (code 32) replaced by *:
hm9j9j9PPjzjzjgaAAA9JSJJ9RU9Sq9ReAA9JdJJJ9RfiJElK9JEiS*GtI*And*she*said***But*that*s*his***
Hexdump:
68 6D 09 6A 09 6A 09 50 50 6A 7A 6A 7A 6A 67 61
41 41 41 09 4A 53 4A 4A 09 52 55 09 53 71 09 52
65 41 41 09 4A 64 4A 4A 4A 09 52 66 69 4A 45 6C
4B 09 4A 45 69 53 20 47 74 49 20 41 6E 64 20 73
68 65 20 73 61 69 64 20 20 20 42 75 74 20 74 68
61 74 20 73 20 68 69 73 20 20 20
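As a sanity check on the hexdump, here is a small Python sketch (not part of the submission). It assumes the allowed byte set is ASCII letters plus tab and space, which are the only values this program uses; the names `ALLOWED` and `disallowed_bytes` are made up for the illustration.

```python
import string

HEXDUMP = """
68 6D 09 6A 09 6A 09 50 50 6A 7A 6A 7A 6A 67 61
41 41 41 09 4A 53 4A 4A 09 52 55 09 53 71 09 52
65 41 41 09 4A 64 4A 4A 4A 09 52 66 69 4A 45 6C
4B 09 4A 45 69 53 20 47 74 49 20 41 6E 64 20 73
68 65 20 73 61 69 64 20 20 20 42 75 74 20 74 68
61 74 20 73 20 68 69 73 20 20 20
"""

# Strip all whitespace before decoding so the line breaks do not matter.
program = bytes.fromhex("".join(HEXDUMP.split()))

# Assumption: letters, tab (9) and space (32) are the permitted bytes.
ALLOWED = set(string.ascii_letters.encode("ascii")) | {0x09, 0x20}


def disallowed_bytes(data):
    """Return (offset, value) pairs for every byte outside ALLOWED."""
    return [(i, b) for i, b in enumerate(data) if b not in ALLOWED]


print(len(program), "bytes; offending bytes:", disallowed_bytes(program))
```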
The machine code goes in a file with the extension .com. When I run it, it prints the required message and then hangs (executing random data).
High-level explanation of what it does:
- Initializes registers with constant values
- Replaces spaces in the message with the required special symbols (,'.$)
- Patches the code to generate the int 21 instruction, which prints the message
- Calls DOS
Assembly code (can be compiled with tasm):
my_bp equ 7ah
my_si equ 7ah
my_di equ 67h
my_msg equ 13bh
.model tiny
.code
.startup
.186
org 100h
push 96dh ; ax (ah = 9; al = don't care, but see below)
push 9 ; cx
push 9 ; dx
push ax ; bx = don't care
push ax ; don't care
push my_bp
push my_si
push my_di
popa
inc cx
inc cx
inc cx
or [bp+si+my_msg-my_bp-my_si+12], cx ; ,
dec dx
dec dx
or [bp+si+my_msg-my_bp-my_si+14], dx ; '
or [bp+di+my_msg-my_bp-my_di+23], dx ; '
or [bp+si+my_msg-my_bp-my_si+30], dx ; '
inc cx
inc cx
or [bp+si+my_msg-my_bp-my_si+29], cx ; .
dec dx
dec dx
dec dx
or [bp+si+my_msg-my_bp-my_si+31], dx ; $
; 0x2049 * 0x4b6c = 0x98301cc
; So this sets cx to 1cc (a temporary constant used to patch code)
imul cx, [bp+si+my_msg-my_bp-my_si-2], 4b6ch
; 0x1cc | 0x2049 = 0x21cd (the instruction which calls DOS int 21)
; Here ah = 9 ("print" mode)
or [bp+si+my_msg-my_bp-my_si-2], cx
; At address 101, there is the constant 96d, which was loaded into ax
; 0x96d * 0x7447 = 0x448013b
; So the following sets dx to 13b (address of the message)
imul dx, [bp+di+101h-my_bp-my_di], 7447h
int21:
dw 2049h
db 'And she said   But that s his   '
end
It uses the popa instruction to pop all registers, because regular pop cannot fill all needed registers (e.g. pop di is a forbidden opcode).
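To illustrate why popa is needed, the following Python sketch lists which single-byte pop opcodes (0x58 plus the register number) fall on letters, and models the order in which popa unloads the eight pushed words. It is only a model of the stack, not an emulator; the `None` entries stand for the two pushed copies of ax, whose values are not known and do not matter.

```python
import string

# 16-bit register numbering used by the one-byte PUSH/POP encodings.
REGS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
LETTERS = set(string.ascii_letters.encode("ascii"))


def pop_opcode(reg):
    """Opcode byte of the single-byte `pop reg` instruction."""
    return 0x58 + REGS.index(reg)


# Registers whose plain `pop` opcode happens to be an ASCII letter.
encodable = [r for r in REGS if pop_opcode(r) in LETTERS]


def popa(stack):
    """Model POPA: pop di, si, bp, (discarded sp slot), bx, dx, cx, ax."""
    regs = {}
    for name in ["di", "si", "bp", "sp", "bx", "dx", "cx", "ax"]:
        value = stack.pop()
        if name != "sp":  # POPA skips the saved SP value
            regs[name] = value
    return regs


# Words in the order the program pushes them (last element = top of stack).
pushed = [0x096D, 9, 9, None, None, 0x7A, 0x7A, 0x67]
regs = popa(list(pushed))

print("pop encodable as a letter:", encodable)
print({name: value for name, value in regs.items() if value is not None})
```

Only pop ax, pop cx and pop dx land on letters (X, Y, Z), so bp, si and di have to be loaded through popa (opcode 0x61, the letter a).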
Addresses of bytes to patch are in the range 0x100...0x160. By luck, they can be represented as a sum of 3 bytes with allowed values:
- 0x7a in bp
- 0x7a or 0x67 in si or di
- Immediate value
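The decomposition can be checked with a few lines of Python. This sketch recomputes the 8-bit displacement for every memory operand in the listing and tests it against an assumed allowed set (letters, tab, space); the `targets` table is transcribed from the assembly source, and the helper names are made up for the illustration.

```python
import string

ALLOWED = set(string.ascii_letters.encode("ascii")) | {0x09, 0x20}

BP, SI, DI = 0x7A, 0x7A, 0x67   # register values loaded by popa
MSG = 0x13B                      # address of the message

# (index register, absolute address touched), in program order.
targets = [
    ("si", MSG + 12),  # ,
    ("si", MSG + 14),  # '
    ("di", MSG + 23),  # '
    ("si", MSG + 30),  # '
    ("si", MSG + 29),  # .
    ("si", MSG + 31),  # $
    ("si", MSG - 2),   # placeholder word that becomes int 21
    ("di", 0x101),     # immediate 0x96d inside the first push
]


def displacement(index_reg, address):
    """Immediate byte needed so that bp + index + disp == address."""
    index = SI if index_reg == "si" else DI
    return address - BP - index


disps = [displacement(reg, addr) for reg, addr in targets]
print([hex(d) for d in disps])
print("all encodable:", all(d in ALLOWED for d in disps))
```

The results match the displacements shown in the disassembly (53, 55, 71, 65, 64, 66, 45, 20).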
Patching of bytes in the message works by doing a logical OR of 0x20 (space character) with a small constant (4, 7, 12 or 14). The small constant is obtained by initializing cx and dx to 9 (tab character) and doing INC or DEC as needed.
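The effect of the OR patches can be replayed in Python. This is a sketch of the data flow only: it tracks cx and dx as plain integers and applies each OR to a copy of the message, using the offsets from the assembly source.

```python
message = bytearray(b"And she said   But that s his   ")

cx = dx = 9  # both registers start as the tab character


def patch(offset, value):
    """Model `or [mem], reg` on the low byte.

    The real instruction is 16 bits wide, but the high byte of cx/dx is 0,
    so the neighbouring byte is left unchanged.
    """
    message[offset] |= value & 0xFF


cx += 3            # 12: 0x20 | 12 = ','
patch(12, cx)
dx -= 2            # 7: 0x20 | 7 = "'"
patch(14, dx)
patch(23, dx)
patch(30, dx)
cx += 2            # 14: 0x20 | 14 = '.'
patch(29, cx)
dx -= 3            # 4: 0x20 | 4 = '$' (DOS string terminator)
patch(31, dx)

print(message.decode("ascii"))
```

DOS function 9 prints everything up to, but not including, the `$`.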
Patching of code uses the IMUL instruction. I found the needed 16-bit constants to multiply using brute-force search.
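The search itself is not shown here, so the following Python sketch is a reconstruction under stated assumptions: the placeholder word is fixed at 0x2049, both bytes of the multiplier must be letters, tab or space, and only the low 16 bits of the product matter (they are the same for signed and unsigned multiplication). The function name `find_multipliers` is made up for the illustration.

```python
import string

ALLOWED = set(string.ascii_letters.encode("ascii")) | {0x09, 0x20}

PLACEHOLDER = 0x2049  # stored in the file as bytes 49 20 ("I ")
TARGET = 0x21CD       # stored as bytes CD 21, i.e. int 21h


def is_allowed_word(word):
    """True when both bytes of a 16-bit immediate are in ALLOWED."""
    return (word & 0xFF) in ALLOWED and (word >> 8) in ALLOWED


def find_multipliers(placeholder, target):
    """All encodable k with ((placeholder * k) mod 2^16) | placeholder == target."""
    return [
        k
        for k in range(0x10000)
        if is_allowed_word(k)
        and (((placeholder * k) & 0xFFFF) | placeholder) == target
    ]


multipliers = find_multipliers(PLACEHOLDER, TARGET)
print([hex(k) for k in multipliers])
```

The constant 0x4b6c used in the program is one of the values this search returns; a fuller search could also vary the placeholder word.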
Finally, the address of the message (0x13b) is obtained by multiplication. To save space, I took one of the constants from one of the instructions, which contains an immediate value 0x96d. Here the 9 part chooses a DOS print function, and the 6d part is a free parameter. It turns out that 6d is the only possibility which can give 0x13b after multiplication.
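A similar search can be sketched for the free low byte. This Python snippet tries every allowed low byte together with every encodable 16-bit multiplier and keeps the pairs whose product ends in 0x13b. The allowed set (letters, tab, space) is an assumption, so the printed list is only as reliable as that assumption; the code makes no claim about how many pairs exist.

```python
import string

ALLOWED = sorted(set(string.ascii_letters.encode("ascii")) | {0x09, 0x20})

AH = 0x09          # DOS function 9 (print string), fixed
TARGET = 0x13B     # address of the message

# Every 16-bit immediate whose two bytes are both allowed.
allowed_words = [(hi << 8) | lo for hi in ALLOWED for lo in ALLOWED]


def find_pairs(target):
    """(al, k) pairs with ((AH:al) * k) mod 2^16 == target."""
    pairs = []
    for al in ALLOWED:
        if al % 2 == 0:
            continue  # an even factor can never give the odd target 0x13b
        ax = (AH << 8) | al
        for k in allowed_words:
            if (ax * k) & 0xFFFF == target:
                pairs.append((al, k))
    return pairs


pairs = find_pairs(TARGET)
print([(hex(al), hex(k)) for al, k in pairs])
```

The pair (0x6d, 0x7447) used in the program appears in the result.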
Disassembly of the code part:
06BA:0100 686D09 PUSH 096D
06BA:0103 6A09 PUSH +09
06BA:0105 6A09 PUSH +09
06BA:0107 50 PUSH AX
06BA:0108 50 PUSH AX
06BA:0109 6A7A PUSH +7A
06BA:010B 6A7A PUSH +7A
06BA:010D 6A67 PUSH +67
06BA:010F 61 POPA
06BA:0110 41 INC CX
06BA:0111 41 INC CX
06BA:0112 41 INC CX
06BA:0113 094A53 OR [BP+SI+53],CX
06BA:0116 4A DEC DX
06BA:0117 4A DEC DX
06BA:0118 095255 OR [BP+SI+55],DX
06BA:011B 095371 OR [BP+DI+71],DX
06BA:011E 095265 OR [BP+SI+65],DX
06BA:0121 41 INC CX
06BA:0122 41 INC CX
06BA:0123 094A64 OR [BP+SI+64],CX
06BA:0126 4A DEC DX
06BA:0127 4A DEC DX
06BA:0128 4A DEC DX
06BA:0129 095266 OR [BP+SI+66],DX
06BA:012C 694A456C4B IMUL CX,[BP+SI+45],4B6C
06BA:0131 094A45 OR [BP+SI+45],CX
06BA:0134 6953204774 IMUL DX,[BP+DI+20],7447
06BA:0139 CD21 INT 21 (after the code patches itself)
Fun fact: Normally, I would use offset message instead of the hard-coded 13bh, but in this case, because at the time of parsing its address is unknown, tasm generates a 16-bit immediate offset, wasting 1 code byte:
06BA:0131 098A4600 OR [BP+SI+0046],CX
What about whitespace in output? (leading/trailing?) – attinat – 2019-06-13T07:04:13.067
Output must be exactly what was specified. Usually an optional trailing newline is allowed. – mbomb007 – 2019-06-13T16:19:02.853
Darn, my esolang can't complete because there's no ability to produce output with only a-zA-Z. In theory I could use write and Eval to create the necessary instructions, but none of +-*,%'" can be constructed without using (at least) one of +-*,%'"0-9. – Draco18s no longer trusts SE – 2019-06-13T16:23:19.290
What about control characters? (escape, control, alt, etc.)? – Rɪᴋᴇʀ – 2019-06-13T21:11:53.213
(programmer-of (language 'lisp)) dislikes this. – MatthewRock – 2019-06-13T22:05:59.637
Have to admit, I didn't think this was particularly interesting at first, but the combination of repeated and unique characters really made it something fun to optimize (especially on a stack language!). Very nice. – brhfl – 2019-06-14T18:01:25.433
Can you clarify if extra whitespace in the output is allowed? Like extra trailing newlines? Or only whitespace in the source, plus alphabetic characters. There's a Befunge answer that prints with extra an trailing newline. – Peter Cordes – 2019-06-15T00:24:25.090