2524224 instructions
My program:
_start:
mov bl, _end
stc
iloop: adc [bx], al
inc bx
jnz iloop
jnc _start
a32
loop _start
hlt
_end:
(Technical note: this is written for nasm. The a32
is nasm's syntax for the alternate address-size prefix byte. For masm, you would replace the a32
with defb 0x67
.)
For clarity, here's the listing output:
1 _start:
2 0000 B310 mov bl, _end
3 0002 F9 stc
4 0003 1007 iloop: adc [bx], al
5 0005 43 inc bx
6 0006 75F9 jnz iloop
7 0008 73F4 jnc _start
8 000A 67 a32
9 000B E2F1 loop _start
10 000D F4 hlt
11 _end:
The program assumes that the processor is in real mode, and that the program is located at the bottom of a 64k segment of memory which is otherwise initialized to all-bits-zero. Its design is simple: treat the memory as a single giant unsigned integer, and increment it through all possible values, until it rolls back around to all-zeroes. Repeat this 232 times. Then halt.
The innermost loop (lines 4‒6) is responsible for incrementing the huge integer. Each iteration adds zero or one to a single byte, depending on whether there was a carry out of the previous byte. Note that every byte in the huge integer is accessed, whether or not it has changed, so this loop always iterates 216 - 14 times.
By the way, in case you were wondering, this code illustrates the reason why the x86 inc
/dec
instructions don't affect the carry flag: just to simplify this sort of multi-byte carry pattern. (This pattern came up more often back in the days of 8-bit microprocessors, when the original 8080 instruction set was defined.)
Line 7 causes the incrementing process to repeat until a one is carried out of the last byte, indicating that the huge integer has been reset to all-bits-zero. This takes a long time.
Lines 8‒9 mark the outermost loop, and causes this process to repeat 232 times, until the ecx
register rolls around to zero. This is effectively equivalent to adding another 32 bits to the giant integer.
It would be possible to add another outer loop and do this again using (say) the edx
register, and then maybe using esi
and edi
for even more repetitions. However, it's not worth doing. The instructions to increment and loop require four bytes. Those four bytes are taken away from the giant integer. So we lose 32 bits on the RAM counter, just to add 32 bits via a register: it ends up being a wash. The only reason ecx
is an exception is that it has a specialized loop
instruction that fits into only three bytes. Thus the program trades 24 bits for 32, a tiny but still positive gain of 8 bits.
It's not too hard to directly calculate the number of instructions that the program executes before it halts. However, there's a much simpler way to estimate this number. The program modifies all of its memory, less the 14 bytes that contain the program, and the bx
and ecx
registers. This adds up to 216 - 14 + 2 + 4 = 65528 bytes, for a total of 524224 bits. The shortcut involves realizing that, in the process of running, every single possible pattern of 524224 bits appears exactly once. For the RAM and the ecx
register this is easy to see, since the program increments through every single value. For bx
this is a little less obvious, since it's being changed at the same time as the value in memory is being updated. However, one can show that, given the program's structure, if a complete bit pattern were to actually appear twice, then the program would have to be in an infinite loop. Since this is not the case, each bit pattern must ultimately be visited only once. (The complete proof is left as an exercise to the reader, naturally.)
Since every possible bit pattern appears in the course of the program, the program must execute at least 2524224 instructions, which is approximately equal to 1.4 × 10157807. (The acutal number is very slightly higher, because of the jump instructions, but the difference at this magnitude is negligible.)
Obviously, this could be significantly improved by using more than 64k of RAM. I'm holding off on the next version of the code until I figure out exactly how much RAM can be accessed.
3
There are pretty small busy beaver Turing machines whose halting behavior is unknown (http://en.wikipedia.org/wiki/Busy_beaver#Known_values). What happens when someone implements one of them?
– asmeurer – 2013-08-25T20:53:32.590There are a few details that could be filled in here. Do we have a stack set up for us? How big is it? Is the program free to access memory? If so, how much of it? Do we have read-write access to all 4 GB? If so, can we pick where our program is located? – breadbox – 2013-08-25T23:09:17.960
You can assume anything about the memory you want, I guess, as long as the actual registers are all zeroed out at the beginning of the code's execution. (Will this cause problems? I'm not terribly experienced with machine language.) – Joe Z. – 2013-08-26T00:25:01.237
Can we assume that the processor is in real mode and/or protected mode? Because if we can assume protected mode, that opens up a bunch of other questions about the Global Descriptor Table and the paging table, etc etc. Given those issues, it might be better to just enforce real-mode, which means much less memory available. Unless we can assume unreal mode, which I realize now I was implicitly assuming that I was in. But in any case, this should probably be explictly called out in the problem description. – breadbox – 2013-08-31T02:29:00.650