Build the busiest beaver in x86 machine code in 32 bytes or less

Your task is to write a program in x86 machine language (any version you like) that will run through as many instructions as possible and then halt, using a maximum of 32 bytes of code and starting with zeroed-out registers.

You can assume anything about the random-access memory you like, as long as it's in a usable format for the x86 machine.

Joe Z.

Posted 2013-08-25T20:39:58.160

Reputation: 30 589

There are pretty small busy beaver Turing machines whose halting behavior is unknown (http://en.wikipedia.org/wiki/Busy_beaver#Known_values). What happens when someone implements one of them?

– asmeurer – 2013-08-25T20:53:32.590

There are a few details that could be filled in here. Do we have a stack set up for us? How big is it? Is the program free to access memory? If so, how much of it? Do we have read-write access to all 4 GB? If so, can we pick where our program is located? – breadbox – 2013-08-25T23:09:17.960

You can assume anything about the memory you want, I guess, as long as the actual registers are all zeroed out at the beginning of the code's execution. (Will this cause problems? I'm not terribly experienced with machine language.) – Joe Z. – 2013-08-26T00:25:01.237

Can we assume that the processor is in real mode and/or protected mode? Because if we can assume protected mode, that opens up a bunch of other questions about the Global Descriptor Table and the paging table, etc etc. Given those issues, it might be better to just enforce real-mode, which means much less memory available. Unless we can assume unreal mode, which I realize now I was implicitly assuming that I was in. But in any case, this should probably be explictly called out in the problem description. – breadbox – 2013-08-31T02:29:00.650

Answers

2⁵²⁴²²⁴ instructions

My program:

_start:
        mov     bl, _end
        stc
iloop:  adc     [bx], al
        inc     bx
        jnz     iloop
        jnc     _start
        a32
        loop    _start
        hlt
_end:

(Technical note: this is written for nasm. The a32 is nasm's syntax for the alternate address-size prefix byte. For masm, you would replace the a32 with defb 0x67.)

For clarity, here's the listing output:

 1                  _start:
 2 0000 B310                mov     bl, _end
 3 0002 F9                  stc
 4 0003 1007        iloop:  adc     [bx], al
 5 0005 43                  inc     bx
 6 0006 75F9                jnz     iloop
 7 0008 73F4                jnc     _start
 8 000A 67                  a32
 9 000B E2F1                loop    _start
10 000D F4                  hlt
11                  _end:

The program assumes that the processor is in real mode, and that the program is located at the bottom of a 64k segment of memory which is otherwise initialized to all-bits-zero. Its design is simple: treat the memory as a single giant unsigned integer, and increment it through all possible values, until it rolls back around to all-zeroes. Repeat this 2³² times. Then halt.

The innermost loop (lines 4‒6) is responsible for incrementing the huge integer. Each iteration adds zero or one to a single byte, depending on whether there was a carry out of the previous byte. Note that every byte in the huge integer is accessed, whether or not it has changed, so this loop always iterates 2¹⁶ - 14 times.

By the way, in case you were wondering, this code illustrates the reason why the x86 inc/dec instructions don't affect the carry flag: just to simplify this sort of multi-byte carry pattern. (This pattern came up more often back in the days of 8-bit microprocessors, when the original 8080 instruction set was defined.)

Line 7 causes the incrementing process to repeat until a one is carried out of the last byte, indicating that the huge integer has been reset to all-bits-zero. This takes a long time.

Lines 8‒9 mark the outermost loop, and causes this process to repeat 2³² times, until the ecx register rolls around to zero. This is effectively equivalent to adding another 32 bits to the giant integer.

It would be possible to add another outer loop and do this again using (say) the edx register, and then maybe using esi and edi for even more repetitions. However, it's not worth doing. The instructions to increment and loop require four bytes. Those four bytes are taken away from the giant integer. So we lose 32 bits on the RAM counter, just to add 32 bits via a register: it ends up being a wash. The only reason ecx is an exception is that it has a specialized loop instruction that fits into only three bytes. Thus the program trades 24 bits for 32, a tiny but still positive gain of 8 bits.

It's not too hard to directly calculate the number of instructions that the program executes before it halts. However, there's a much simpler way to estimate this number. The program modifies all of its memory, less the 14 bytes that contain the program, and the bx and ecx registers. This adds up to 2¹⁶ - 14 + 2 + 4 = 65528 bytes, for a total of 524224 bits. The shortcut involves realizing that, in the process of running, every single possible pattern of 524224 bits appears exactly once. For the RAM and the ecx register this is easy to see, since the program increments through every single value. For bx this is a little less obvious, since it's being changed at the same time as the value in memory is being updated. However, one can show that, given the program's structure, if a complete bit pattern were to actually appear twice, then the program would have to be in an infinite loop. Since this is not the case, each bit pattern must ultimately be visited only once. (The complete proof is left as an exercise to the reader, naturally.)

Since every possible bit pattern appears in the course of the program, the program must execute at least 2⁵²⁴²²⁴ instructions, which is approximately equal to 1.4 × 10¹⁵⁷⁸⁰⁷. (The acutal number is very slightly higher, because of the jump instructions, but the difference at this magnitude is negligible.)

Obviously, this could be significantly improved by using more than 64k of RAM. I'm holding off on the next version of the code until I figure out exactly how much RAM can be accessed.

breadbox

Posted 2013-08-25T20:39:58.160

Reputation: 6 893

Isn't that 17 instructions (34 bytes)? Or am I missing something? – Joe Z. – 2013-08-25T21:12:17.720

@JoeZ. inc is 1 byte (if you don't assemble it in 16 bit mode) – copy – 2013-08-25T21:16:43.363

Oh. So this program would actually be 25 bytes long? – Joe Z. – 2013-08-25T21:17:29.227

Yes, it's 25 bytes – copy – 2013-08-25T21:31:00.947

Sorry, I should have been more explicit. I added the listing output so as to avoid ambiguity. – breadbox – 2013-08-25T23:19:02.037

you should be able to use the 4 bytes of ECX in fewer than 4 bytes of code. How long is LOOP NOP? – John Dvorak – 2013-08-26T14:06:42.867

~2^(2^67) instructions

(at&t syntax)

start:
    movq    $0x1a,%rax
    stc
loop:
    adcb    %cl,(%rax)
    incq    %rax
    jnz     loop
    jnc     start
    hlt

Disassembled, 26 bytes:

start:
0000000000000000    movq    $0x0000001a,%rax
0000000000000007    stc
loop:
0000000000000008    adcb    %cl,(%rax)
000000000000000a    incq    %rax
000000000000000d    jne loop
0000000000000013    jae start
0000000000000019    hlt

Similar to breadbox's solution, but there's no reason to stop at 16 bits of address. My code uses x86-64 and assumes we have 2^64 bytes of memory and we use all but the memory holding the code as a giant counter.

Keith Randall

Posted 2013-08-25T20:39:58.160

Reputation: 19 865