Assembly Language Quine

22

3

Write the shortest possible assembly-language quine.

Use any ISA you want, unless it has a print-quine instruction or equivalent. Examples include x86, MIPS, SPARC, MMIX, IBM BAL, MIX, VAX, JVM, ARM, etc.

You may link against the C standard library's _printf function (or the Java equivalent for JVM bytecode) for I/O.

Length will be judged both on instruction count and size of the data segment. Solutions must contain at least two instructions.

The quine should print the assembly code, not the assembled machine code.

Hoa Long Tam

Posted 2011-02-05T22:37:09.713

Reputation: 1 902

3 Oh wow, this sounds like a toughy – anonymous coward – 2011-02-06T00:47:00.070

Answers

20

x86 Linux, AT&T syntax: 244

push $10
push $34
push $s
push $34
push $37
push $37
push $s
call printf
mov $0,%ebx
mov $1,%eax
int $128
s:.ascii "push $10
push $34
push $s
push $34
push $37
push $37
push $s
call printf
mov $0,%cebx
mov $1,%ceax
int $128
s:.ascii %c%s%c%c"

(I compiled it with this: gcc -nostartfiles -lc quine.S -o quine)
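
The self-reference trick can be checked outside the assembler. Below is a minimal Python sketch, not part of the original answer: it treats the `.ascii` payload as a printf-style format string and feeds it the same seven values the pushes set up (read in reverse push order). The names `code`, `fmt`, `source` and `output` are made up for this illustration, and Python's `%` operator stands in for C's printf.

```python
# Everything before the data line, as posted.
code = (
    "push $10\npush $34\npush $s\npush $34\npush $37\npush $37\npush $s\n"
    "call printf\nmov $0,%ebx\nmov $1,%eax\nint $128\n"
)

# The .ascii payload: every literal '%' is written as %c, and the tail
# carries quote, self-reference, quote, newline.
fmt = code.replace("%", "%c") + "s:.ascii %c%s%c%c"

# The complete source file: the code, then the payload wrapped in quotes.
source = code + 's:.ascii "' + fmt + '"\n'

# Arguments in the order printf consumes them:
# 37 = '%', 37 = '%', 34 = '"', the string itself, 34 = '"', 10 = newline.
output = fmt % (37, 37, 34, fmt, 34, 10)

print(output == source)
```

The two `%c` inside `%cebx` and `%ceax` are why 37 is pushed twice: a bare `%e` in the payload would be read by printf as a conversion.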

J B

Posted 2011-02-05T22:37:09.713

Reputation: 9 638

That's depressing, now :-( – Joey – 2011-02-06T17:14:19.950

1 I'd usually say "the right tool for the job", but then again, here it doesn't feel right :D – J B – 2011-02-06T17:44:32.157

Seems to be more right than mine, though ;-) – Joey – 2011-02-06T22:37:04.120

5

gas for x86 Linux (88 bytes, previously 89; seven instructions)

Technically, this is cheating.

mov $4,%al
mov $1,%bl
mov $b,%ecx
mov $89,%dl
int $128
mov $1,%al
int $128
b:.incbin"a"

Save in a file named a and assemble with the following commands to create the executable named a.out.

as -o a.o <a ; ld a.o

The directive .incbin includes a file verbatim at the current location. If you use this to include the source code itself, you get a nice quine.
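
A rough Python model of what happens, assuming the file ends with a single LF: the assembler copies the file's bytes to label `b`, and the write syscall sends bytes starting at `b` to stdout. The variable names are invented for this sketch.

```python
# The source file "a" as posted, assuming one trailing newline.
source = (
    "mov $4,%al\nmov $1,%bl\nmov $b,%ecx\nmov $89,%dl\nint $128\n"
    'mov $1,%al\nint $128\nb:.incbin"a"\n'
)

# .incbin places these bytes verbatim at label b.
embedded = source.encode("ascii")
size = len(embedded)

# sys_write(1, b, size) then emits exactly the file's contents.
output = embedded[:size].decode("ascii")
print(size, output == source)
```

Under that assumption the count comes to 88, matching the heading. The listing still asks for 89 bytes in `%dl`, which may be a leftover from the 89-byte revision.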

FUZxxl

Posted 2011-02-05T22:37:09.713

Reputation: 9 656

I don't see how this is cheating actually. +1 for not using printf. – Calculuswhiz – 2020-01-09T16:24:35.200

1 @Calculuswhiz .incbin to include the source code is commonly considered off limits for quines. – FUZxxl – 2020-01-09T16:25:20.337

Oh, ok. I guess that makes sense. But for this challenge, they didn't say you couldn't. – Calculuswhiz – 2020-01-09T16:28:31.230

Wouldn't it save a byte if you used mov $1,%al instead of mov %bl,%al? – Calculuswhiz – 2020-01-09T16:42:59.833

@Calculuswhiz Indeed! – FUZxxl – 2020-01-09T16:53:27.593

Couple things. 1. My compiler is requiring the space after the .incbin. Not sure if yours is. 2. You can get it down to 68 bytes if you get rid of the sys_exit. It segfaults, but still outputs as expected to stdout. – Calculuswhiz – 2020-01-10T22:55:28.817

5

JVM Bytecode Assembly (via Jasmin) – 952 (previously 960, 990)

.class public Q
.super java/io/File
.method public static main([Ljava/lang/String;)V
.limit stack 9
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc ".class public Q%n.super java/io/File%n.method public static main([Ljava/lang/String;)V%n.limit stack 9%ngetstatic java/lang/System/out Ljava/io/PrintStream;%nldc %c%s%c%nldc 3%nanewarray java/lang/Object%ndup%ndup%nldc 0%nldc 34%ninvokestatic java/lang/Integer/valueOf(I)Ljava/lang/Integer;%ndup_x2%naastore%nldc 2%nswap%naastore%ndup2%nswap%nldc 1%nswap%naastore%ninvokevirtual java/io/PrintStream/printf(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/PrintStream;%npop%nreturn%n.end method"
ldc 3
anewarray java/lang/Object
dup
dup
ldc 0
ldc 34
invokestatic java/lang/Integer/valueOf(I)Ljava/lang/Integer;
dup_x2
aastore
ldc 2
swap
aastore
dup2
swap
ldc 1
swap
aastore
invokevirtual java/io/PrintStream/printf(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/PrintStream;
pop
return
.end method

Sadly, Jasmin doesn't allow as many nice tricks as Microsoft's ilasm allows. But the JVM has a total of six different dup instructions that do all kinds of fun things. Reordering items on the stack is something .NET doesn't seem to support.

In any case, I guess neither of my two entries is a serious contender for shortest code, but I guess it's hard to make them much shorter. So they're here just for completeness :-)

Commented version with info about what's on the stack:

.class public Q
.super java/io/File
.method public static main([Ljava/lang/String;)V
.limit stack 9
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc ".class public Q%n.super java/io/File%n.method public static main([Ljava/lang/String;)V%n.limit stack 9%ngetstatic java/lang/System/out Ljava/io/PrintStream;%nldc %c%s%c%nldc 3%nanewarray java/lang/Object%ndup%ndup%nldc 0%nldc 34%ninvokestatic java/lang/Integer/valueOf(I)Ljava/lang/Integer;%ndup_x2%naastore%nldc 2%nswap%naastore%ndup2%nswap%nldc 1%nswap%naastore%ninvokevirtual java/io/PrintStream/printf(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/PrintStream;%npop%nreturn%n.end method"
ldc 3       ; stack: System.out, string, 3
anewarray java/lang/Object    ; stack: System.out, string, Object[3]
dup
dup    ; stack: System.out, string, array, array, array
ldc 0  ; stack: System.out, string, array, array, array, 0
ldc 34   ; stack: System.out, string, array, array, array, 0, 34
invokestatic java/lang/Integer/valueOf(I)Ljava/lang/Integer;
dup_x2   ; stack: System.out, string, array, array, 34, array, 0, 34
aastore  ; stack: System.out, string, array, array, 34
ldc 2    ; stack: System.out, string, array, array, 34, 2
swap     ; stack: System.out, string, array, array, 2, 34
aastore  ; stack: System.out, string, array
dup2     ; stack: System.out, string, array, string, array
swap     ; stack: System.out, string, array, array, string
ldc 1    ; stack: System.out, string, array, array, string, 1
swap     ; stack: System.out, string, array, array, 1, string
aastore  ; stack: System.out, string, array
invokevirtual java/io/PrintStream/printf(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/PrintStream;
pop
return
.end method
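
The stack comments above can be replayed mechanically. Here is a small Python sketch that models the operand stack as a list and implements only the instructions the listing uses. The helper names mirror the JVM mnemonics, `fmt` is a stand-in for the long `ldc` literal, and the boxing done by `Integer.valueOf` is ignored, so this is an illustration of the shuffling rather than a JVM.

```python
def dup(st):
    # ..., v1 -> ..., v1, v1
    st.append(st[-1])

def dup_x2(st):
    # ..., v3, v2, v1 -> ..., v1, v3, v2, v1
    st.insert(len(st) - 3, st[-1])

def dup2(st):
    # ..., v2, v1 -> ..., v2, v1, v2, v1
    st.extend(st[-2:])

def swap(st):
    # ..., v2, v1 -> ..., v1, v2
    st[-1], st[-2] = st[-2], st[-1]

def aastore(st):
    # Pops value, index and array reference, then stores array[index] = value.
    value, index, array = st.pop(), st.pop(), st.pop()
    array[index] = value

fmt = "<format string>"                 # placeholder for the ldc literal
stack = ["System.out", fmt]

stack.append(3)                         # ldc 3
stack.append([None] * stack.pop())      # anewarray java/lang/Object
dup(stack)
dup(stack)
stack.append(0)                         # ldc 0
stack.append(34)                        # ldc 34 (boxing omitted)
dup_x2(stack)
aastore(stack)                          # array[0] = 34
stack.append(2)                         # ldc 2
swap(stack)
aastore(stack)                          # array[2] = 34
dup2(stack)
swap(stack)
stack.append(1)                         # ldc 1
swap(stack)
aastore(stack)                          # array[1] = fmt

print(stack)
```

At the end the stack holds the receiver, the format string and the array `[34, fmt, 34]`, which is what the `%c%s%c` in the middle of the format string consumes.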

History:

  • 2011-02-07 02:09 (990) – First working version.
  • 2011-02-07 02:11 (960) – ldc is shorter than bipush or iconst_*.
  • 2011-02-07 02:30 (952) – Who says I need to inherit from java.lang.Object? Other class names are so much shorter :-)

Joey

Posted 2011-02-05T22:37:09.713

Reputation: 12 260

3

NASM, 223 bytes

%define a "%define "
%define b "db "
%define c "%deftok "
%define d "a, 97, 32, 34, a, 34, 10, a, 98, 32, 34, b, 34, 10, a, 99, 32, 34, c, 34, 10, a, 100, 32, 34, d, 34, 10, c, 101, 32, 100, 10, b, 101, 10"
%deftok e d
db e

Beating the accepted answer!
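
As far as I can tell, the trick is that `d` is a string listing the tokens of the `db` operand, `%deftok` turns that string back into tokens, and the single-letter macros expand to their strings while the numbers become raw bytes, so the assembled output is the source text. The Python sketch below imitates that expansion; the `macros` dictionary and the token loop are my own model, not NASM's actual preprocessor.

```python
# The body of macro d, copied from the listing.
d = (
    "a, 97, 32, 34, a, 34, 10, a, 98, 32, 34, b, 34, 10, "
    "a, 99, 32, 34, c, 34, 10, a, 100, 32, 34, d, 34, 10, "
    "c, 101, 32, 100, 10, b, 101, 10"
)

# Single-line macros defined by the four %define lines.
macros = {"a": "%define ", "b": "db ", "c": "%deftok ", "d": d}

# "db e": each token is either a macro name (emit its string)
# or a number (emit that byte).
emitted = "".join(
    macros[tok] if tok in macros else chr(int(tok)) for tok in d.split(", ")
)

source = (
    '%define a "%define "\n'
    '%define b "db "\n'
    '%define c "%deftok "\n'
    '%define d "' + d + '"\n'
    "%deftok e d\n"
    "db e\n"
)
print(emitted == source)
```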

MD XF

Posted 2011-02-05T22:37:09.713

Reputation: 11 605

3

Windows .COM Format: 307 characters

Assembles, using A86, to 51 bytes. Requires no external libraries other than the DOS Int21 AH=9 function (write string to stdout).

db 185
db  51
db   0
db 190
db   0
db   1
db 191
db  47
db   1
db 172
db 178
db  10
db 199
db   6
db  45
db   1
db  32
db  32
db 180
db   0
db 246
db 242
db 128
db 196
db  48
db 136
db  37
db  79
db  10
db 192
db 117
db 242
db 180
db   9
db 186
db  42
db   1
db 205
db  33
db 226
db 221
db 195
db 100
db  98
db  32
db  32
db  51
db  49
db  10
db  13
db  36
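
For readers who want to see the shape of the output without running DOS, here is a Python sketch that regenerates the listing from the byte values. It assumes each byte is printed as `db`, a space, and the value right-aligned in three columns, followed by the LF CR pair that appears near the end of the data. That reading comes from the trailing bytes, which decode to a `$`-terminated template, not from a disassembly.

```python
# The 51 bytes of the program, in listing order.
program = [
    185, 51, 0, 190, 0, 1, 191, 47, 1, 172, 178, 10, 199, 6, 45, 1, 32,
    32, 180, 0, 246, 242, 128, 196, 48, 136, 37, 79, 10, 192, 117, 242,
    180, 9, 186, 42, 1, 205, 33, 226, 221, 195, 100, 98, 32, 32, 51, 49,
    10, 13, 36,
]

# The last nine bytes are text: a line template ending in '$',
# the string terminator for DOS function 9.
template = bytes(program[-9:]).decode("ascii")

# One output line per byte, value padded to three columns.
lines = ["db %3d" % b for b in program]
printed = "".join(line + "\n\r" for line in lines)

print(repr(template))
print(len(program), len(printed))
```

With a two-byte line ending the output comes to 51 * 8 = 408 bytes, while the same 51 lines with a single newline each take 51 * 7 = 357.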

Skizz

Posted 2011-02-05T22:37:09.713

Reputation: 2 225

I'm afraid I count 357 bytes. (and your program actually outputs 408) Nice implementation, though. You might want to include un-db'd assembly source so other viewers get a direct look. – J B – 2011-03-07T13:34:24.033

@J B: I didn't include CR\NL. Looking at it now, I really should have put the data into a single db line. That would make it smaller. – Skizz – 2011-03-07T18:35:33.390

2

gas for x86 Linux, 176 bytes (previously 184)

.globl main
main:movw $34,B+87
push $B
call printf
call printf
pop B
ret
.data
B:.ascii".globl main
main:movw $34,B+87
push $B
call printf
call printf
pop B
ret
.data
B:.ascii"

Build with gcc -m32 -o a.out quine.S. (The -m32 is optional if your OS is already 32-bit.)
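
Here is how I read the printf version, as a Python sketch: the string at `B` holds the first half of the file minus its closing quote, `movw $34,B+87` patches in that quote plus a terminating NUL right behind it, and printing the patched string twice yields the whole file. The sketch assumes there are two writable bytes after the string in `.data`. All names are invented for the illustration.

```python
# Text of the first half of the file, up to but excluding the quote.
half = (
    ".globl main\nmain:movw $34,B+87\npush $B\ncall printf\ncall printf\n"
    "pop B\nret\n.data\nB:.ascii"
)

# The file is that text plus a quote, twice, with no trailing newline.
source = half + '"' + half + '"'

# Memory at label B: the string bytes, then (assumed) two spare bytes.
memory = bytearray(half.encode("ascii") + b"\0\0")

# movw $34,B+87 stores the 16-bit value 34: a quote followed by a NUL.
memory[87:89] = (34).to_bytes(2, "little")

# printf stops at the NUL; the string contains no '%', so it prints as is.
printed = memory.split(b"\0")[0].decode("ascii")
output = printed * 2

print(len(half), len(output), output == source)
```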

Edited to add: If we modify the rules to allow puts to be called instead of printf then it can be done in 174 bytes (previously 182):

.globl main
main:movw $34,B+86
push $B+1
call puts
call puts
pop B
ret
.data
B:.ascii"
.globl main
main:movw $34,B+86
push $B+1
call puts
call puts
pop B
ret
.data
B:.ascii"

(Note that this one, unlike the previous one, has a terminating newline.)

breadbox

Posted 2011-02-05T22:37:09.713

Reputation: 6 893

I appreciate the shortness. But I do feel cheated by the fact that in addition to printf/puts, you actually depend on the standard C prolog/epilog, which isn't explicitly allowed. And IMHO wasn't meant to be; but I've got the top answer: obviously I'm biased :-) – J B – 2012-05-06T20:14:27.727

Well, one could argue that using the C prolog/epilog is implicitly allowed, due to the mention of using printf(). libc functions don't always behave reliably if you bypass the C prolog/epilog. In fact on my system, your version doesn't work if I pipe the output to a file, because stdout only gets flushed in the C epilog code. (Had we instead used write(), which is just a wrapper around a syscall, it would have worked either way.) – breadbox – 2012-05-07T08:55:33.720

It's been a pretty long time now, but I seem to recall the C functions' being allowed was a surprise to me back then: it did make the problem sound kind of impure. OP hasn't been around in a long time either; it's going to be hard to request clarification now. – J B – 2012-05-07T12:23:14.537

Note that the ABI allows printf to clobber its args on the stack. It's not technically safe to just call it again and expect the same args, but it works in practice because gcc / clang never use args slots as scratch space, AFAIK. – Peter Cordes – 2016-06-27T19:44:20.870

Also, in general it's not safe to call printf from _start (e.g. in a static binary), so that is a good argument for writing a main instead of a _start. This answer explains the various ways of linking libc from static or dynamic binaries. (In a Linux dynamic binary, the dynamic linker will run glibc's initializer functions, so you can use printf from the _start entry point, but that's not the case on cygwin IIRC.) – Peter Cordes – 2016-06-27T19:48:04.213

You could strike a balance here and call the libc exit(3) function instead of returning from main, so you're not taking advantage of the CRT code to exit your process. That will flush output buffers for you. – Peter Cordes – 2016-06-27T19:51:17.153

Also, I think (B+90) doesn't need the parens. B+90 is a memory operand in AT&T syntax. – Peter Cordes – 2016-06-27T19:52:40.263

@PeterCordes Argh, of course. Thanks for pointing that out. – breadbox – 2016-07-04T21:14:08.440

2

.NET CIL – 623 (previously 669, 691, 723, 727)

.assembly H{}.method void M(){.entrypoint.locals init(string)ldstr".assembly H{0}{1}.method void M(){0}.entrypoint.locals init(string)ldstr{2}{3}{2}stloc 0ldloc 0ldc.i4 4newarr object dup dup dup dup ldc.i4 0ldstr{2}{0}{2}stelem.ref ldc.i4 1ldstr{2}{1}{2}stelem.ref ldc.i4 2ldc.i4 34box char stelem.ref ldc.i4 3ldloc 0stelem.ref call void[mscorlib]System.Console::Write(string,object[])ret{1}"stloc 0ldloc 0ldc.i4 4newarr object dup dup dup dup ldc.i4 0ldstr"{"stelem.ref ldc.i4 1ldstr"}"stelem.ref ldc.i4 2ldc.i4 34box char stelem.ref ldc.i4 3ldloc 0stelem.ref call void[mscorlib]System.Console::Write(string,object[])ret}

A single line, no line break at the end.
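
The substitution can be tried out in Python, whose `str.format` happens to use the same `{0}`-style positional placeholders as the composite format string passed to `Console::Write`. This is only a model: `F` is the `ldstr` literal copied from the one-liner and split across lines for readability, and the four arguments mirror the array built on the stack (opening brace, closing brace, double quote, the string itself).

```python
# The ldstr literal from the one-liner, split only for readability.
F = (
    ".assembly H{0}{1}.method void M(){0}.entrypoint.locals init(string)"
    "ldstr{2}{3}{2}"
    "stloc 0ldloc 0ldc.i4 4newarr object dup dup dup dup "
    "ldc.i4 0ldstr{2}{0}{2}stelem.ref "
    "ldc.i4 1ldstr{2}{1}{2}stelem.ref "
    "ldc.i4 2ldc.i4 34box char stelem.ref "
    "ldc.i4 3ldloc 0stelem.ref "
    "call void[mscorlib]System.Console::Write(string,object[])ret{1}"
)

# Array elements 0..3: "{", "}", the boxed char 34, and the string itself.
output = F.format("{", "}", chr(34), F)

print(output)
```

Literal braces cannot appear directly in the format string, which is why they travel as arguments 0 and 1.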

Formatted and commented first version (even though it isn't a quine anymore) – it's unlikely that I deviate much from the general concept:

.assembly H{}
.method void M() {
  .entrypoint
  .locals init (
    string,
    object[]
  )
  // the string
  ldstr".assembly H{0}{1}.method void M(){0}.entrypoint.locals init(string,object[])ldstr{2}{3}{2}stloc.0 ldloc.0 ldc.i4.4 newarr object stloc.1 ldloc.1 ldc.i4.0 ldstr{2}{0}{2} stelem.ref ldloc.1 ldc.i4.1 ldstr{2}{1}{2} stelem.ref ldloc.1 ldc.i4.2 ldc.i4 34 box char stelem.ref ldloc.1 ldc.i4.3 ldloc.0 stelem.ref ldloc.1 call void[mscorlib]System.Console::Write(string,object[])ret{1}"
  stloc.0   // store in first local var
  ldloc.0   // load again. Going to be the first argument to Console::Write
  ldc.i4.4 newarr object stloc.1   // create new array and store in local var
  ldloc.1 ldc.i4.0 ldstr"{" stelem.ref   // we need a literal brace
  ldloc.1 ldc.i4.1 ldstr"}" stelem.ref   // closing, too
  ldloc.1 ldc.i4.2 ldc.i4 34 box char stelem.ref   // double quote
  ldloc.1 ldc.i4.3 ldloc.0 stelem.ref   // our format string from before
  ldloc.1 // load array
  call void[mscorlib]System.Console::Write(string,object[]) // output
  ret
}

History:

  • 2011-02-06 16:48 (727) – First working version.
  • 2011-02-06 17:14 (723) – I don't need a space after a string literal.
  • 2011-02-06 17:21 (691) – dup is shorter than writing ldloc.1 every time.
  • 2011-02-06 17:24 (669) – I don't need spaces after any literal and things like ldloc.1 can be written as ldloc 1 to make the last token a literal. The resulting bytecode is likely larger, but it's about the assembler code so I couldn't care less :-)
  • 2011-02-06 17:34 (623) – I don't need the object[] as a local variable; I can do all that on the stack directly. Nice.

Joey

Posted 2011-02-05T22:37:09.713

Reputation: 12 260

Seems like you've removed the object[] from the unformatted version, but not the formatted one... – Aurel Bílý – 2011-02-06T18:34:57.310

@Aurel: Indeed, as noted, the formatted is the very first version. The idea is still the same so I won't update it again. – Joey – 2011-02-06T18:41:08.410

1

Bootable ASM, 660 bytes

[bits 16]
mov ax,07C0h
mov ds,ax 
mov ah,0
mov al,03h 
int 10h
mov si,code
call p 
jmp $
p:mov ah,0Eh
r:lodsb
cmp al,0
je d
cmp bx,0x42
jne s
c:int 10h
jmp r
s: cmp al,94 
je re
cmp al,63
je q
jmp c
q:mov al,34
jmp c
re:push si
mov bx,0x42
mov si,code
call p 
mov bx,0
pop si
jmp p 
d:ret
code:db "[bits 16]\mov ax,07C0h\mov ds,ax\mov ah,0\mov al,03h\int 10h\mov si,code\call p\jmp $\p:mov ah,0Eh\r:lodsb\cmp al,0\je d\cmp bx,0x42\jne s\c:int 10h\jmp r\s:cmp al,94\je re\cmp al,63\je q\jmp c\q:mov al,34\jmp c\re:push si\mov bx,0x42\mov si,code\call p\mov bx,0\pop si\jmp p\\d:ret\\code:db ?^?\times 510-($-$$) db 0\dw 0xAA55"
times 510-($-$$) db 0
dw 0xAA55

Originally by jdiez17, golfed by yours truly.

MD XF

Posted 2011-02-05T22:37:09.713

Reputation: 11 605

1

GAS x64 for Linux (verified working on GCC 9.2.0):

12 instructions (previously 13), 129-byte data section (previously 148, 136), 260-byte source (previously 298, 274)

Self-imposed rules/assumptions:

  1. No standard library (including printf) allowed. This contaminates the assembly.
  2. Faults are ok as long as it outputs the quine to stdout.
  3. No main allowed. Puts extra garbage in the program.
  4. No .incbin allowed. This is considered input.
  5. No gcc -D. It's too close to having input.
  6. If it's loaded in program memory, it's fair game, as long as no files are accessed.

.macro M b=dx v=1
mov $&v,%r&b
.endm
M 8
q:M si p
M di
M ,129
M ax
syscall
mov $p+285,%rsi
M
M ax
syscall
dec %r8
jge q
p:.ascii ".macro M b=dx v=1
mov $&v,%r&b
.endm
M 8
q:M si p
M di
M ,129
M ax
syscall
mov $p+285,%rsi
M
M ax
syscall
dec %r8
jge q
p:.ascii "

Save as "q64".sx (only the first quote matters) and compile with:

# Start `.altmacro` enabled, add debugging info to ELF (which handily injects the filename too!)
gcc -g1 -nostdlib -nostartfiles -no-pie -Xassembler "-alternate" \"q64\".sx -o quine

Verify with diff:

% ./quine > quineresult.sx; diff \"q64\".sx quineresult.sx
[1]    52466 segmentation fault (core dumped)  ./quine > quineresult.sx

If you're wondering why it segfaults, see the annotated program below.

Annotated version:

// Macro definition shortcuts for moving immediates into registers
// Default value is $1, default reg is %rdx
.macro M b=dx v=1
mov $&v,%r&b
.endm

// No need for _start. Just makes warnings.

// SYS_WRITE does not clobber r8, our loop reg
M 8

q:
    // SYS_WRITE the data section
    M si p
    M di
    // Size of data section
    M ,129
    M ax
    syscall

    // SYS_WRITE the quote, which we've loaded in the
    // debug symbol table past the data section
    // Lucky us! That gets loaded in our program's memory!
    mov $p+285,%rsi
    M
    M ax
    syscall

    // Go back to q once
    dec %r8
    jge q

    // No exit. Just let it fault.
    // Errors write to stderr - they can be distinguished from normal output

// Data section. Breaking strings like this is frowned upon by gcc.
// If you want, you can just replace all the `\n`s with `;`,
// and it would still be a quine.
// Yes, the space after the .ascii is necessary
p:
    .ascii ".macro M b=dx v=1
mov $&v,%r&b
.endm
M 8
q:M si p
M di
M ,129
M ax
syscall
mov $p+285,%rsi
M
M ax
syscall
dec %r8
jge q
p:.ascii "
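
The `M` shorthand is easier to follow once expanded. This Python sketch models the macro as a function with the same defaults (`b=dx`, `v=1`) and treats an empty argument, as in `M ,129`, as "use the default". The function and variable names are mine, and the byte arithmetic at the end just restates the two-pass loop.

```python
def M(b="", v=""):
    # Mirrors ".macro M b=dx v=1": an omitted or empty argument
    # falls back to its default.
    b = b or "dx"
    v = v or "1"
    return "mov $%s,%%r%s" % (v, b)

# The macro calls in the order they appear in the program.
expanded = [
    M("8"),         # loop counter
    M("si", "p"),   # buffer = data section
    M("di"),        # fd 1, stdout
    M("", "129"),   # length of the data section
    M("ax"),        # syscall 1, write
    M(),            # length 1 for the lone quote
    M("ax"),        # syscall 1 again
]
for line in expanded:
    print(line)

# r8 starts at 1 and "dec; jge" repeats while it is >= 0: two passes,
# each writing the 129-byte string and then one quote.
passes = 0
r8 = 1
while True:
    passes += 1
    r8 -= 1
    if r8 < 0:
        break
total = passes * (129 + 1)
print(total)
```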

Come to think of it, I've never written anything designed to segfault before. #justcodegolfthings

Calculuswhiz

Posted 2011-02-05T22:37:09.713

Reputation: 193

0

x86-64, System V AMD64 ABI, GAS: 432

.att_syntax noprefix
.globl main
main:
pushq rbp
movq rsp, rbp
mov $.Cs, rdi
mov $0xa, rsi
mov $0x22, edx
mov $.Cs, ecx
mov $0x22, r8d
mov $0xa, r9d
xor eax, eax
call printf
xor eax, eax
leave
ret
.Cs: .string ".att_syntax noprefix
.globl main
main:
pushq rbp
movq rsp, rbp
mov $.Cs, rdi
mov $0xa, rsi
mov $0x22, edx
mov $.Cs, ecx
mov $0x22, r8d
mov $0xa, r9d
xor eax, eax
call printf
xor eax, eax
leave
ret%c.Cs: .string %c%s%c%c"

lxgr

Posted 2011-02-05T22:37:09.713

Reputation: 101

1 You don't need a space after the comma between operands. And you don't need xor eax,eax at all if you don't care about the exit status of your program. It still prints itself, even if it exits with a non-zero status. You can also use push instead of pushq. Actually, why are you even making a stack frame at all? Drop the push rbp / mov rsp, rbp and leave. You could also use shorter label names. .Cs is 3 characters when 1 would be fine. – Peter Cordes – 2016-06-27T19:56:04.527

After that, .att_syntax noprefix probably doesn't pay for itself anymore. .intel_syntax noprefix would let you drop those six $ prefixes, too, but it's probably still not worth it. (You could use lea ecx,.Cs instead of the intel-syntax mov ecx,offset .Cs) – Peter Cordes – 2016-06-27T20:03:28.590

0

TAL

push puts
push \100
push {push puts
push \100
push {@}
dup
strmap
invokeStk 2}
dup
strmap
invokeStk 2

To execute it, call ::tcl::unsupported::assemble with the code as argument.
Tcl 8.6 only.
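
The idea, sketched in Python under the assumption that `strmap` replaces every occurrence of the search string (`\100`, i.e. `@`) in the top-of-stack string with the replacement: the braced template contains a single `@`, and substituting the template for it rebuilds the program. `template`, `source` and `output` are names invented for this sketch.

```python
# The braced word pushed third; inside braces Tcl leaves \100 untouched.
template = "push puts\npush \\100\npush {@}\ndup\nstrmap\ninvokeStk 2"

# The full program as posted.
source = (
    "push puts\npush \\100\npush {"
    + template
    + "}\ndup\nstrmap\ninvokeStk 2"
)

# push \100 yields the one-character string "@" (octal 100 is 64).
marker = chr(0o100)

# dup + strmap: replace the marker inside the template with the template.
output = template.replace(marker, template)

print(output == source)
```

Writing the marker as `\100` outside the braces keeps a second literal `@` out of the template, which would otherwise be replaced as well.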

Johannes Kuhn

Posted 2011-02-05T22:37:09.713

Reputation: 7 122

3 You should include the byte count. – MD XF – 2017-05-24T02:07:08.473

0

80x86 TASM, 561 bytes

MODEL TINY
.CODE
.STARTUP
DB 177
DB 076
DB 186
DB 044
DB 001
DB 172
DB 180
DB 036
DB 179
DB 004
DB 191
DB 080
DB 001
DB 079
DB 136
DB 037
DB 212
DB 010
DB 004
DB 048
DB 134
DB 196
DB 075
DB 117
DB 244
DB 180
DB 009
DB 205
DB 033
DB 178
DB 071
DB 226
DB 228
DB 178
DB 038
DB 205
DB 033
DB 195
DB 013
DB 010
DB 069
DB 078
DB 068
DB 036
DB 077
DB 079
DB 068
DB 069
DB 076
DB 032
DB 084
DB 073
DB 078
DB 089
DB 013
DB 010
DB 046
DB 067
DB 079
DB 068
DB 069
DB 013
DB 010
DB 046
DB 083
DB 084
DB 065
DB 082
DB 084
DB 085
DB 080
DB 013
DB 010
DB 068
DB 066
DB 032
END

MD XF

Posted 2011-02-05T22:37:09.713

Reputation: 11 605