Shortest ELF for "Hello world\n"?

21

4

Is it possible to write (pack) a shorter than 145 bytes version of a program with "Hello world" (plus new line) output if the length of the program is measured as a number of bytes in program's ELF (x86) representation? Reduction technique in mind is described here:

  1. http://timelessname.com/elfbin/
  2. http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Please reply with providing a code example. Note, 1 says "Hi world" and [2] returns 42 instead of an output, hence I assume current solution to be 142 + (len("Hello") - len("Hi")) which is 145.

[EDIT]

Limits of ELF: syntactically a program translated into ELF is determined by its interpreter (libc), which is a combination of:

  • Target architecture (Generic, AMD64, ARM, IA-32, MIPS, etc.)
  • OS kernel (which communicates with implementation of ELF interpreter and often is itself packed into an ELF binary)

From TIS spec. 1.2

There is one valid program interpreter for programs conforming to the ELF specification for the Intel architecture: /usr/lib/libc.so.1

In theory - all clear - multiple combinations times version differences (Arch, OS) are possible. In practice - because standard is less strict than any BNF for CFG or other finer (smaller) formal language - there can be implementation differences (options) including shorter length of the program.

On one hand, because ELF is not that precise (one naturally expects) it is much more difficult to do code-golf with it. Hence a "Hello world\n" program is expected to be by default in TIS ELF 1.2, x86 (Generic Intel), Linux 2.6.(20+). On the other hand having a shorter ELF that runs e.g. on *BSD seems like an extremely valuable knowledge to me!

For example ELF64 (latest draft) is much more interesting incl. the differences with ELF(x86). So, please share a solution that can be accepted with correction to specific configuration.

Motivation: It is the practical part that is interesting for me and code-golf is a very nice way to show how tricky it is to come up with any machine- (or even byte) code in general and why ELF does apparently such a good job (in my understanding). Thus a golf solution is not only cool per say but also can provide a practical knowledge of unexpected interpretation differences of ELF (if an alternative combination is given).

Yauhen Yakimovich

Posted 2012-05-01T20:21:10.453

Reputation: 319

Question was closed 2016-12-19T21:43:24.167

1

It's trivial to modify Adam Rosenfield's 116-byte ELF file to include a newline, giving a 117-byte executable that prints "Hello world\n" in a Linux terminal window. The byte-sequence is 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01 00 00 00 54 80 04 08 34 00 00 00 00 00 00 00 00 00 00 00 34 00 20 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 80 04 08 00 80 04 08 74 00 00 00 74 00 00 00 05 00 00 00 00 10 00 00 B0 04 31 DB 43 B9 69 80 04 08 31 D2 B2 0C CD 80 31 C0 40 CD 80 48 65 6C 6C 6F 20 77 6F 72 6C 64 0A.

– r.e.s. – 2012-05-03T02:41:10.207

Thanks for pointing out the duplication. I was not aware of it. @FUZxxl What is the problem of having a more precisely asked question? I see the redundancy with the previous question but I don't like the way it is asked at all, e.g. it says nothing about code-golf. If "less than 20 byte" is just an olympic puzzle then there is no duplication per say. Another matter - are you really an opinion that simply closing creates a better Q&A knowledge base? I have not even found "less than 20 byte" question. IMHO wikipedia achieves something similar with redirects - much more constructive. – Yauhen Yakimovich – 2012-05-03T08:59:53.363

1If it is not obvious enough - my current question is about "studying" ELF only which has normally longer header due to lot of metadata. E.g. going via asm code solution does here no good. Coming up with a new shorter execution format special designed for this question is also not accepted. It's about limits of ELF. Thus modification of Adam Rosenfield's 116-byte ELF file is the best answer so far. – Yauhen Yakimovich – 2012-05-03T09:07:13.150

1The scope of the question seems basically limited to x86 machines running an operating system that uses the ELF standard. Would that be virtually all x86 non-Windows machines? (BTW, I find that the executable mentioned in my previous comment -- just copy/paste/save the binary using a hex editor -- runs successfully on my x86-64 under both 32- and 64-bit versions of Ubuntu. Isn't that unexpected?) – r.e.s. – 2012-05-03T14:10:49.413

Since we typically have tags for language-specific questions, I went and created an [tag:executable-binary] tag for this question. I decided not to create [tag:x86] or [tag:elf] yet -- let's see if we get more questions like this first. – Ilmari Karonen – 2012-05-03T15:01:58.620

@FUZxxl: Duplicate of an SO question doesn't really count, does it? I wouldn't search SO before asking a question, too. – user unknown – 2012-05-04T01:48:59.060

@r.e.s. Linux for amd64 has code that can run programs compiled for x86, provided that all libraries the program needs are available for x86 as well. Actually, you can even use the old int $0x80 method to call the operating system - even from amd64 code! – FUZxxl – 2012-05-04T06:11:06.023

Another issue for the OP: I think the question should make it clear whether the desired "shortest ELF" executable binary is required to adhere to the ELF standard. (The cited tutorial admittedly violates it eventually, whereas the posting linked in my "answer" claims adherence to it.) – r.e.s. – 2012-05-04T12:15:11.643

ELF (x86) files are not compatible between operating systems. NetBSD might require a longer solution than Linux. – kernigh – 2012-05-04T17:20:29.297

Sorry, for not being precise enough after all @r.e.s. I hope new correction is much clearer. Asking code-golf about binary executable can be valid and reasonable but only if expected configuration is clearly stated (by OP or AP). – Yauhen Yakimovich – 2012-05-05T15:50:41.623

@Mego Hi Mego, I am not sure it if mine question is a duplicate. It was asked in 2012 not in 2015. But you have to figure it out yourself. It is really not about comparing programming languages in general. It is a very concrete task requiring knowledge of the machine code i.e. ELF dialect. I would not accept solution in ruby or perl. So no it is not really a duplicate. – Yauhen Yakimovich – 2016-12-20T10:43:56.363

Answers

21

Even if you require full adherence to the ELF specification, you can squeeze it into 98 bytes:

            org     0x04B34000
            db      0x7F, "ELF", 1, 1, 1, 0 ; e_ident
            dd      0, 0
            dw      2                       ; e_type
            dw      3                       ; e_machine
            dd      1                       ; e_version
            dd      _start                  ; e_entry
            dd      phdr - $$               ; e_phoff
            dd      0                       ; e_shoff
            dd      0                       ; e_flags
            dw      0x34                    ; e_ehsize
            dw      0x20                    ; e_phentsize
phdr:       dd      1                       ; e_phnum       ; p_type
                                            ; e_shentsize
            dd      0                       ; e_shnum       ; p_offset
                                            ; e_shstrndx
            db      0                                       ; p_vaddr
_start:     inc     eax
            mov     bl, 4
            mov     dl, 12                                  ; p_paddr
            jmp     short part2
            dd      filesize                                ; p_filesz
            dd      filesize                                ; p_memsz
            dd      5                                       ; p_flags
            dd      0x1000                                  ; p_align
str:        db      'Hello world', 10
part2:      mov     ecx, str
again:      xchg    eax, ebx
            int     0x80
            jmp     short again
filesize    equ     $ - $$

Or if you prefer hex bytes: 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01 00 00 00 35 40 B3 04 2C 00 00 00 00 00 00 00 00 00 00 00 34 00 20 00 01 00 00 00 00 00 00 00 00 40 B3 04 B2 0C EB 1C 62 00 00 00 62 00 00 00 05 00 00 00 00 10 00 00 48 65 6C 6C 6F 20 77 6F 72 6C 64 0A B9 4C 40 B3 04 93 CD 80 EB FB

The ELF specification explicitly permits different sections to overlap, so it is perfectly legal to let the program segment header table share bytes with the ELF header. Furthermore, the x86 version of the ELF standard states that the p_paddr field is ignored, as opposed to merely being unused (which typically comes with the requirement that it be set to zero. Thus it is safe to contain arbitrary bytes.

Finally, three more bytes are saved by overlapping the code with the load address.

breadbox

Posted 2012-05-01T20:21:10.453

Reputation: 6 893

1Direct from the source ;) The xchg/jmp is particularly clever. – primo – 2012-05-05T08:41:43.460

to make it work on amd64, I've added BITS 32 at the very top. It doesn't change the byte count in the executable. – jfs – 2016-05-16T20:41:08.977

9

Looking at link [2] A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux, by appending the literal string 'Hello, world!' to the end (instead of letting the last 7 bytes auto fill with zeros), you should be able to squeeze it to 59 bytes, assuming your print logic isn't longer than 8 bytes.

In fact, the user 'breadbox', who I assume to be the writer of the post, has posted a 57 byte version on anarchy golf. It might be useful to look at some of his other post mortems: they seem to be all ELF format.

EDIT: 61 bytes

Using breadbox's nested header approach, I was able to produce this 61 byte solution:

BITS 32
        org     0x05000000
        db      0x7F, "ELF"
        dd      1
        dd      0
        dd      $$
        dw      2
        dw      3
        dd      0x0500001B
        dd      0x0500001B
        dd      4
        mov     dl, 12
        mov     ecx, msg
        int     0x80
        db      0x25
        dw      0x20
        dw      0x01
        inc     eax
        int     0x80
msg     db      'Hello world', 10

Which assembles to:

00000000  7f 45 4c 46 01 00 00 00  00 00 00 00 00 00 00 05  |.ELF............|
00000010  02 00 03 00 1b 00 00 05  1b 00 00 05 04 00 00 00  |................|
00000020  b2 0c b9 31 00 00 05 cd  80 25 20 00 01 00 40 cd  |...1.....% ...@.|
00000030  80 48 65 6c 6c 6f 20 77  6f 72 6c 64 0a           |.Hello world.|

The code begins at 0x001B:

0540000000      add     eax, 0x00000004    ;sys_write
B20C            mov     dl,  12            ;message length
B92E000005      mov     ecx, msg           ;message pointer
CD80            int     0x80               ;syscall
2520000100      and     eax, 0x00010020    ;clears eax
40              inc     eax                ;sys_exit
CD80            int     0x80               ;syscall

The 32-bit add is necessary (instead of mov al, 4 for example), because it's also used as the program header offset, and needs to be exactly 4 (where the program header starts in the above code). The and later on is used to clear eax, because 0x20 and 0x01 are unavoidable header values.

EDIT: 111 bytes (with proper headers)

If, on the other hand, you'd rather your headers adhere to the ELF specifications (even if your kernel ignores them), you could use this instead:

BITS 32
            org     0x08048000
ehdr:                                               ; Elf32_Ehdr
            db      0x7F, "ELF", 1, 1, 1, 0         ;   e_ident
    times 8 db      0
            dw      2                               ;   e_type
            dw      3                               ;   e_machine
            dd      1                               ;   e_version
            dd      _start                          ;   e_entry
            dd      phdr - $$                       ;   e_phoff
            dd      0                               ;   e_shoff
            dd      0                               ;   e_flags
            dw      ehdrsize                        ;   e_ehsize
            dw      phdrsize                        ;   e_phentsize
            dw      1                               ;   e_phnum
            dw      0                               ;   e_shentsize
            dw      0                               ;   e_shnum
            dw      0                               ;   e_shstrndx
ehdrsize    equ     $ - ehdr

phdr:                                               ; Elf32_Phdr
            dd      1                               ;   p_type
            dd      0                               ;   p_offset
            dd      $$                              ;   p_vaddr
            dd      $$                              ;   p_paddr
            dd      filesize                        ;   p_filesz
            dd      filesize                        ;   p_memsz
            dd      5                               ;   p_flags
            dd      0x1000                          ;   p_align
phdrsize    equ     $ - phdr

_start:     mov     al, 4
            mov     dl, msg_len
            mov     ecx, msg
            int     0x80
            mov     al, 1
            int     0x80

msg         db      'Hello world', 10
msg_len     equ     $ - msg

filesize    equ     $ - $$

It's tempting to overlap the last 8 bytes of the ehdr with the first 8 bytes of the phdr, since they are identical (and therefore, all headers would be correct), but in the spirit of being proper, I decided not to.

This assembles to a 111 byte solution:

00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 03 00 01 00 00 00  54 80 04 08 34 00 00 00  |........T...4...|
00000020  00 00 00 00 00 00 00 00  34 00 20 00 01 00 00 00  |........4. .....|
00000030  00 00 00 00 01 00 00 00  00 00 00 00 00 80 04 08  |................|
00000040  00 80 04 08 6f 00 00 00  6f 00 00 00 05 00 00 00  |....o...o.......|
00000050  00 10 00 00 b0 04 b2 0c  b9 63 80 04 08 cd 80 b0  |.........c......|
00000060  01 cd 80 48 65 6c 6c 6f  20 77 6f 72 6c 64 0a     |...Hello world.|

primo

Posted 2012-05-01T20:21:10.453

Reputation: 30 891

That tutorial says "half of the values in this file violate some part of the ELF standard", whereas Adam says his is "the minimal possible ELF without doing any sneaky trickery from that article". Since the tutorial is cited by the OP, it looks like I misread the present question as requiring adherence to the ELF standard. OTOH, the OP says in a comment that it's "about limits of ELF", so I'm not sure. – r.e.s. – 2012-05-04T11:51:57.473

On my x86-64 machine running 32-bit Ubuntu, the 60-byte executable outputs "Hello, world!\n", then causes a "Segmentation fault" message. (Neverminding the run-time error, the OP requires just "Hello world\n" -- saving you two more bytes.) – r.e.s. – 2012-05-04T14:06:55.963

This was the result of a missing system exit call, and I've corrected for this at the cost of 3 bytes. I've also added a 111 'clean' version, in which all of the headers are set correctly. – primo – 2012-05-04T16:42:33.607

2I haven't actually read the ELF standard, but I'd be surprised if it explicitly said that the headers can't overlap (as long as all the values are still set according to spec). So I'd guess the 8 byte saving ought to be fine. – Ilmari Karonen – 2012-05-04T21:44:12.383

The 61-byte solution given here is largely equivalent to the 62-byte solution I have on my page, the extra char being the comma that's missing from this version. The 57-byte version only works on certain versions of the 2.4 kernel; by loading to address zero, it allows for greater interspersing of code inside the header fields. (See the above web page for a link and more details.)

– breadbox – 2012-05-07T08:48:21.797

I did specifically state that I was basing my result on your tutorial. Had I known of the link you just posted, I would have posted that instead. – primo – 2012-05-07T09:13:26.253

Indeed you did. I just want to make it perfectly clear that the 57-character version is NOT original with me; the original author indicated on my web page (whoever they are) deserve all the credit. – breadbox – 2012-05-07T09:38:03.980

@primo : and for armel ? – user2284570 – 2017-03-05T09:47:01.300