Is it dangerous to compile arbitrary C?

Question

I have a small server and I would like to check compile times on C programs provided by users. The programs would never be run, only compiled.

What risks are there to allowing users to compile arbitrary C using gcc 5.4.0?

I'd say this is not the best idea. Compiler vulnerabilities aren't that common because most people compile trusted code, which does not trigger such vulns. I'm sure there are quite a few such vulnerabilities left to exploit. If you absolutely must do this, I suggest compiling the code in throwaway VMs. — André Borie, Oct 06 '16 at 02:07
There are multiple web services offering exactly this. E.g. over on Stack Overflow, ideone.com is popular, as is godbolt.org. The danger seems manageable. — MSalters, Oct 06 '16 at 06:30
@MSalters: There's also coliru.stacked-crooked.com, and I think Stacked Crooked described how he secured his online compiler in depth. — Matthieu M., Oct 06 '16 at 07:01
Weren't there whole IOCCC entries built around this? Certainly if you compile arbitrary C++ people can waste a *lot* of your CPU from a tiny program. I think the winning entry in an "exploding error messages" competition generated 55MB of errors from 1kB of input. — pjc50, Oct 06 '16 at 10:28
No time to write this up right now but check this out: https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf — AstroDan, Oct 06 '16 at 12:54
It's a manageable risk with current linux technology. Deploy the source to a docker container, run the containerised compiler and if it doesn't terminate or exceeds output quota then simply blow the container away, log the fail and have fail2ban pick up on repeated problem users. — Andy Brown, Oct 06 '16 at 15:17
A lot of people are advising against this, but there are a number of free online sites that *do* compile arbitrary code, like ideone as demonstrated in a comment. It makes you wonder how they secure themselves. — jpmc26, Oct 06 '16 at 23:38
@jpmc26 - "makes you wonder how they secure themselves" See the comment above by Matthieu M. about `coliru.stacked-crooked.com`. — Kevin Fegan, Oct 08 '16 at 04:21

score 97 · Accepted Answer · edited Oct 06 '16 at 08:45

A bit of a weird one, but: it's a denial-of-service risk, or potential information disclosure.

Because C's preprocessor will cheerfully include any file specified in an #include directive, somebody can #include "../../../../../../../../../../dev/zero" and the preprocessor will try to read to the end of /dev/zero (good luck).

Similarly, especially if you let people see the output of their compilation attempts, somebody could try including various files that may or may not be present on your system, and could learn things about your machine. Combined with clever usage of #pragma poison, they might even learn things about the file contents even if you don't provide full error messages.

Relatedly, pragmas can alter a lot of preprocessor, compiler, or linker behavior, and are specified in source files. There's probably not one that lets somebody specify the output file name or something like that, but if there is, it could be abused to overwrite sensitive files, or to get the output executed (by writing into cron or similar). There might be something similarly dangerous. You really should be careful about compiling untrusted code.

answered Oct 06 '16 at 03:54

CBHacking

http://ideone.com/c2UhRl – Ville-Valtteri Tiittanen Oct 06 '16 at 08:34
@Ville-ValtteriTiittanen Add some explanation, not just links. – Tor Klingberg Oct 06 '16 at 09:39
@TorKlingberg The explanation is in the question. He just provided a working implementation of the second paragraph's idea. – bolov Oct 06 '16 at 09:55
How well are these attacks mitigated through use of a properly-configured chroot environment? There's no need to have `cron` or a meaningful (elsewhere) `/etc/passwd`, so that reduces the attack surface considerably. (It's pretty easy these days to create a minimal chroot with `cdebootstrap` and `schroot` - that's how I create compilation environments for old distros I need to support.) – Toby Speight Oct 06 '16 at 10:27
Forgot to say, also use `ulimit` to help against some amount of DoS. – Toby Speight Oct 06 '16 at 10:27
@CBHacking. +1. Can you please explain how #pragma poison can be used to learn things about file contents? – pri Oct 06 '16 at 10:45
@TobySpeight: chroot works well against this type of attack. For example, attempting to compile @Ville's file on [Coliru](http://coliru.stacked-crooked.com/) simply gives an error of: `main.cpp:1:50: fatal error: ../../../../../../../../../../dev/zero: No such file or directory` (and similar for the file he links in the comment to Colin Cassidy's answer). – Jerry Coffin Oct 06 '16 at 16:15
@Ville-ValtteriTiittanen: Yikes. – Robert Harvey Oct 06 '16 at 18:21
@PriyankGupta: In theory, poisoning certain tokens that might be in a file, and then including the file, will fail quickly if those tokens are present. This can provide a side-channel (timing) attack to reveal some information about a file's contents even if you (the attacker) don't get the actual output of the compiler. The file would still need to be valid C, or at least close to it, or the parser would fail before the poison check applied. You'd also be picking file names blind. Not an easy attack, but possibly a meaningful one in some circumstances. – CBHacking Oct 06 '16 at 19:10
@Ville-ValtteriTiittanen Are you trying to demonstrate that this *is* a problem or *is not* a problem? Doesn't seem to have caused any harm to ideone. – jpmc26 Oct 06 '16 at 23:43
Many Linux utilities read files via `open`/`fstat`/`mmap`, so reading zero-byte files or "files" without a length wouldn't work. They can't read the fake "regular files" in `/proc` either. I have been bitten by that when some programs insisted on `/etc/fstab` being a regular file and not a link to, or mount of, `/proc/mounts`. – curiousguy Jun 20 '18 at 21:45

score 43 · Answer 2 · edited Jun 16 '20 at 09:49

Compiler bombs

C is a very powerful language, and some of the terrible things you can do with it would shock you. For example, you can create a 16 byte C program that takes 27 minutes to compile, and when it finally finishes, it compiles to a 16 Gigabyte executable file. And that's only using 16 bytes. When you factor in the preprocessor and larger source code files, I'm sure you could create much larger compiler bombs.

This means anyone with access to your server could effectively do a DoS attack on your server. Now to be fair, this is significantly less dangerous than having someone abuse a vulnerability in the compiler, or including sensitive files to get information about your server (like the other answerers talked about).

But it's still another possible annoyance that you'll encounter when compiling arbitrary code. I'm sure you could set up a time limit on all builds and make sure never to store the binary files. Of course, you still need to keep the binary on disk while it's being created, so if someone hypothetically made a compiler bomb larger than your hard drive, you'd be in trouble (if you let the build finish).
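A minimal sketch of the time-limit idea, assuming a POSIX system: the compiler is started in a child process that first caps its own CPU seconds, largest writable file, and address space with `setrlimit`. The names `cap_limit` and `run_limited` are illustrative and not taken from any of the services mentioned in this thread.

```c
#define _XOPEN_SOURCE 700   /* ask the headers for the POSIX/XSI declarations */
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set both the soft and hard limit for one resource; returns 0 on success. */
static int cap_limit(int resource, rlim_t value) {
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return -1;
    if (value > rl.rlim_max)
        value = rl.rlim_max;    /* an unprivileged process cannot raise a hard limit */
    rl.rlim_cur = value;
    rl.rlim_max = value;
    return setrlimit(resource, &rl);
}

/*
 * Run argv[0] (looked up on PATH) with the given caps.
 * Returns the child's exit status, 128 + the signal number if a signal
 * killed it (for example after hitting a limit), or -1 if it could not
 * be started or waited for.
 */
int run_limited(char *const argv[], rlim_t cpu_seconds,
                rlim_t max_file_bytes, rlim_t max_memory_bytes) {
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the limits apply to this process and anything it execs. */
        if (cap_limit(RLIMIT_CPU, cpu_seconds) != 0 ||
            cap_limit(RLIMIT_FSIZE, max_file_bytes) != 0 ||
            cap_limit(RLIMIT_AS, max_memory_bytes) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WEXITSTATUS(status);
}
```

For example, `run_limited(argv, 10, 64u << 20, (rlim_t)1 << 30)` with `argv` set to `{"gcc", "-c", "user.c", NULL}` would cap a build at 10 CPU seconds, 64 MiB per written file, and 1 GiB of address space. `RLIMIT_CPU` counts CPU time only, so a wall-clock timeout would still be needed for a build that stalls without burning CPU.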

Funnily enough, the reason I ask is because I wanted to make a PPCG challenge to make a compiler bomb, and I wanted to set up a scoring server. — Sriotchilism O'Zaic, Oct 06 '16 at 14:12

score 29 · Answer 3 · edited Oct 06 '16 at 10:54

@AndréBorie is correct. Compilers and the corresponding configuration will not be well vetted for security issues, so generally speaking you should not compile untrusted code.

The risk is that a buffer overflow or some type of library execution vulnerability is exploited, and the attacker gains access to the (hopefully non-root!) user account that ran the compiler. Even a non-root hack is serious in most cases. This could be elaborated on in a separate question.

Creating a VM is a good solution, to contain any potential exploits so they cannot harm the rest of your application.

It is best to have a template Linux VM you can launch as needed with a clean slate compiler environment.

Ideally you would throw it away after every use, but this may not be strictly necessary. If you isolate the VM well enough and properly sanitize response data from the VM (which you should be doing anyway), then the worst a hack could do is DoS or create false compile times. These are not serious issues on their own; at least not nearly as serious as accessing the rest of your application.

However, resetting the VM after every use (rather than, say, daily) does provide a more stable environment overall and can improve security in certain edge cases.

Some OSes provide Containers as an alternative to VMs. This may be a leaner approach, but the same principles apply.

Filesystem snapshots (e.g. ZFS) + a container (so no kernel startup times) and you could probably clean up between requests within 5 seconds. And if you want multiple containers you could even use CoW clones. — Bob, Oct 06 '16 at 14:15
That sounds good. I assume you mean an actual reset (not needing kernel startup), so that malicious daemons are destroyed. — 700 Software, Oct 06 '16 at 15:03
Most containerisation solutions implement some 'stop' operation that will kill all processes started in the container. I believe LXD (`stop --force`) does this by putting them all in a cgroup. But since creating a CoW clone from a snapshot in ZFS is cheap, you could even create a whole new container before the old one finishes terminating. Most of the time would probably be spent starting init inside the container. — Bob, Oct 06 '16 at 16:33
Make sure to run the VMs on a separate server to not provoke timing attacks and other side-channel attacks on your webserver! — MauganRa, Oct 12 '16 at 19:11

Matt Godbolt · Answer 4 · 2016-12-10T20:25:35.587

Yes, it's dangerous: but as people have said it's possible to do. I'm the author and maintainer of the online compilers at https://gcc.godbolt.org/, and I've found it pretty workable to make it safe using a combination of:

- The whole site runs on a VM instance with very few permissions to do anything. The networking is severely limited, with only port 80 visible and ssh enabled only from whitelisted IPs (my own).
- Each compiling instance runs within a throwaway Docker container with even fewer permissions.
- The compiler is executed from a script that sets all the process limits (memory, CPU time, etc.) to low values to prevent code bombs.
- The compiler is run with an LD_PRELOAD wrapper (source here) which prevents the compiler from opening any files not on an explicit whitelist. This prevents it from reading /etc/passwd or other such stuff (not that that'd help all that much).
- As a nicety, I parse the command-line options and don't execute the compiler if there's anything particularly suspect. This isn't a real protection; just a way to give a "seriously, don't try this" error message instead of the LD_PRELOAD catching bad behaviour.
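The last item, screening command-line options before running the compiler, could look roughly like the sketch below. The deny-list is an assumption made up for illustration (it is not the list the site actually uses), and the function names are hypothetical. The prefixes are real GCC driver options that name files or programs the compiler will read, write, or run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative deny-list: options that point the driver at files or programs. */
static const char *const SUSPECT_PREFIXES[] = {
    "-o",        /* chooses where the output is written */
    "-fplugin",  /* loads a shared object into the compiler */
    "-specs",    /* overrides the driver's spec strings from a file */
    "-wrapper",  /* runs the compiler's subprograms under another program */
    "-B",        /* changes where the driver looks for its subprograms */
    "-include",  /* force-includes a file before the source */
    "-I",        /* adds an include search directory */
    "@",         /* reads further options from a file */
};

/* True if `opt` starts with any prefix on the deny-list. */
bool option_is_suspect(const char *opt) {
    assert(opt != NULL);
    size_t count = sizeof SUSPECT_PREFIXES / sizeof SUSPECT_PREFIXES[0];
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(SUSPECT_PREFIXES[i]);
        if (strncmp(opt, SUSPECT_PREFIXES[i], n) == 0)
            return true;
    }
    return false;
}

/* Index of the first suspect option in argv[0..argc-1], or -1 if none. */
int first_suspect_option(int argc, char *const argv[]) {
    for (int i = 0; i < argc; i++) {
        if (option_is_suspect(argv[i]))
            return i;
    }
    return -1;
}
```

A deny-list like this is easy to get wrong and easy to bypass, which matches the answer's own caveat: it is there for friendlier error messages, with the container and process limits doing the real work.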

The whole source is on GitHub, as is the source to the docker container images and compilers and such.

I wrote a blog post explaining how the whole setup is run too.

The `LD_PRELOAD` technique is next to useless. It's absolutely trivial to bypass once you've gained code execution via a compiler vulnerability. It might help for confused deputy issues, but not much else... Also, even if it did work, you missed quite a lot of functions that can be used to open files instead. — forest, Dec 28 '18 at 07:09

score 12 · Answer 5 · answered Oct 06 '16 at 08:20

You would not want to be running the compiler as root, though I have seen this happen for "ease and convenience" reasons. It would be all too easy for an attacker to include something like:

#include "../../../../etc/passwd"
#include "../../../../etc/shadow"

and get the contents of these files back as part of the compiler error message.
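One way to see where such relative paths end up is to canonicalize them. The sketch below, assuming a POSIX system, uses `realpath` to resolve `..` components and symlinks and then checks whether the result still lies under a given build directory. `path_is_inside` is a hypothetical helper; GCC does not perform any check like this itself, so a wrapper or sandbox would have to.

```c
#define _XOPEN_SOURCE 700   /* ask the headers for realpath() */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * True only if `path` exists and, after resolving "..", "." and symlinks,
 * is `allowed_dir` itself or something beneath it.
 */
bool path_is_inside(const char *path, const char *allowed_dir) {
    char real_path[PATH_MAX];
    char real_dir[PATH_MAX];

    assert(path != NULL && allowed_dir != NULL);
    if (realpath(path, real_path) == NULL)
        return false;                       /* missing or unreadable: refuse */
    if (realpath(allowed_dir, real_dir) == NULL)
        return false;
    if (strcmp(real_dir, "/") == 0)
        return true;                        /* everything is beneath the root */

    size_t n = strlen(real_dir);
    if (strncmp(real_path, real_dir, n) != 0)
        return false;
    /* Require a path boundary so "/srv/build" does not match "/srv/build-evil". */
    return real_path[n] == '/' || real_path[n] == '\0';
}
```

With a build directory of `/tmp`, a path such as `/tmp/../etc/passwd` resolves to `/etc/passwd` and is rejected. A chroot or container, as suggested in the comments on other answers, achieves the same containment more robustly because the files are simply not there to include.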

Also, compilers are programs like everything else and will have bugs that could be exploitable; it would be all too easy for someone to just fuzz C programs and cause problems.

Most application security focuses first and foremost on input validation; unfortunately, defining 'safe and valid' input for a C compiler is probably up there with the halting problem in terms of difficulty :)

Colin Cassidy

http://ideone.com/ZzPHMw http://ideone.com/xc4s5a – Ville-Valtteri Tiittanen Oct 06 '16 at 08:37
My point exactly; there are others out there that are susceptible to the etc/shadow one. – Colin Cassidy Oct 06 '16 at 08:58
But the error messages will be sent to *your* terminal, and you could just _cat /etc/passwd_ anyway. – ThoriumBR Oct 06 '16 at 18:00
yes I want the error messages to come back to me, I'm looking to get the password hashes. As for cat /etc/passwd, that depends on the setup, maybe the C code is being submitted to a separate server, in which case now I have the hashes for the remote system that I can start offline cracking. – Colin Cassidy Oct 07 '16 at 09:46

score 3 · Answer 6 · answered Oct 06 '16 at 23:13

If you allow a user to provide an archive containing the code, you can have issues, not exactly with the compiler but with the linker it uses ;)

ld follows symbolic links if they point to a file that does not exist. This means that if you compile test.c to the output a.out but already have a symbolic link named a.out in your directory pointing to a non-existent file, the compiled executable will be written to the location the link points to (subject to the user's permissions).

In practice an attacker could, for example, include a string containing a public ssh key in his code and provide a symbolic link named a.out pointing to ~/.ssh/authorized_keys. If that file does not already exist, this allows the attacker to plant his ssh key on the target machine, giving him external access without having to crack any password.
