I have a small server and I would like to check compile times on C programs provided by users. The programs would never be run, only compiled.
What risks are there to allowing users to compile arbitrary C using gcc 5.4.0?
A bit of a weird one, but: it's a denial-of-service risk, or potential information disclosure.
Because C's preprocessor will cheerfully include any file specified in an #include directive, somebody can #include "../../../../../../../../../../dev/zero" and the preprocessor will try to read to the end of /dev/zero (good luck).
Similarly, especially if you let people see the output of their compilation attempts, somebody could try including various files that may or may not be present on your system, and could learn things about your machine. Combined with clever usage of #pragma poison, they might even learn things about the file contents even if you don't provide full error messages.
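As a rough illustration of the poison trick (the path and the poisoned identifier here are just examples):

#pragma GCC poison root
#include "/etc/passwd"

If the compiler then reports an attempt to use a poisoned identifier, the attacker learns that the token root occurs somewhere in the included file, even if most of the error output is suppressed.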
Relatedly, pragmas can alter a lot of preprocessor, compiler, or linker behavior, and they are specified in source files. There's probably not one that lets somebody specify the output file name or anything like that, but if there is, it could be abused to overwrite sensitive files, or to get code executed (by writing into cron or similar), and there might well be something similarly dangerous. You really should be careful about compiling untrusted code.
C is a very powerful language, and some of the terrible things you can do with it would shock you. For example, you can create a 16-byte C program that takes 27 minutes to compile and, when it finally finishes, produces a 16 GB executable file. And that's using only 16 bytes of source. When you factor in the preprocessor and larger source files, I'm sure you could create much larger compiler bombs.
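The kind of program being referred to (from a code-golf "compiler bomb" challenge; exact byte counts and output sizes depend on the compiler and target) looks roughly like this:

/* "main" is declared as an implicit-int array of about 4 billion elements.
 * The explicit initializer forces the whole array into the data segment
 * rather than .bss, so the emitted executable is on the order of 16 GB. */
main[-1u]={1};

gcc of that era accepts this with only warnings and will happily try to write the output.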
This means anyone with access to your server could effectively mount a DoS attack on it. Now, to be fair, this is significantly less dangerous than having someone exploit a vulnerability in the compiler, or include sensitive files to learn things about your server (as the other answers discuss).
But it's still another possible annoyance you'll encounter when compiling arbitrary code. You could set up a time limit on all builds and make sure never to store the binary files. Of course, the output still has to sit on disk while it's being created, so if someone made a compiler bomb larger than your hard drive, you'd be in trouble (if you let the build finish).
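If you go the time-limit route, one simple way to enforce it (together with a cap on output size) is to wrap the gcc invocation in a small launcher that sets resource limits before exec'ing the compiler. A rough sketch; the 30-second CPU cap, the 64 MiB per-file cap, and the file names are placeholder values, not recommendations:

#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {
        /* Child: limit CPU seconds and the size of any file it may create,
         * then exec the compiler.  The limits are inherited by cc1/as/ld. */
        struct rlimit cpu = { 30, 30 };                 /* 30 s of CPU time */
        struct rlimit fsz = { 64UL << 20, 64UL << 20 }; /* 64 MiB per file  */
        setrlimit(RLIMIT_CPU, &cpu);
        setrlimit(RLIMIT_FSIZE, &fsz);

        execlp("gcc", "gcc", "-c", "user.c", "-o", "/tmp/user.o", (char *)NULL);
        perror("execlp");
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}

On its own this doesn't fully solve the disk-space problem (many capped output files can still add up), so a small dedicated filesystem or quota for the build directory is still worth having.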
@AndréBorie is correct. Compilers and their surrounding configuration are generally not well vetted for security issues, so generally speaking you should not compile untrusted code.
The risk is that a buffer overflow or some type of library-execution vulnerability is exploited, and the attacker gains access to the (hopefully non-root!) user account that ran the compiler. Even a non-root hack is serious in most cases. This could be elaborated on in a separate question.
Creating a VM is a good solution: it contains any potential exploits so they cannot harm the rest of your application.
It is best to have a template Linux VM you can launch as needed with a clean slate compiler environment.
Ideally you would throw it away after every use, but this may not be strictly necessary. If you isolate the VM well enough and properly sanitize the response data coming out of it (which you should be doing anyway), then the worst a compromise could do is DoS you or report false compile times. Those are not serious issues on their own; at least not nearly as serious as access to the rest of your application.
However, resetting the VM after every use (rather than, say, daily) does provide a more stable environment overall and can improve security in certain edge cases.
Some OSes provide containers as an alternative to VMs; this can be a leaner approach, but the same principles apply.
Yes, it's dangerous, but as people have said, it's possible to do. I'm the author and maintainer of the online compilers at https://gcc.godbolt.org/, and I've found it pretty workable to make it safe using a combination of:
- An LD_PRELOAD wrapper (source here) which prevents the compiler from opening any files not on an explicit whitelist. This prevents it from reading /etc/passwd or other such stuff (not that that'd help all that much).
- Monitoring for the LD_PRELOAD wrapper catching bad behaviour.

The whole source is on GitHub, as is the source to the docker container images and compilers and such.
I wrote a blog post explaining how the whole setup is run too.
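For the curious, the general shape of that kind of LD_PRELOAD wrapper looks roughly like the sketch below. This is not the actual wrapper used by the site, just a minimal illustration: the whitelist prefixes are arbitrary, and a real version would also need to cover openat, open64, fopen and friends, and canonicalise paths before checking them.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>

/* Build:  gcc -shared -fPIC -o allowlist.so allowlist.c -ldl
 * Use:    LD_PRELOAD=$PWD/allowlist.so gcc -c user.c            */

typedef int (*open_fn)(const char *, int, ...);

static const char *allowed[] = { "/usr/include/", "/usr/lib/", "/tmp/", NULL };

static int path_allowed(const char *path)
{
    if (path[0] != '/')      /* allow paths relative to the build directory;  */
        return 1;            /* a real wrapper would realpath() them instead. */
    for (int i = 0; allowed[i]; i++)
        if (strncmp(path, allowed[i], strlen(allowed[i])) == 0)
            return 1;
    return 0;
}

int open(const char *path, int flags, ...)
{
    static open_fn real_open;
    if (!real_open)
        real_open = (open_fn)dlsym(RTLD_NEXT, "open");

    /* Refuse anything outside the whitelist. */
    if (!path_allowed(path)) {
        errno = EACCES;
        return -1;
    }

    mode_t mode = 0;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, unsigned int);
        va_end(ap);
    }
    return real_open(path, flags, mode);
}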
You would not want to be running the compiler as root, though I have seen this happen for "ease and convenience" reasons. It would be all too easy for an attacker to include something like:
#include "../../../../etc/passwd"
#include "../../../../etc/shadow"
and get the contents of these files back as part of the compiler error message.
Also, compilers are programs like everything else and will have their own bugs, some of which could be exploitable; it would be all too easy for someone to fuzz the compiler with generated C programs and cause problems.
Most application security focuses first and foremost on input validation; unfortunately, defining 'safe and valid' input for a C compiler is probably up there with the halting problem in terms of difficulty :)
If you allow a user to provide an archive containing the code, you can have issues, not exactly with the compiler but with the linker it uses ;)
ld follows symbolic links even when they point to a file that does not exist. This means that if you compile test.c to the output a.out but there is already a symbolic link named a.out in your directory pointing to a non-existent file, the compiled executable will be written at the location the link points to (subject to the user's file permissions).
In practice an attacker could, for example, include a string containing a public SSH key in their code and provide a symbolic link named a.out pointing to ~/.ssh/authorized_keys. If that file does not already exist, this lets the attacker plant their SSH key on the target machine, giving them remote access without having to crack any password.
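A sketch of what such a submission might contain (the key, user name, and paths are placeholders); the archive ships the symlink alongside the source:

/* test.c -- submitted together with a pre-made symbolic link:
 *     a.out -> /home/builder/.ssh/authorized_keys   (target does not exist yet)
 * The newline-delimited string below ends up verbatim inside the binary
 * that ld writes through the symlink, where sshd can later pick it up as
 * an authorized_keys line. */
const char key[] = "\nssh-ed25519 AAAA...attacker-public-key... attacker@example\n";

int main(void) { return 0; }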