To make C code "safe", even against a malicious developer, you have to fix the core issues of C, all of which have been repeatedly exploited for arbitrary code execution (exactly what you want to avoid):
Weak types. In C, you can take a bunch of bytes and interpret them as a pointer, an integer, or whatever else. All it takes is a cast. As long as such casts are possible, the language won't be safe. You need to enforce strict, inescapable types.
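A minimal sketch of the problem (the address below is made up, and dereferencing it is undefined behaviour, which is precisely the point):

```c
#include <stdio.h>

int main(void) {
    unsigned long n = 0xdeadbeefUL;  /* just a number...              */
    char *p = (char *)n;             /* ...now it is a pointer        */
    printf("%c\n", *p);              /* undefined behaviour: reads    */
                                     /* whatever sits at that address */
    return 0;
}
```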
Unchecked array accesses. C gives you pointers, and through them you can read and write arbitrary bytes in RAM, with no checks whatsoever. From this follow buffer overflows, which can lead to security holes. In your new language, you must enforce strict array bound checks.
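A minimal sketch of such an overflow (the function name copy_name is just for illustration):

```c
#include <string.h>

void copy_name(const char *src) {
    char buf[8];
    strcpy(buf, src);   /* no bounds check: a src longer than 7 chars
                           writes past the end of buf, trampling
                           whatever sits next to it on the stack     */
}

int main(void) {
    copy_name("this string is much longer than eight bytes");
    return 0;
}
```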
Manual memory management. In C you dynamically allocate memory with malloc(), and then release it with free(). Nothing prevents you from trying to access a memory block after having freed it, or from freeing it twice (these are the classic use-after-free and double-free, both with serious consequences). A missing free() also means a memory leak, which can be bothersome. In your new language, you must do something to prevent these memory management issues; that is called automatic memory management, and it is most often done with a garbage collector.
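Both lifetime bugs in one minimal sketch:

```c
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *p = malloc(16);
    if (p == NULL) return 1;
    strcpy(p, "hello");
    free(p);            /* block released...                         */
    p[0] = 'H';         /* ...use-after-free: undefined behaviour    */
    free(p);            /* double-free: undefined, often exploitable */
    return 0;
}
```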
Down this road you get Java. Or C#, if you prefer. This works, but it takes a considerable development effort. Moreover, the systematic checks on array bounds can imply a non-negligible overhead. For purely computational tasks with no I/O bottleneck (no disk access, no network access, and all data fitting in the CPU's L1 cache), I typically get a factor of 2 to 4 between optimized C and optimized Java (tested on numerous cryptographic algorithms such as hash functions, and on zlib-compatible compression). For some algorithms the slowdown was smaller (the Java code achieving up to 70% of the speed of the C code), but that is rare. For the specific task of computations with big integers, Java is more like 6 times slower than C with assembly, because Java offers no access to the CPU's 64x64->128 multiplication opcode.
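For illustration, here is what that operation looks like in C, as a sketch assuming GCC or Clang on a 64-bit target, where the non-standard unsigned __int128 type compiles down to a single hardware multiply:

```c
#include <stdint.h>

/* Full 64x64->128-bit multiply. On x86-64, GCC and Clang turn this
 * into one MUL instruction; code without access to the full product
 * must emulate it with several narrower multiplies. */
void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 r = (unsigned __int128)a * b;
    *lo = (uint64_t)r;
    *hi = (uint64_t)(r >> 64);
}
```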
There are two main roads out of this (assuming that the overhead of a Java-like language is not acceptable to you). The first is to try to use formal verification: the code will be executed only if it can be proven that it "behaves properly". In essence, Java already performs a weak form of verification, in that the JVM proves that a piece of code does not do anything wrong such as using a reference with the wrong type; but that verification relies on systematic runtime checks (e.g. on casts and array accesses) and on the presence of the GC. Formal verification is an active research area whose scope extends beyond security: basically, it is about proving correctness, i.e. the absence of any bug.
There are good mathematical reasons (the halting problem, and more generally Rice's theorem) why a generic verifier that works on all possible programs cannot exist. Researchers still hope to improve things in several directions:
There are programs for which proofs are feasible. For instance, a program with only if statements but no loops and no recursive calls cannot loop "forever": that is an example of an easy proof of correctness for programs which follow some specific rules. Researchers try to define more useful sets of rules that still allow for proofs.
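A minimal C illustration of that trivially provable case: straight-line code in which control can only flow downward, so termination is guaranteed by construction.

```c
/* Only conditionals, no loops, no recursion: every execution path
 * reaches a return, so the function provably terminates. */
int clamp(int x, int lo, int hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}
```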
The human developer may provide proof elements. There are constructions for which no algorithm is known that can generate the proof, but a developer-provided proof can still be verified. Assertions fall into that category.
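For instance, in C (a sketch; the precondition bounds are arbitrary):

```c
#include <assert.h>

/* The assert states a fact the tool could not derive on its own; a
 * verifier only has to check it, which is far easier than inventing
 * it. Given the asserted bounds, the addition below provably cannot
 * overflow a 32-bit int. */
int average(int a, int b) {
    assert(a >= 0 && a <= 1000000 && b >= 0 && b <= 1000000);
    return (a + b) / 2;
}
```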
We may find optimizations: starting from a generic "safe" language like Java, an automatic prover might find "occasions", e.g. code chunks where it can be proven that some array accesses are necessarily fine, thus allowing the corresponding checks to be removed. In fact, modern JVMs apply such strategies, which is why they are not as slow as one might fear. Many researchers believe that there is still room for improvement.
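Here is that optimization written out by hand in C, as a sketch of what a JIT does internally: it first emits the equivalent of checked_sum, then proves the check can never fire and compiles the loop down to unchecked_sum.

```c
#include <stdlib.h>

/* What a safe VM conceptually executes: one check per access. */
long checked_sum(const int *a, size_t len) {
    long s = 0;
    for (size_t i = 0; i < len; i++) {
        if (i >= len) abort();  /* the implicit bounds check */
        s += a[i];
    }
    return s;
}

/* The loop condition already guarantees i < len, so the check above
 * can never fire; a prover that sees this removes it, yielding: */
long unchecked_sum(const int *a, size_t len) {
    long s = 0;
    for (size_t i = 0; i < len; i++)
        s += a[i];
    return s;
}
```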
However, don't hope for too much: existing top-of-the-line Java and .NET virtual machines are about the best that can be done for now.
The other method is to surrender the CPU to the potentially hostile code, and instead move the security perimeter up to a higher layer. The "plugin" is executed in an address space of its own, and the MMU blocks unwanted accesses. The plugin code can perform all the memory accesses it wishes: it will see only its own code and data. If the code attempts system calls, the kernel knows not to honour them.
Among incarnations of this method are FreeBSD's jails and Linux's seccomp. The same concept at a still higher level is virtualization with hypervisors: the guest can be a full operating system, with access to disks, network... and yet all this hardware is virtual, and the guest cannot escape from the closely guarded perimeter set by the hypervisor.
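A minimal sketch of the seccomp idea, using its original "strict" mode (real sandboxes use the finer-grained SECCOMP_MODE_FILTER with a BPF program instead):

```c
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    /* After this call, the kernel honours only read, write, _exit
     * and sigreturn for this thread; any other system call is
     * answered with SIGKILL. */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
        return 1;
    const char msg[] = "write is still allowed\n";
    write(1, msg, sizeof msg - 1);
    /* open(), socket()... would now kill the process instantly. */
    syscall(SYS_exit, 0);  /* raw exit: glibc's exit() would call
                              exit_group, which is not whitelisted */
    return 0;              /* never reached */
}
```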
The main problem with that kind of isolation is the overhead of data exchanges. If your "plugin" must consume a lot of data from the application into which it is plugged, or if it produces a lot of data for that application (typical example: video codecs), then moving the data between the two worlds, across the MMU/hypervisor barrier, can kill performance.
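A sketch of where that overhead comes from, assuming the plugin is a forked process fed through a pipe (the frame size and count are made up):

```c
#include <stdlib.h>
#include <unistd.h>

#define FRAME (1 << 20)   /* 1 MiB per "video frame": arbitrary figure */

int main(void) {
    int fd[2];
    static char frame[FRAME];
    if (pipe(fd) != 0) return 1;
    if (fork() == 0) {                /* the isolated "plugin"          */
        close(fd[1]);
        char *buf = malloc(FRAME);
        while (read(fd[0], buf, FRAME) > 0)
            ;                         /* decoding would happen here     */
        _exit(0);
    }
    close(fd[0]);
    for (int i = 0; i < 100; i++)     /* each frame is copied twice     */
        write(fd[1], frame, FRAME);   /* (user->kernel, kernel->user),  */
    close(fd[1]);                     /* plus context switches          */
    return 0;
}
```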