18

This may be a stupid question, but...

Why are security-crucial software written in languages such as C and C++? I understand why, say, an embedded system might need a low-level language to make the most use of limited resources, but it seems foolish to write security software in low-level languages.

I'm asking this because whenever I go to debian.org and look at the latest security fixes, the vast majority of them involve memory-safety issues, which only appear in unsafe languages such as C and C++. For all the bad reputation Java gets, for example, I would imagine that if OpenSSL had been written in it, roughly 90% of its security patches would have been unnecessary. If an even higher-level language like Scala or Lisp were used, I would suppose it would be even easier to get things secure. Messing up your arrays will, in the worst case, lead to a runtime error.

Is the reason for using C/C++ avoiding side channel attacks? I might imagine some properties of a key interfering with the execution of a conservative garbage collector (like Boehm) and leading to timing attacks, but no higher-level language uses unsafe garbage collectors anyway.

ithisa
  • 566
  • 4
  • 11
  • 4
    There are no safe languages. C/C++ may be less safe, but it's only a relative difference, not a qualitative one. Software vulnerabilities come from bad design and flawed implementation of good designs, and no language removes that. – ddyer Oct 19 '13 at 19:32
  • 2
    In C/C++, there are lots of things that *usually* work but present a security hole (gets, for example). In other languages they only present a crash risk due to an unhandled exception (out of range). – ithisa Oct 19 '13 at 23:25
  • 5
    @ddyer For remote code execution vulnerabilities, the difference between C and Java is *huge*. Plain Java can't contain use-after-free or access to out-of-bounds array elements. That kind of vulnerability is severe and extremely common in C. – CodesInChaos Oct 21 '13 at 12:41
  • @everyone I agree completely that C/C++ is less safe than almost any alternative, but the question was cast as safe versus unsafe. Nothing is safe. – ddyer Oct 21 '13 at 17:51
  • Many non-C languages provide memory safety, and some provide type safety. That doesn't mean an app written in them is secure, but it can be free from certain classes of errors (say, buffer overflows, use-after-free...). – Blaisorblade Nov 09 '14 at 20:22

7 Answers

15

Most software is written in languages that the developer knows how to use, and for security-related software, that's not a bad thing -- provided that the developer actually knows his language of choice, down to the fine details. I would argue that this is not the case with C and C++: a vast majority of developers who believe they know C or C++ are actually wrong.

On Unix-like systems, in particular Linux (where a substantial part of all open-source development occurs), C is "the" system language. Unix is very C-friendly (all low-level APIs are C-based and described with C headers; system libraries are written in C, so the C compiler is well tested and well integrated). This has resulted in a large C-based ecosystem where developers use C because they know C and want to use libraries that offer a C-based API. This has nice portability benefits (for instance, no problem running OpenSSL on a PowerPC or a MIPS -- try doing that with Java!).

Using another language entails some issues: lack of runtime support, less portability, an assumed lack of performance... ("Safe" languages can be efficient, but some widespread implementations are slow because they use interpreters with no JIT, or are memory hogs; and many people simply dismiss these languages as inherently slow because they never actually tried them, and think in slogans.)

This leads to my conclusion: people use C and C++ for security-critical software out of tradition.

(I write all that as someone who wrote his own Java VM to run an SSL server written entirely in Java, on a small 50 MHz PowerPC system with 16 MB of RAM -- it was a Certification Authority, and it could serve 70 clients simultaneously. So when I say that a "safe" language like Java can be used to run security-critical software on low-power systems and with decent performance, I mean it.)

Tom Leek
  • 168,808
  • 28
  • 337
  • 475
  • I assume you added some native math support to make up for Java's deficiencies in that area? – supercat Apr 14 '14 at 22:42
  • The hardware was a HSM, it had a cryptographic accelerator for RSA. "Native math" would not have been sufficient, by far. Note that the symmetric cryptography of the SSL connections was pure Java. – Tom Leek Apr 15 '14 at 10:51
  • By "native" I meant "using machine features rather than JVM bytecode". Code written for the x86 or ARM instruction set could do RSA math far more efficiently than code written in Java bytecode; add hardware support and it could presumably go even faster. – supercat Apr 15 '14 at 12:35
  • In my case, the JVM (that I wrote myself) was of the AOT kind: translation of bytecode (not Java source) to C code, then compilation with a C compiler. Compared to native code for computing-intensive tasks (e.g. encryption), the slowdown is a factor of 2 to 4. For big integers too -- at least on 32-bit ARM and PowerPC; on 64-bit x86, the factor rises to about 6 due to the 64x64->128 multiplication opcode that cannot be used from Java (whose biggest integer type is 64 bits). – Tom Leek Apr 15 '14 at 12:51
  • I wonder why so few languages include any support for extended-precision math? It's not hugely common, but if A and B are extended-precision values stored in little-endian arrays and n is a number, computing A += n*B can be done much more efficiently in machine code than in high-level languages. Though on many platforms, efficiency could be improved further if rather than computing x*y->h:l, the CPU had an instruction to compute h+x*y->h:l. – supercat Apr 15 '14 at 13:08
  • Big integers are not easy to use in the absence of automatic memory management (the GC). Many people also fear that making the default "int" type a big integer would incur intolerable overhead (which is mostly untrue in practice, _if_ you do it properly, i.e. encoding small integers as sort-of pointers). Python, and some Scheme dialects, use big integers by default. – Tom Leek Apr 15 '14 at 13:31
  • I wasn't thinking of "big integer" types so much as built-in functions or intrinsics to compute X*Y->H:L, H:L/D->Q:R, add a range of one array to a range of another with carry propagation, etc. Basically, the tools needed to implement big-integer types efficiently. If a processor includes a single-cycle 64x64->128-bit multiply, it seems silly to split such operations into a bunch of shifts, masks, and 32x32->64-bit multiply operations. Likewise if a processor can use add-with-carry instead of conditional logic when adding parallel arrays. – supercat Apr 15 '14 at 14:51
  • Note that you can compile C with a proven correct (formally verified) compiler which ensures that any bugs in the program will be a result of you introducing them, not a result of the interpreter or runtime screwing things up. After all, a _perfect_ C program (e.g. seL4) cannot have bugs, but even a _perfect_ Python program can have problems if the interpreter is bad. – forest Mar 19 '18 at 02:49
6

You can write insecure software in any language. C and C++ might make it easy to make a critical mistake, especially to inexperienced programmers...

but

... even as the most experienced and careful programmer you would have no control over a security problem that resides within the complexity of a high level language.

With C and C++ you have a lot of control over what is happening within your code. The compilers can be relatively simple and are often well reviewed. This means any mistake you make is probably your own. In high-level languages you often have to rely on the security of the APIs and language features you are using, which leads to a more complex system that is harder to audit.

The code base within which a security-critical mistake can lie is ultimately more manageable than in a high-level language.

Paul Hänsch
  • 171
  • 4
  • I don't like this answer. With large projects, compiler code base complexity is relatively small and so everyone wins from using a more advanced language. – Display Name May 22 '14 at 15:15
  • Most security-sensitive software is (or should be) smaller than a production-quality optimizing compiler, especially if written in a high-level language. For instance, [this SSL/TLS implementation in Haskell](https://github.com/vincenthz/hs-tls) has 7000 lines, while Haskell compilers such as GHC are much bigger. However, while the claim is true, most production-quality C compilers are even bigger and more insidious (when using optimizations) — even C compilers are too complex. See http://blog.regehr.org/archives/1180 for a discussion of some aspects from somebody who breaks C compilers for a living. – Blaisorblade Nov 09 '14 at 20:28
  • this is true, however, security issues can be in any part of a big software project, not just in crypto stuff. – Display Name May 04 '16 at 09:02
3

Actually, in some cases one has to use such languages to be secure, because higher-level languages don't provide enough hardware access. For example: wiping encryption keys from memory after they are no longer needed, or making sure multiple code paths are of equal length and so take an equal amount of time, to prevent side-channel attacks.

ewanm89
  • 2,043
  • 12
  • 15
  • In that case, shouldn't those "critical" parts be implemented in a low-level language with the highlevel logic implemented in a safer language? The whole-system complexity argument as made by @PaulHänsch still applies in that case, of course. – Simon Lindgren Feb 22 '14 at 12:34
2

This is an excellent question.

If you asked the authors of such software, most would say performance. For some things, such as the Linux kernel, this performance is essential. But for a lot of software, performance is less critical, and this makes it a poor reason. Imagine a web server that was 50% slower, but had never had a security vulnerability. A whole lot of users would be happy to trade performance for security.

Using a managed language doesn't guarantee security. Consider Java or C# - these are memory-safe; it's impossible (apart from VM bugs) to have a buffer overflow vulnerability. But they can have injection flaws, access control weaknesses, etc. However, these kinds of vulnerability are somewhat easier to detect and prevent. One of the particular concerns with non-memory-safe languages (like C/C++) is that it's incredibly difficult to detect subtle memory corruption flaws in a complex application.

If we started today, I think a lot more software would be written in managed languages. But we never start with a blank sheet, and a lot of software is written in C. And the thing is, a lot of software has got the hang of security. Look at Apache for example - it had a lot of serious issues if you go back a decade or so, but has had a much better recent history. With this track record, the motivation to completely rewrite the code in another language is gone.

Bear in mind that while a lot of off-the-shelf software is non-managed, most bespoke software (which includes a huge number of custom web apps) is written in managed languages.

At the moment, the worst place for software security is the desktop, with web browsers and plugins being the worst offenders. Unfortunately, most web browser flaws are associated with sandboxed languages (JavaScript, Java, Flash, etc.) and writing the VM for these in a managed language would be a significant performance hit.

It would be an interesting project to create a suite of computing software that's built from the ground up to be secure. I know OpenBSD have been pushing that mantra for years, but their approach doesn't quite seem right. If someone else was to take this on, using a managed language for nearly everything would make a lot of sense.

paj28
  • 32,736
  • 8
  • 92
  • 130
1

Provided all the libraries are open source and signed, there is no security benefit in using a lower-level language -- indeed, the risk of coding errors and the KLOC size of the business logic to audit decrease the effective security, as you correctly posit.

Lower-level languages will not protect against compiler viruses or other subversions of code or binaries where the file signatures have not been signed by a private key kept separate from the server and used only by auditors; and where no firmware on hardware ROM exists to check the signatures.

Coding security applications in C/C++ is a by-product of a few things:

  • A hacker ethos from the 1990s to roll your own, as most third-party libraries were closed-source.
  • A desire to improve the speed of often-slow security algorithms, despite the inherent time complexity of those algorithms outweighing the constant-factor benefit of a lower-level language.
  • An assumption that libraries of a higher-level language can't be pinned to an audit version.
LateralFractal
  • 5,143
  • 18
  • 41
0

The vast majority of bugs come from poor development practices and inexperienced developers, not the language chosen.

A responsible developer will use Test Driven Development to prove that his or her software performs exactly as intended, with no side effects or memory leaks. They will immediately fix any bugs reported. These practices address the same problems that managed languages try to address, but they also produce a robust suite of unit tests proving proper functionality. They also yield modular designs that are easy to understand and maintain, meaning future enhancements can be as safe as the original development.

Beyond this, however, are the business level problems that are at the root of many of the modern security flaws. SQL injection, cross site scripting, error messages that reveal too much information, cleartext passwords, all those kinds of flaws stem from a lack of attention to security specific details. Managed languages have no magical properties that prevent these kinds of flaws.

Developers need to also understand secure coding techniques, such as validating input, enforcing data integrity, ensuring business rules boundaries are enforced, logs are sanitized of sensitive data, etc. OWASP is a great source of information there.

So in addition to following good development practices, software developers need to swallow their egos and leverage external tools and people, too. Static Code Analysis tools like Coverity, Klocwork, HP's Fortify, FindBugs, PMD, FxCop, etc., can all help identify semantic problems like CSRF, cleartext passwords, etc. And coworkers and code reviews are a valuable source of insight as well.

Blaming the language is not productive. If you must blame someone, blame the unprofessional developers, the people who hire the unprofessional developers, the people who provide inadequate resources to do the quality work necessary to produce secure code, or who add process roadblocks that prevent quality work from being done.

John Deters
  • 33,650
  • 3
  • 57
  • 110
  • 3
    IMO blaming the c for buffer overflows is appropriate. – CodesInChaos Oct 21 '13 at 12:51
  • Even professional developers can make fatal mistakes in coding C. I doubt, for example, that the makers of Firefox and Chrome were "unprofessional". On the other hand, it is reasonably easy for an "amateur", who knows a lot about theoretical cryptography but NOT about the nitty-gritties of low-level programming, to write secure software in a high-level language. – ithisa Oct 21 '13 at 14:15
  • @CodesInChaos, it's like anything else in security - defense in depth. All developers, not just the professionals, should be running static code analyzers, which I've found will catch the buffer overflow and off-by-one kinds of errors that C lets you make. Those tools also guard against many kinds of other security mistakes, not just buffer and pointer problems, and are worth running no matter what language you choose. – John Deters Oct 22 '13 at 13:33
  • The language is part of the toolchain, and C requires much more effort in the rest of the toolchain. Also, mentioning test-driven development in the context of security seems inappropriate: insecure applications *do* work. Fuzzing is appropriate, but a different thing and often harder. – Blaisorblade Nov 09 '14 at 20:33
  • @Blaisorblade, That's why my third paragraph begins with the words "Beyond this". Many errors leading to security flaws are simple programming errors, not just overlooked avenues of abuse. TDD helps create a high percentage of code coverage, but I know TDD does not fix "pure" security problems of omission (validation, etc.) It eliminates a class of mistakes that we know leads to flaws that can be exploited. – John Deters Nov 09 '14 at 23:00
  • Everyone fails, everyone. Sooner or later. True security is about having routines to discover it early, to mitigate the damage and to fix it quickly and practically at the customer/in the end product. – Simply G. Apr 21 '16 at 06:52
0

There are only a handful of unsafe languages still in use today; C and C++ being the most obvious and typical options. And as dangerous as it is, there are still a number of perfectly valid reasons for using them:

Interoperability
OpenSSL is a great example. OpenSSL is used by Perl, Python, Ruby, Lua, Node.js, and in practically every other higher-level framework out there. It's even possible to use it in managed frameworks like .NET, though it's less common to do so. If OpenSSL wasn't written in C or C++, this would be essentially impossible. While interoperability between high-level languages can be done, the friction is high and results are typically disappointing.

This remains the primary reason why general-purpose library code is still written in C or C++ (typically C in this case).

Speed
This is less of a valid reason today given hardware speeds and improvements in compilers. But C remains the gold-standard of program speed. No matter how fast you can make any other language go, you simply can't be faster than C. While this is trivially true by definition, there are real-world implications as well. And the additional safety checks that define a "safe" language add a measurable performance penalty.

Size and Complexity
In certain scenarios, such as embedded devices or other constrained environments, the added baggage that comes along with safe languages is a luxury that the environment can't support.

Predictability
In the case of kernel and driver code in particular, higher-level languages, including safe languages, add a level of unpredictability that becomes a liability. The driver author needs to know exactly when memory will be allocated and destroyed, for example. He needs to know that the machine will run exactly the code he writes, and nothing more and nothing less.

Because of the close-to-the-metal nature of OS kernels, safe languages simply aren't an option. And kernel code is the single most security-critical aspect of any computer's execution, meaning that safe coding in an inherently unsafe environment is an unavoidable necessity.

tylerl
  • 82,225
  • 25
  • 148
  • 226
  • "The driver author needs to know exactly when memory will be allocated and destroyed, for example". Hmm. Why exactly? I never got why people are so fussy about timing in low-level systems. Does anybody really care if your screen lags by 50 milliseconds every 20 seconds because of garbage collection? – ithisa Oct 21 '13 at 10:47
  • @user54609 Depends on the application. Even outside hard real-time systems, it can be annoying. In games or videos even short lag spikes can be visible. Or consider a server application that simply doesn't respond to messages for 50ms. – CodesInChaos Oct 21 '13 at 12:49
  • But things like malloc can block for a long time as well. Moreover, common "deterministic" GCs such as C++ refcounting smart pointers do cause large amounts of lag when, say, the root of an enormous tree goes out of scope, and the whole tree must be freed IMMEDIATELY. – ithisa Oct 21 '13 at 14:06
  • @user54609 because the kernel controls memory allocation. Attempting to access or allocate memory outside a very narrowly defined range of conditions will result in a crash -- BSOD on windows, kernel panic on Linux/OSX. Many (most?) kernel crashes are caused by the author unwittingly violating some memory access rule. – tylerl Oct 21 '13 at 18:59
  • Why is that an issue? I can't believe every single GC in the world relies on undefined behavior, and might crash the kernel. – ithisa Oct 21 '13 at 19:55