22

It seems there is no practical way to verify the full integrity path of precompiled, packaged software. I can check the downloaded package itself against published hashes, but I have no way to verify that the compiled binaries really correspond to the public source code.

Is there not even a theoretical solution to this problem, ideally one that could be automated?

Maybe decompile it and compare the output or hashes of it with something the software provider offers?

flori
  • 1
    I found this nice post: http://blogs.kde.org/2013/06/19/really-source-code-software So if distributors took more care to make compilation deterministic, maybe there could one day be a crowd-sourced verification mechanism? – flori Jul 05 '13 at 16:18

2 Answers

21

Compilation is a mostly one-way operation, and it is not deterministic, at least not in a robust way.

You could recompile the source code and see whether it yields the same binary. However, the exact binary can vary with many parameters, including the compilation options and the exact version of the compiler used. Moreover, some compilers embed "comments" in binary files; these usually include the compiler version, but may also include the "build number" (if such a number is maintained) and possibly the build date and time. In that case you will not get the same binary, not down to the last byte. To see whether you got the "same" binary, you may thus have to strip such comments from both files first (the Unix strip command may be useful).
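To illustrate the "strip the comments, then compare" idea, here is a minimal Python sketch. The `BUILD-DATE:` marker and the two byte strings are made up for the demonstration; real toolchains store such metadata in format-specific sections, so a real normalization step would look different.

```python
import hashlib
import re

# Hypothetical marker format, invented for this sketch: real binaries keep
# build metadata in format-specific sections, not in a plain-text tag.
BUILD_STAMP = re.compile(rb"BUILD-DATE:[0-9T:\-]{19};")


def normalized_digest(binary: bytes) -> str:
    """SHA-256 of the binary after blanking the embedded build stamp."""
    cleaned = BUILD_STAMP.sub(b"BUILD-DATE:<stripped>;", binary)
    return hashlib.sha256(cleaned).hexdigest()


# Two toy "binaries" that differ only in the embedded build date.
vendor_build = b"\x7fELF...code...BUILD-DATE:2013-07-05T13:00:00;...more code..."
my_rebuild = b"\x7fELF...code...BUILD-DATE:2013-07-06T09:30:12;...more code..."

raw_match = (hashlib.sha256(vendor_build).hexdigest()
             == hashlib.sha256(my_rebuild).hexdigest())
normalized_match = normalized_digest(vendor_build) == normalized_digest(my_rebuild)

print(raw_match, normalized_match)
```

The raw hashes differ because of the timestamp alone, while the normalized hashes agree. Any difference outside the blanked field would still change the normalized digest.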

Strictly speaking, compilation could even be randomized: since generating optimal code is a hard problem, some compilers employ randomized algorithms which, heuristically, are good on average. Such a compiler could generate a distinct binary on each run. Since such behaviour makes debugging much harder, many compilers that indulge in heuristic algorithms still try to be reproducible (i.e. they draw their randomness from a PRNG seeded with a specific, configurable value).
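The seeded-PRNG point can be shown with a toy example. The `toy_layout` function below is hypothetical and only stands in for a randomized heuristic inside a compiler; it is not how any real compiler orders code. (GCC, for instance, exposes an `-frandom-seed` option for a similar purpose.)

```python
import random


def toy_layout(functions, seed):
    """Order functions using a 'randomized heuristic' driven by a seeded PRNG."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    order = list(functions)
    rng.shuffle(order)
    return order


funcs = ["main", "parse", "emit", "optimize", "link"]

# Same input and same seed: the "randomized" result is identical on every run.
same_seed_agrees = toy_layout(funcs, seed=42) == toy_layout(funcs, seed=42)
print(same_seed_agrees)
```

With the seed fixed, the randomness becomes just another build input, which is what makes the output repeatable.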


There is a much simpler solution: if you have the source code and can recompile it, then just use the output of your recompilation.

Of course, this does not completely solve the problem of trust; it just moves it around. When compiling from source:

  • you have to trust that the source code does not contain backdoors;
  • you have to trust the compiler itself for not playing nasty tricks on you.

At least, source code is nominally readable by humans (that's the point of source code), so you could perform some analysis of the code by reading it (or having it read by some specialist whom you trust). There is no known way to make sure that a given piece of code does not contain any backdoor or vulnerability (otherwise, this would mean that we know how to produce bug-free code); however, it is much harder to conceal a backdoor in source code than in a compiled binary.

As for the compiler, see this very classic article.

Gilles 'SO- stop being evil'
Tom Leek
  • 3
    The "simpler solution" has the problem that compiling from source might help *you*, but it won't catch an attacker who uses the pre-compiled binaries to infect the other 99% of users. – CodesInChaos Jul 05 '13 at 13:22
  • 6
    For understanding how difficult analysis of source code is, see the [Underhanded C Competition](http://underhanded.xcott.com/). – Ladadadada Jul 05 '13 at 13:40
  • A small addition to this nice answer: In practice it is not so difficult to circumvent trusting binaries. At least for Linux there are quite a few distributions available that are completely based on source code, e.g. Gentoo and Linux from Scratch. Of course all other issues remain as you pointed out. – Alexander Jul 05 '13 at 15:52
  • @CodesInChaos That is why I like to have an automation. – flori Jul 05 '13 at 15:54
3

The concept of reproducible builds seems to offer a solution to this problem, at least a theoretical one.

It means that every run of a build (or compilation) process should produce identical output, given the same input source.
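As a small illustration of "same input, identical output", the sketch below packs files into a tar archive using Python's standard `tarfile` module, with the file order and all metadata pinned. The `build_archive` helper and the sample files are assumptions made for the example; a real build has many more sources of variation than an archive step.

```python
import hashlib
import io
import tarfile


def build_archive(files: dict, mtime: int) -> bytes:
    """Pack files into an uncompressed tar with fully pinned metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed order, independent of dict ordering
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime            # pinned instead of "now"
            info.uid = info.gid = 0       # no builder-specific ownership
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


source = {"main.c": b"int main(void) { return 0; }\n", "README": b"demo\n"}

# Two independent runs with the same inputs give byte-identical archives.
reproducible = digest(build_archive(source, mtime=0)) == digest(build_archive(source, mtime=0))
# Letting the timestamp vary is enough to change the output.
timestamp_drift = digest(build_archive(source, mtime=1)) != digest(build_archive(source, mtime=2))

print(reproducible, timestamp_drift)
```

The timestamp is passed in as a parameter rather than read from the clock, so it is part of the declared input instead of a hidden one.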

With it, I or anyone else could cross-check every newly published binary to confirm that it really represents the source code it claims to represent.
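The cross-checking step could be automated roughly as follows. This is only a sketch: the `cross_check` function, the quorum rule, and the rebuilder names are hypothetical and do not describe any existing verification service.

```python
import hashlib


def cross_check(published_digest: str, rebuilder_digests: dict, quorum: int = 2) -> bool:
    """True if at least `quorum` independent rebuilders reproduced the published digest."""
    matches = sum(1 for d in rebuilder_digests.values() if d == published_digest)
    return matches >= quorum


# Toy artifacts standing in for a published binary and independent rebuilds.
published = hashlib.sha256(b"binary built from release 1.0").hexdigest()
honest = hashlib.sha256(b"binary built from release 1.0").hexdigest()
odd_one = hashlib.sha256(b"binary built from something else").hexdigest()

# Hypothetical rebuilder names, invented for the example.
reports = {"alice": honest, "bob": honest, "carol": odd_one}

accepted = cross_check(published, reports, quorum=2)
rejected = cross_check(odd_one, reports, quorum=2)
print(accepted, rejected)
```

This only works if the build is reproducible in the first place; otherwise honest rebuilders would also report differing digests.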

However, as of February 2017 only a few projects (mainly operating systems) have implemented this concept in their build processes. So in most cases this solution is still a theoretical one.

flori
  • 2
    It's not actually that theoretical, [the majority of Debian packages are reproducible build](https://wiki.debian.org/ReproducibleBuilds). – Lie Ryan Feb 26 '17 at 10:39