Searching vulnerabilities via similar code comparison. Is it a viable attack vector?

Question

Consider the following scenario:

the attacker scans (ideally automatically) open codebases (e.g. GitHub) for vulnerable code fragments by checking bug reports and patches.
the attacker searches other sources for similar code fragments (i.e. other software that uses the same pattern but does not contain the patch).
the attacker searches for the identified software and exploits the vulnerabilities.
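Steps 1 and 2 can be sketched roughly as follows, assuming the fix is published as a unified diff. The lines a patch removes serve as a naive "vulnerable signature", and the lines it adds serve as evidence that the fix is present. The names `extract_signature`, `looks_unpatched` and the `MIN_LEN` cutoff are hypothetical, and exact line matching is deliberately crude: it only catches verbatim copies.

```python
import re

MIN_LEN = 4  # assumed cutoff: shorter lines such as "}" carry no signal

def _norm(line):
    # Collapse whitespace so indentation and spacing differences do not matter.
    return re.sub(r"\s+", " ", line).strip()

def extract_signature(diff_text):
    """Split a unified diff into (removed, added) lists of normalized code lines."""
    removed, added = [], []
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++")):
            continue  # file headers (simplification: also skips code lines starting with "--")
        body = _norm(line[1:])
        if len(body) < MIN_LEN:
            continue
        if line.startswith("-"):
            removed.append(body)
        elif line.startswith("+"):
            added.append(body)
    return removed, added

def looks_unpatched(source, removed, added):
    """True if every removed line occurs in source and no added line does."""
    lines = {_norm(l) for l in source.splitlines()}
    return all(r in lines for r in removed) and not any(a in lines for a in added)
```

A real tool would at least tolerate renamed identifiers and reordered statements; here only whitespace differences are forgiven.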

The question is:

is it a viable attack method?
if it is, does it have a name?
is there any existing research on this issue, or any documented example?

We know that code duplication (forking, reusing snippets, even copy-pasting) is commonplace in the open source world. The matching could be done with existing methods such as simple regex searches or abstract syntax tree (AST) analysis. It seems to be an effective attack against OSS, software that uses OSS, or any other software whose source the attacker can obtain, but I cannot find any examples of it. Hence the question.
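To illustrate the "simple regex" end of that spectrum, here is a sketch, assuming C-like source, that tokenizes with a regular expression, maps identifiers to `ID` and integer literals to `NUM`, and then looks for the normalized pattern as a contiguous token run. This catches copies whose variables were renamed. The keyword list is deliberately incomplete, comment markers inside string literals are not handled, and `normalize_tokens` / `contains_pattern` are hypothetical names.

```python
import re

C_KEYWORDS = {
    "if", "else", "for", "while", "do", "return", "switch", "case", "break",
    "continue", "sizeof", "struct", "char", "int", "long", "unsigned", "void",
    "const", "static",
}  # deliberately incomplete keyword list, for illustration only

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|->|&&|\|\||\S")

def normalize_tokens(code, keep=()):
    """Strip comments, then map identifiers to ID and integers to NUM.

    Names in `keep` (e.g. dangerous API calls) are left as-is so they stay
    part of the signature.
    """
    code = re.sub(r"/\*.*?\*/|//[^\n]*", " ", code, flags=re.S)
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok in C_KEYWORDS or tok in keep:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out

def contains_pattern(target_code, pattern_code, keep=()):
    """True if the normalized pattern occurs as a contiguous token run in target."""
    t = normalize_tokens(target_code, keep)
    p = normalize_tokens(pattern_code, keep)
    n = len(p)
    return n > 0 and any(t[i:i + n] == p for i in range(len(t) - n + 1))
```

Passing a dangerous call such as `memcpy` in `keep` keeps it in the signature; without that, any three-argument call would match.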

Someone correct me if I'm wrong, but a while ago some hackers stole some Adobe code and came up with new ways of exploiting Adobe Reader after reading it. Is that the type of "attack" you have in mind, exploring the code and looking for vulnerabilities in it? Or are you thinking of something along the lines of Heartbleed, where someone discovered that OpenSSL was vulnerable and then hackers started looking for web apps using OpenSSL? — sir_k, Sep 15 '15 at 10:01
Something like that, but in an automated way. Please note that the method described above does not actually require you to find the vulnerabilities yourself. They have already been found by others and documented in repositories and bug trackers, possibly along with patches. We just have to **identify** them (automatically) and find **similar** code structures in other (open or accessible) sources. Clearly this would be a very poor targeted attack (an attack against a specific target) but a very good opportunistic one (you look for exploitable weaknesses anywhere). — goteguru, Sep 15 '15 at 13:08

score 3 · Answer 1 · answered Sep 17 '15 at 21:00

In theory, yes, this is possible, and there have been some attempts to do it. However, the technique is not all that practical because there are too many variables involved to generalise the approach. What you're talking about is really a subset of static code analysis, which has been used for a long time. The problem is that it has a lot of limitations:

It is usually language-specific. Trying to develop a solution for all software code is too complex.
You often need to work at the AST level to eliminate variable names, code style, etc. You need high-level abstracted representations, which makes it slow.
It can be difficult to identify issues that are not closely located in the code. For example, a security hole may only exist if a set of preconditions spread over multiple code files or modules is met.
It often gives large numbers of false positives, which need to be manually verified.
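As a concrete, language-specific illustration of working at the AST level, here is a sketch for Python source only, using the standard `ast` module. It renames locally bound names (function names, parameters, assigned variables) to positional placeholders and compares `ast.dump` output per top-level function, so free names such as `eval` remain part of the fingerprint. The helper names are hypothetical, and methods, async functions and near-miss clones are ignored.

```python
import ast

def _bound_names(tree):
    # Names introduced inside the snippet: assignments, parameters, function names.
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.FunctionDef):
            bound.add(node.name)
    return bound

class _Renamer(ast.NodeTransformer):
    """Rename bound identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self, bound):
        self.bound = bound
        self.mapping = {}

    def _alias(self, name):
        if name not in self.bound:
            return name  # free names such as eval or os keep their identity
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

def function_fingerprints(source):
    """Map each top-level function name to its normalized AST dump."""
    out = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            original = node.name  # captured before the renamer mutates the node
            renamed = _Renamer(_bound_names(node)).visit(node)
            out[original] = ast.dump(renamed, annotate_fields=False)
    return out

def find_ast_clones(vulnerable_src, target_src):
    """Names of target functions structurally identical to a vulnerable one."""
    known = set(function_fingerprints(vulnerable_src).values())
    return [name for name, fp in function_fingerprints(target_src).items() if fp in known]
```

Because `ast.dump` omits line and column attributes by default, the position of a copied function in the target file does not affect the comparison.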

Having said all of that, there are a number of static analysis tools which can help narrow the search space for potential security holes. These often focus on a specific type of security issue, such as SQL injection or XSS vulnerabilities and are usually restricted to a specific language. Many of the better ones are very expensive.

The potential for such analysis has certainly increased with the growth in available repositories such as GitHub. Narrowly defining the scope can also make such approaches worthwhile, especially given the growing use of open source libraries and frameworks. For example, if you know of a vulnerability in a popular library or framework, it isn't too difficult to identify code that uses it and treat that code as a candidate for a more intensive search. Likewise, the growth in 'cut-and-paste' programming may have some potential: if you find a popular example of how to implement some pattern or algorithm that has a flaw, you may be able to find that pattern in public repositories etc.
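Narrowing the scope to code that depends on a known-vulnerable library can be as simple as checking pinned versions in dependency manifests. A sketch, assuming a `requirements.txt`-style `name==version` format and purely numeric dotted versions; the advisory table and the package name `examplelib` are made up for illustration.

```python
import re

# Hypothetical advisory data: package -> (first affected version, first fixed version).
ADVISORIES = {
    "examplelib": ((1, 0, 0), (1, 2, 3)),
}

PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*==\s*(\d+(?:\.\d+)*)\s*$")

def parse_version(text):
    # Purely numeric dotted versions only (an assumption, see below).
    return tuple(int(part) for part in text.split("."))

def vulnerable_pins(manifest_text, advisories=ADVISORIES):
    """Return (package, version) pins that fall inside an advisory's affected range."""
    hits = []
    for line in manifest_text.splitlines():
        m = PIN_RE.match(line)
        if not m:
            continue  # comments, unpinned or unsupported requirement lines
        name, version = m.group(1).lower(), parse_version(m.group(2))
        if name in advisories:
            first_bad, first_fixed = advisories[name]
            if first_bad <= version < first_fixed:
                hits.append((name, m.group(2)))
    return hits
```

Real version schemes (PEP 440, semver) need a proper parser; with plain tuples, for instance, `1.2` compares as lower than `1.2.0`.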

This type of static analysis won't automate the process, but it certainly could help in reducing the search space for code with security problems. However, at the end of the day, it will normally require someone to take the results and perform manual analysis and inspection.
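The "reduce the search space, then inspect manually" workflow can be sketched as ranking candidate files by Jaccard similarity of token 3-grams against a known-vulnerable snippet. The tokenizer, the 0.5 threshold and the function names are arbitrary assumptions; the scores only order candidates for a human reviewer and say nothing about exploitability.

```python
import re

def shingles(code, k=3):
    """Set of k-token shingles; integer literals are collapsed to 'N'."""
    toks = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    toks = ["N" if t.isdigit() else t for t in toks]
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty.
    return len(a & b) / len(a | b) if (a or b) else 0.0

def rank_candidates(signature, candidates, threshold=0.5):
    """Return (name, score) pairs at or above threshold, most similar first."""
    sig = shingles(signature)
    scored = [(name, jaccard(sig, shingles(code))) for name, code in candidates.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)
```

Only the top of the ranked list would then go to manual analysis, which matches the answer's point that a person still has to inspect the results.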

Thanks! Can you point me to these attempts? I would be very interested in their effectiveness. One could say it's a kind of static analysis, but I would argue with the word 'just'. Being able to brute-force a password whose *exact characters* are known is no longer 'just brute force' but a serious security hole. The same applies here. We have a good database of known vulnerabilities (bug reports) along with their 'signatures' (source code patterns); we just have to locate these signatures in *other* programs and exploit them. That is an order of magnitude easier than finding a new vulnerability, and it can be automated. — goteguru, Sep 18 '15 at 09:44
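The "signature database" idea scales better with hashing than with pairwise comparison. A minimal sketch, not the method of any particular tool or paper: every window of `WINDOW` consecutive whitespace-normalized lines of each known-vulnerable snippet is hashed into an index, and candidate code is scanned window by window. Advisory IDs such as `ADV-1` and the window size are hypothetical, and snippets shorter than `WINDOW` lines produce no index entries.

```python
import hashlib
import re

WINDOW = 2  # lines per window; hypothetical tuning parameter

def _lines(code):
    # Normalize whitespace and drop blank lines so formatting does not matter.
    return [re.sub(r"\s+", " ", l).strip() for l in code.splitlines() if l.strip()]

def _windows(lines, size=WINDOW):
    # Yield (start index, SHA-1 hex digest) for each run of `size` lines.
    for i in range(len(lines) - size + 1):
        yield i, hashlib.sha1("\n".join(lines[i:i + size]).encode()).hexdigest()

def build_index(signatures):
    """Map window hash -> set of advisory ids for every known-vulnerable snippet."""
    index = {}
    for advisory_id, code in signatures.items():
        for _, h in _windows(_lines(code)):
            index.setdefault(h, set()).add(advisory_id)
    return index

def scan(code, index):
    """Return sorted (line_no, advisory_id) hits; line_no counts non-blank lines from 1."""
    hits = set()
    for i, h in _windows(_lines(code)):
        for advisory_id in index.get(h, ()):
            hits.add((i + 1, advisory_id))
    return sorted(hits)
```

Lookups cost one hash per window regardless of how many signatures are indexed, which is what makes scanning large public corpora plausible.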

score 1 · Accepted Answer · answered Dec 15 '15 at 16:00

For the record, the answer is mostly yes.

The problem has been analysed extensively by several researchers, e.g. Hongzhe Li et al. in "A Scalable Approach for Vulnerability Discovery Based on Security Patches" (Springer, 2014) and Silvio Cesare, Yang Xiang, and Jun Zhang in "Clonewise – Detecting Package-Level Clones Using Machine Learning" (Springer, 2013).

The first paper shows evidence that it is a working method; the second suggests it can even be semi-automated via machine learning.

A closely related issue is discussed in Amir H. Moin and Mohammad Khansari: "Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization" (Springer, 2010).

So the answer is:

yes, it is a viable attack vector,
no, it doesn't have a name (yet),
and yes, there is some existing research on the topic (see above).
