How to review code for backdoors?

Question

I have a codebase that needs a code review to evaluate it for backdoors.

The code is far too big to review in full. How would you recommend approaching the problem?

It is a Java web application with an Oracle database. The code is customized from an exceedingly large product.

The customizations cover almost all of the codebase, but I can identify the customised code automatically.

The problem is that all the answers so far treat a backdoor as simply a network backdoor, but it can just as well be an "easter egg": for example, a logic backdoor in an ATM that allows a bad guy to withdraw all the money from the ATM. — VP., May 10 '11 at 16:33
@VP01 - this is why you need a combination of human and tools. A human can't get through a million lines of code in a sensible timeframe, but is far more likely to spot a logic bomb. — Rory Alsop, May 10 '11 at 17:22
@Rory Alsop - Yes: the right tools, humans and business knowledge. But everything should be done together with the development. Afterwards, it is really hard to pay an analyst to do magic or to buy a specific tool. So in my opinion the best way to review code for backdoors is during development, not after. — VP., May 10 '11 at 20:49
@VP01 - in an ideal world, absolutely, however the OP already has the codebase, so looking at the best we can do in that situation. — Rory Alsop, May 10 '11 at 20:51
Yes, but remember that people who Google "How to review code for backdoors?" will arrive here, and for them it may not be too late yet ;-) — VP., May 10 '11 at 21:00
Thanks everyone for your insightful comments, the risk is not high in this specific case. — Andrew Russell, May 11 '11 at 06:20

score 12 · Answer 1 · answered May 11 '11 at 00:40

The bottom line: You are screwed. If you are concerned that one of the developers deliberately hid a backdoor in that codebase, you have no realistic hope of telling whether a backdoor is present. Life sucks.

Comment: Some folks here are suggesting you can check for a backdoor by reviewing the code, or using static analysis tools, or somesuch. Don't believe it. They are fooling themselves if they think that this is likely to detect a deliberately hidden backdoor. In my opinion, those answers are overly optimistic and are likely to give you a false sense of security. (I know I'm going to make myself unpopular by saying this and dissing other commentators, but I feel a responsibility to give you my honest, frank advice.)

Advice and mitigations: As for what you should do, I think you need to tell us more about the function of that piece of software, how it relates to your business, and what are the consequences if it does have a backdoor. Here are some generic mitigations you could consider, which might or might not be relevant to you, depending upon the circumstances:

Risk transfer. Require the supplier to provide a warranty that the code is free of backdoors, with major financial penalty clauses if any are found. (Note that, if a backdoor is present, the chances of finding it are pretty low, so the penalties if one is found have to be scaled up in proportion to the inverse of the probability of detecting it.)
Isolation. You could try to isolate the effect of a backdoor, so that it can only affect the functioning of this piece of software and has limited opportunity to attack other systems of yours. You could run it in a virtual machine, firewall it off from your networks, etc. You could potentially also firewall it off from the network, to make it harder for a bad guy to activate the backdoor.
Monitoring. In some cases, it may be possible to perform external monitoring to detect illicit activity. For instance, in a slot-machine joint, you could monitor the amount of money taken in, the amount paid out, and statistically, those two should bear a strong relationship; if you see pay-outs that exceed the expected amount by over five standard deviations, that might be a good reason to get concerned. As another example, at a bank, you may be able to use double-entry book-keeping and track some aggregate metrics, such as the rate of consumer complaints and how often consumers dispute charges. These kinds of monitoring techniques are highly specific to your particular business, but can potentially be effective at detecting shenanigans.

Keep in mind that none of these are likely to provide a really good defense against deliberate backdoors, and they may or may not be applicable in any particular situation, but if you're lucky, they might be better than nothing.
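
As a rough illustration of the monitoring idea, here is a minimal sketch of the "five standard deviations" check. The payout ratios are made-up numbers and the function name is my own; a real deployment would need a baseline drawn from a period you trust and a metric that fits your business.

```python
from statistics import mean, stdev

def flag_anomalies(history, recent, threshold=5.0):
    """Return (index, value, z) for each recent value that lies more than
    `threshold` sample standard deviations above the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        raise ValueError("history has no variance; cannot compute z-scores")
    flagged = []
    for i, value in enumerate(recent):
        z = (value - mu) / sigma
        if z > threshold:
            flagged.append((i, value, z))
    return flagged

# Hypothetical daily payout ratios (money paid out / money taken in).
history = [0.91, 0.93, 0.92, 0.90, 0.94, 0.92, 0.91, 0.93, 0.92, 0.92]
recent = [0.93, 0.91, 1.40]

for day, ratio, z in flag_anomalies(history, recent):
    print(f"day {day}: payout ratio {ratio:.2f} is {z:.1f} standard deviations above normal")
```

This only catches a backdoor that is used clumsily enough to move the aggregate; someone skimming slowly would stay under the threshold.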

I just sat down today to review a 1M+ line codebase for a similar suspicion. (A developer who left over 10 years ago was found to have brought some of our code to a different company, so we wondered if they had left anything nasty behind. A very low risk, but maybe worth checking.) After about an hour of reviewing the code, I realized what a futile prospect it was to achieve 100% confidence that it was clean. And this is a codebase that I'm extremely familiar with, and was involved in rewriting and adding to over many years. I can't imagine trying to audit something completely new to me. — Jordan Rieger, Jul 25 '17 at 21:50

score 6 · Accepted Answer · answered May 10 '11 at 13:46

I would start with a structural overview. From a design perspective, are the separate parts of the code well defined? E.g. do you have validation code, input and output functions, etc. which are used for those purposes throughout the codebase, or is every function individual? Do you have code which is functionally safe? (Often, purely stylistic constructions do not affect the security of the data flow.)

If you have a security wrapper which carries out authentication for every function, you can possibly shortcut review of those functions and just check for usage of the wrapper function, for example.
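
A minimal sketch of that shortcut, assuming the handlers are servlet-style `doGet`/`doPost` methods and the wrapper is called as `AuthWrapper.requireLogin(...)`. Both names are hypothetical, so substitute whatever your codebase actually uses. The brace matching is naive (it ignores braces inside strings and comments), which makes this a triage aid rather than a parser.

```python
import re

# Hypothetical names: adjust to the wrapper and handler methods in your codebase.
WRAPPER_CALL = "AuthWrapper.requireLogin("
HANDLER_RE = re.compile(
    r"\b(?:public|protected)\s+void\s+(do(?:Get|Post|Put|Delete))\s*\([^)]*\)[^{]*\{"
)

def method_body(source, open_brace_index):
    """Return the text between the brace at open_brace_index and its match.
    Naive: does not account for braces inside strings or comments."""
    depth = 0
    for i in range(open_brace_index, len(source)):
        ch = source[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return source[open_brace_index + 1:i]
    return source[open_brace_index + 1:]  # unbalanced braces: return the rest

def handlers_missing_wrapper(source):
    """List handler method names whose body never calls the auth wrapper."""
    missing = []
    for match in HANDLER_RE.finditer(source):
        body = method_body(source, match.end() - 1)
        if WRAPPER_CALL not in body:
            missing.append(match.group(1))
    return missing

sample = """
public class AccountServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        AuthWrapper.requireLogin(req);
        resp.getWriter().write("ok");
    }
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        if (req.getParameter("x") != null) { resp.getWriter().write("no auth check here"); }
    }
}
"""
print(handlers_missing_wrapper(sample))
```

On the sample, only `doPost` is reported. Anything this flags still needs a human to look at it, and the wrapper itself deserves a careful line-by-line review, since everything else now depends on it.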

If it is a very large codebase, then you will want to run a tool such as Fortify (or others that @AviD will be able to name :-) to make a first pass at the problem, but all tools suffer from a lack of contextual intelligence. They identify issues based on typical signatures, so you will get false positives (and possibly false negatives, which is why having a good overview can help you identify risks a tool won't spot).

The idea is that the tool narrows down the possible risk areas and identifies the vast majority of issues, as tools are relatively cheap, then a human validates and adds to the tool's output, placing it into the context of the application environment.

At the risk of sounding overly commercial, I would advise using the services of an experienced security consultant: someone who not only knows the code review tool inside out and is fluent in Java and Oracle, but is also experienced in business and security risk-based architecture.

Good approach. Start by identifying the entry points, see how they handle data, what authentication is done, how that is performed. Then look at where the data goes next and how that's treated. Rinse, repeat. — , May 10 '11 at 16:34
:). yup, you pretty much covered almost all... See my addition below. — AviD, May 10 '11 at 23:59
Baloney. Fortify and other static analysis tools are unlikely to detect a deliberately hidden backdoor. No consultant is going to be able to review a one-million-LOC codebase and tell you it is free of backdoors, or even have a good probability of detecting a backdoor if one is present. It doesn't matter how experienced the consultant is. — D.W., May 11 '11 at 00:33
Interesting how this aligns with the [OWASP Application Security Verification Standard](https://www.owasp.org/index.php/Category:OWASP_Application_Security_Verification_Standard_Project) — Andrew Russell, May 11 '11 at 06:27

score 5 · Answer 3 · edited Mar 17 '17 at 13:14


@Rory pretty much covered how to go about doing the review...

I'll just add that you should know what you're looking for, and not just "backdoor" in general (similar to what @VP01 said in his comment on top).

E.g. are you looking for backdoors that do:

authentication bypass (via a special identity, or a super-password);
magic parameters (à la "?admin=1");
penny-stealing (like in Superman 3);
information-stealing (e.g. emailing credit card numbers from the backend);
... something else.

If you know what types of backdoors you're looking for, you don't have to concentrate equally on each of the millions of lines of code, and you can prioritize.

I'll also add that there are some automated tools that can be scripted very richly, so that they can look for the specific types of backdoors that you define, based on human intelligence and context, and then apply that search throughout the millions of LOC...
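
To make the "define the type, then script the search" idea concrete, here is a deliberately crude sketch using plain regular expressions over Java source. It is not how a commercial analysis tool works internally, and the rule names and patterns are my own assumptions for two of the backdoor types listed above. It will miss anything even lightly obfuscated.

```python
import re

# Illustrative rules only; the parameter names and identifiers are guesses.
RULES = {
    "magic-parameter": re.compile(
        r'getParameter\(\s*"(?:admin|debug|test|backdoor)"\s*\)', re.IGNORECASE
    ),
    "hardcoded-credential": re.compile(
        r'(?:\w*(?:password|passwd|pwd)\w*\.equals\(\s*"[^"]*"\s*\)'
        r'|"[^"]*"\.equals\(\s*\w*(?:password|passwd|pwd)\w*\s*\))',
        re.IGNORECASE,
    ),
}

def scan(source):
    """Return (line number, rule name, stripped line) for every rule hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, rule, line.strip()))
    return findings

sample = '''String user = request.getParameter("user");
if ("1".equals(request.getParameter("admin"))) { grantAll(); }
if (password.equals("s3cr3t!")) { return true; }
return checkHash(user, password);
'''

for lineno, rule, text in scan(sample):
    print(f"line {lineno}: {rule}: {text}")
```

The value is in the prioritization: each rule encodes one concrete hypothesis about what the backdoor looks like, and the hits tell a human where to read first.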


AviD · answered May 11 '11 at 00:12

I don't believe that there are any automated tools that are likely to detect a deliberately hidden backdoor. – D.W. May 11 '11 at 00:35
@D.W. if you can define what type of backdoor you're looking for (as I listed some examples), you can definitely script up something appropriate. For example, I've used Checkmarx to build scripts (though they call them "queries") to discover *any* type of special authentication bypass. I've also done the same for backdoors that steal credit cards. – AviD May 11 '11 at 08:38
I think the issue is you're thinking of the traditional models of automated tools, à la Fortify, Splint, et al. There are some newer tools (such as the above Checkmarx, also others) that work very differently. – AviD May 11 '11 at 08:40
@AviD, I am quite skeptical that it is possible to code up a general query that will detect deliberately hidden backdoors (in general). – D.W. May 11 '11 at 09:11
@D.W. ah, see - I agree with you there, *if* you use the term "general". However, as I said, if you limit what you're looking for, and define *a specific type of backdoor*, it is possible. Complicated, and dependent on the skill of the one doing it, but possible nonetheless. – AviD May 11 '11 at 09:17

Think about it this way: a credit card comes into the system, and can be identified in code. It is then possible to correlate *any* use of that piece of data, no matter how obfuscated, and track anywhere the data goes or is used. Yes, you would need to specify *what* is interesting to you - e.g. you specified email and temp file, but you missed raw sockets - but it is possible to track and highlight all "suspicious" usages (for a given value of "suspicious"). – AviD May 11 '11 at 09:18
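
The tracking described in that last comment is essentially reachability over a data-flow graph. Below is a toy sketch of the idea, assuming the graph has already been extracted from the code; that extraction is the hard part and is what the tools are for. The node names and the list of "interesting" sinks are hypothetical.

```python
from collections import deque

def tainted_sinks(flows, sources, sinks):
    """Breadth-first search from the tainted sources.
    flows maps a value or call to the set of places its data is passed to.
    Returns {sink: path from a source to that sink} for every reachable sink."""
    paths = {}
    queue = deque((s, [s]) for s in sources)
    seen = set(sources)
    while queue:
        node, path = queue.popleft()
        if node in sinks:
            paths[node] = path
        for nxt in flows.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return paths

# Hypothetical data-flow edges extracted from the codebase.
flows = {
    "request.cardNumber": {"card"},
    "card": {"PaymentGateway.charge", "tmp"},
    "tmp": {"encoded"},
    "encoded": {"Socket.write"},
    "orderId": {"Logger.info"},
}
sources = {"request.cardNumber"}
sinks = {"PaymentGateway.charge", "Socket.write", "Mail.send", "Logger.info"}
expected = {"PaymentGateway.charge"}  # places card data is supposed to go

reached = tainted_sinks(flows, sources, sinks)
for sink in sorted(set(reached) - expected):
    print("suspicious flow:", " -> ".join(reached[sink]))
```

As the thread points out, the result is only as good as the sink list: a sink you did not think to name (raw sockets, in the example above) is never reported.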

VP. · Answer 4 · 2011-05-10T22:13:22.260


I think Rory's answer gets to the core of it, but timing is really important here. Doing the code review once the code is already huge, badly documented, badly tested, and in production (I don't know if that is the case here) is almost "too late" to do it right. Even with the best tools, external Java/Oracle analysts will find it harder to understand business logic flaws (intentionally planted there). In my opinion, analysing the code from the very beginning of development is the way to go.

answered May 10 '11 at 20:58 · edited May 10 '11 at 22:13


almost too late? It is too late, not just almost. It's not just that it is "harder" to understand intentionally planted logic flaws -- it is that you are up against a pretty-darn-close-to-impossible task. Code analysis is not likely to give you a very high chance of detecting an intentionally planted backdoor in a one-million-line legacy codebase, unless you get really lucky or the bad guy is unsophisticated. – D.W. May 11 '11 at 00:37

score 0 · Answer 5 · edited Mar 17 '17 at 13:14


There are some methods that are common to every application review, regardless of the language used. Here are some tips:

to watch for changes in a large codebase, you can use diffing tools (like I mentioned here: Code Review Strategies);
set up an environment where you can log and inspect any suspicious network requests;
a very simple method is to use tools like "grep", which will help find some basic and obvious malicious code;
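
For the diffing tip, here is a small sketch using Python's standard `difflib`. It assumes you can get hold of the unmodified vendor version of each file to compare against; the file name and the two code snippets are invented for illustration.

```python
import difflib

def changed_lines(baseline, customised, name="File.java"):
    """Return a unified diff between the vendor baseline and the customised file."""
    return list(difflib.unified_diff(
        baseline.splitlines(),
        customised.splitlines(),
        fromfile=f"vendor/{name}",
        tofile=f"custom/{name}",
        lineterm="",
    ))

# Invented example: the customised version adds an early return.
baseline = """boolean login(String user, String pw) {
    return store.verify(user, pw);
}"""

customised = """boolean login(String user, String pw) {
    if (user.equals("maint")) return true;
    return store.verify(user, pw);
}"""

diff = changed_lines(baseline, customised)
print("\n".join(diff))

# Lines that exist only in the customised version are the ones to read first.
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
```

This narrows the reading list to what was changed. It says nothing about whether the vendor baseline itself is clean.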

answered May 10 '11 at 14:48

Yeah, but monitoring network requests and grepping code is only going to catch some very basic kinds of backdoors. Maybe this gives you a 1% chance of detecting a backdoor, maybe it's a 5% chance, but the chance is very low. Code diffs don't help if someone plops a one-million-LOC legacy codebase in your lap and tells you "I'm worried it might already have a backdoor". – D.W. May 11 '11 at 00:35
Firstly, should I also say "it's better than nothing"? Clearly, to increase the chances of finding a backdoor, one would need to do a complete code review. Secondly, in my answer I didn't repeat what was already mentioned here. Thirdly, it allows automation, which helps at some level. I would not say that the chances are always as low as you suggest; it depends. Lastly, in your own answer you mostly mentioned mitigations, except for the last item. That's also a valid point, but my reply focuses on detection, not mitigation or prevention. – May 11 '11 at 09:03
