Scan interpreted code for long lines to detect code injection?

Question

As https://security.stackexchange.com/a/11935/35886 states, the best protection against code injection is to prevent it, but you often see posts on SO or here that go like:

I found "long line of php/perl, etc code" and want to know what it does.

Then I realized that many of those code injections tend to be very long lines of code (besides being encoded and possibly encrypted), so that they are not too obvious at first glance.

Now I am wondering: could automated scans of a code base for long lines provide a cheap mechanism to detect injected code blocks in interpreted languages?

A quick search did not reveal any data on the ratio of hits to false positives when scanning for code lines longer than n chars.

I am aware that obfuscated code, and also badly written code, would be flagged by such a system, but is there any known use of such a simple technique in an IDS?

For a general hosting provider it might not be practical (or even legal?) to scan all client files for overly long lines, as it would require manpower or further risk analysis for every hit, plus a notification system for the end user. But hosters for blogs etc., which can get code injected through malicious themes or similar, could benefit from it. Or am I wrong here?
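The scan proposed in the question can be sketched in a few lines of Python. This is only an illustration, not a known IDS feature: the function name `find_long_lines`, the 500-character threshold, and the list of file extensions are all assumptions chosen for the example.

```python
import os

def find_long_lines(root, max_len=500, extensions=(".php", ".pl", ".py", ".js")):
    """Yield (path, line_number, length) for every line longer than max_len.

    max_len and extensions are arbitrary example values, not tuned thresholds.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue  # only look at files that are likely interpreted code
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    length = len(line.rstrip("\r\n"))
                    if length > max_len:
                        yield path, number, length
```

Every hit still needs a human look, since minified JavaScript or embedded data blobs would be reported as well.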

not a bad idea. You could also check for the density of the code. Obfuscated code seems to have way fewer spaces. You could also go more in-depth and check if the characters in the code are distributed in a natural way (compared to for example the rest of the codebase; but this seems like a lot more work than checking for line-length and amount of spaces). Of course, if these kinds of checks are used by a lot of people, the attackers are going to react and change their code. And to your question: I would guess that hosters that care already perform these checks. — tim, Jul 29 '14 at 09:47
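The density check suggested in this comment could look roughly like the following sketch. The helper names `whitespace_ratio` and `looks_dense`, and the 5% and 200-character cut-offs, are hypothetical values picked for illustration; real thresholds would have to be calibrated against the rest of the code base.

```python
def whitespace_ratio(text):
    """Return the fraction of characters in text that are whitespace."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ch.isspace()) / len(text)

def looks_dense(text, min_ratio=0.05, min_length=200):
    """Flag text that is long enough to judge and has very little whitespace.

    min_ratio and min_length are assumed example thresholds.
    """
    return len(text) >= min_length and whitespace_ratio(text) < min_ratio
```

For example, `looks_dense("A" * 300)` is true, while a block of normally formatted statements of the same size is not flagged.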

score 1 · Accepted Answer · answered Jul 29 '14 at 10:54

A lot of anti-malware 'heuristic' engines do this kind of thing. They check the entropy of chunks of code, or even of the entire PE. Lots of malware is obfuscated so that it survives longer in the wild. The problem with this kind of detection is that it works poorly against smarter attackers.
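As a rough illustration of the entropy check described above, here is a minimal sketch that computes the Shannon entropy (in bits per byte) of fixed-size chunks. It is not how any particular anti-malware engine works; the names `shannon_entropy` and `chunk_entropies` and the 256-byte chunk size are assumptions made for the example.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def chunk_entropies(data: bytes, chunk_size: int = 256):
    """Return a list of (offset, entropy) pairs, one per chunk.

    chunk_size is an arbitrary example value.
    """
    return [
        (offset, shannon_entropy(data[offset:offset + chunk_size]))
        for offset in range(0, len(data), chunk_size)
    ]
```

Chunks of encrypted or compressed data tend towards 8 bits per byte and base64 text cannot exceed 6, whereas ordinary source code usually scores lower, so a scanner would flag chunks above some calibrated threshold.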

E.g. when I write exploits, I randomize everything, keep things small, and you will NEVER see me write persistent code to other files. Instead I'll make a reliable exploit and trigger it when I need access to the box.

Thus: yes, this works for malware etc.; no, it won't catch human attackers. However, if you can do this on running memory, that will certainly help.
