How could malicious code changes in a GitHub pull request be masked by an attacker?

Question

When viewing a pull request on GitHub (or the equivalent on any other platform), the web interface displays a diff of the changes for you to review.

Reviewing the diff is obviously vulnerable to human error, as malicious changes can be snuck in (see mortenson/pr-sneaking).

Are there any obfuscation techniques that could not be reasonably spotted by a human performing thorough code review in the GitHub web interface?

One example is a homoglyph attack, which could make a series of characters appear to a human as one value while actually being another.
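To make the homoglyph idea concrete, here is a minimal Python sketch. The strings and the `describe` helper are made up for illustration; the only fact relied on is that U+0430 (Cyrillic small letter a) renders like the Latin "a" in many fonts.

```python
import unicodedata

latin = "admin"          # all Latin letters
spoofed = "\u0430dmin"   # first character is U+0430, a Cyrillic look-alike


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# The two values may look the same on screen but are different strings.
print(latin == spoofed)
for code_point, name in describe(spoofed):
    print(code_point, name)
```

Whether the two strings are visually distinguishable depends on the font the reviewer's browser uses.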

There could also be a vulnerability/bug in the diff engine or output display that could be exploited to hide or mask malicious code in a pull request.

To clarify, I am not asking about a human's ability to accurately review code changes. I am asking about potential spoofing/masking vulnerabilities that an attacker could exploit to deceive a human into accepting a seemingly legitimate pull request.

score 2 · Accepted Answer · answered Oct 05 '18 at 18:33

Bugs in diff generation are unlikely. Nearly every web interface for a VCS calls out to the VCS tool itself to generate the diffs, so any bug there would probably be found and fixed quickly: it generates user-facing data, which makes bugs highly visible.

Given that, your options amount to:

Homoglyph attacks, which you've already mentioned. I don't see these as very likely: they depend on the font used for rendering, and in most programming languages they would have to sit inside data literals to avoid causing syntax errors, which severely limits the kinds of bug you can introduce.
Modification of binary files in the repo. For binary data, most version control systems only tell you that the file changed, not how it changed, so there is nothing to review in the PR short of downloading and using the modified file. This is one of the two reasons that sane people don't keep binary data in version control, the other being that it is handled very inefficiently. Because not many people do this, it's not a likely attack vector.
Simple steganography. This essentially relies on the reviewer not paying attention or not understanding the code they're reviewing. The two obvious approaches are burying a small bug-introducing change between sections with lots of changes, or taking advantage of the fact that changes are shown per line.
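As a rough illustration of how a reviewer could get mechanical help with the first two options, here is a minimal Python sketch that scans unified diff text. The function name `scan_diff` and the sample diff are hypothetical. It assumes git's usual "Binary files ... differ" notice and treats any non-ASCII character in an added line as worth a second look. That is a blunt heuristic, not a real confusables check.

```python
import re

# git prints this notice instead of a textual diff for binary files.
BINARY_RE = re.compile(r"^Binary files (.+) and (.+) differ$")


def scan_diff(diff_text):
    """Return (line number, reason) findings for a unified diff."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        if BINARY_RE.match(line):
            findings.append((lineno, "binary file changed; contents not shown"))
        # Added lines start with "+"; "+++" marks the file header.
        # (An added line whose own text begins with "++" is skipped too.)
        elif line.startswith("+") and not line.startswith("+++"):
            odd = sorted({f"U+{ord(ch):04X}" for ch in line[1:] if ord(ch) > 127})
            if odd:
                findings.append((lineno, "non-ASCII in added line: " + ", ".join(odd)))
    return findings


sample = (
    "diff --git a/x.py b/x.py\n"
    "--- a/x.py\n"
    "+++ b/x.py\n"
    "@@ -1 +1 @@\n"
    "-user = 'admin'\n"
    "+user = '\u0430dmin'\n"
    "Binary files a/logo.png and b/logo.png differ\n"
)
for finding in scan_diff(sample):
    print(finding)
```

A check like this does nothing for the third option, since a deliberately buggy change written in plain ASCII looks like any other edit.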

I don't think steganography is the right word for what you're describing. — forest, Oct 06 '18 at 03:42
@forest What term do you think would be more appropriate then? This may not be steganography as most people think of it, but it's just the hiding of information; it has no requirement that the information be well hidden or that it be specifically embedded in a way that isn't immediately visible to someone who knows what to look for. — Austin Hemmelgarn, Oct 08 '18 at 18:00
