Interpolique: transparently preventing SQL Injection and XSS with base64 encoding, what happened?

Question

For the keynote at The Next HOPE a couple of years ago, Dan Kaminsky unveiled Interpolique (the talk is really fun btw). The problem he raised was how to defend against injection attacks, including SQL injection, cross-site scripting (XSS), and other injection vulnerabilities. For example, Unicode makes escaping characters useless and prepared statements are a PITA.

His fix was to convert strings to base64 during transit. In SQL, for example, one can simply wrap the value in the SQL call with a decode64 eval(). It's much easier than prepared statements, has little (if any) impact on DB performance, and is transparent to users of the DB; a native language implementation could make usage transparent for the programmer and nearly as fast in terms of server performance. Similar techniques can be applied to defend against XSS. But, outside of a couple of blog articles written at the time, I can't find mention of it anywhere.

What happened?

Wow. Can't view the video at work, but looking at the code I'm astonished that Kaminsky ever thought this was a workable idea. Sabotages accessibility whilst adding more complexity than just doing it properly. The real solution to injection problems remains as ever: stop using tools that encourage raw string concatenation. Use a library that makes parameterised queries easy; use a templating language that HTML-escapes by default, and so on. — bobince, Mar 08 '12 at 09:11
His thesis, if you watch the video, is that security needs to be compatible with the programmer's thinking, it needs to run linear to how they think about their code. Also, he was shooting for cross programming language compatibility, java, perl, php, etc -not just SQL and JS. — Indolering, Mar 08 '12 at 18:20

score 10 · Accepted Answer · edited Mar 17 '17 at 13:14

Summary. I'd speculate that Interpolique was not successful for several reasons. One important reason is that Interpolique was, perhaps, not solving exactly the right problem, and there are a number of other similar approaches in the space with similar or better properties. Another possible reason is that it may not have been marketed in the optimal way. Also, I'd speculate that market demand and difficulties of monetization may have played a significant role.

Background. Interpolique is a generic idea for defending against injection attacks. Injection attacks are a broad class of vulnerabilities; the most familiar examples are SQL injection and XSS.

Kaminsky's slides do spend most of their time talking about SQL injection, but I think it would be a mistake to assume that Interpolique is solely or primarily about SQL injection. Kaminsky does start out by explaining Interpolique in the context of SQL injection defense. However, after introducing the basic idea (in the context of SQL injection), he then explains how it applies to XSS and other injection attacks. I'd speculate that perhaps he framed the discussion in this way as a pedagogical device: because it is easier to explain the ideas in a simple setting, such as SQL injection, before discussing how they generalize.

It is also useful to know that Interpolique is one step in a line of work on defending against injection attacks. It is preceded by BEEP (which was badly broken) and then Blueprint (which shares many similarities with Interpolique, and has many similar strengths and weaknesses).

Not solving exactly the right problem. At the most general level, Interpolique is trying to solve the secure string interpolation problem. This is the right problem to solve. Security issues with string interpolation are responsible for a broad variety of injection vulnerabilities. However, Interpolique's novelty is in the part that is, arguably, not the most critical part.

To address the secure string interpolation problem, there are a bunch of issues that need to be addressed. How do we escape or encode the untrusted data, so it cannot break out of the context it is supposed to be confined to? How do we tell what context the untrusted data is being inserted into? How do we know what is the right escaping/encoding function to use, for that context? How do we integrate the solution into web application frameworks? How do we educate developers about how to use it?

Interpolique's most novel aspect is the method it uses to encode untrusted data so it cannot break out of its context. Interpolique's approach involves base64 encoding, which is indeed an elegant and robust solution to the problem of encoding data so it cannot escape from its context. However, that problem is not the most critical one to solve. The standard way to solve that problem is through escaping. And escaping seems to work well enough, in practice, as long as you use the right escaping function for the context where the untrusted data is being inserted: e.g., encodeForHTML for text going into HTML, encodeForURL for data going into a URL context, etc. The escaping functions are standardized and seem to be pretty reliable.
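To make the base64 idea concrete, here is a minimal sketch of it in Python with SQLite. This is not Kaminsky's implementation: the helper `b64_literal` and the SQL function name `b64d` are hypothetical names chosen for illustration, and the decoder is registered by hand so the sketch does not depend on any database built-in.

```python
import base64
import sqlite3

def b64_literal(value: str) -> str:
    # Hypothetical helper: wrap untrusted text as b64d('<base64>').
    # The base64 alphabet (A-Z, a-z, 0-9, +, /, =) contains no quote or
    # backslash, so the encoded text cannot close the SQL string literal.
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"b64d('{encoded}')"

conn = sqlite3.connect(":memory:")
# Register our own decoder under the assumed name b64d.
conn.create_function("b64d", 1, lambda s: base64.b64decode(s).decode("utf-8"))
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "x' OR '1'='1"
query = f"SELECT name FROM users WHERE name = {b64_literal(hostile)}"
rows = conn.execute(query).fetchall()  # the hostile text is compared as data
```

The query is still built by string interpolation, which is the point of the approach: the untrusted value is decoded inside the database, after parsing, so it is only ever treated as data.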

So it is not clear that Interpolique has a significant advantage over competing approaches. Its novelty is, arguably, focused on one of the less important aspects of the problem. It's not clear that Interpolique's special sauce (the base64 encoding part) offers that much advantage to users over other possible approaches.

The most important challenge to tackle, I'd argue, seems to be framework integration and outreach: integrating this stuff into the web application frameworks used by developers, and teaching developers how to use it. The technology for context-sensitive auto-escaping is now pretty well understood, and the hardest challenge is to get it into widely used frameworks.

Unfortunately, that's not one where I saw a lot of effort from the Interpolique project. This takes a lot of elbow grease: communicating with the folks who maintain each of the frameworks, articulating the benefits, writing code/patches to integrate the ideas into the frameworks, working with them to devise solutions that are acceptable to them, etc. I don't know whether the Interpolique project really pushed on that very hard.

Marketing. I don't know whether Interpolique was marketed in a way that ordinary web application developers would understand its benefits. Its benefits are apparent to security researchers, maybe, but the folks who build web applications, or the folks who maintain web frameworks? Not so clear.

Kaminsky's slides focus a lot on SQL injection. While this might make sense from a pedagogical perspective, in retrospect I wonder if it might have been a tactical error from a communication perspective. There are already "good enough" solutions to SQL injection. The focus on SQL injection might have triggered a knee-jerk reaction from some developers who were thinking "why do I need this stuff when I feel like I've already got SQL injection reasonably well under control?". If the potential adopters don't perceive the benefits, they're probably not going to be very receptive to the technology.

Limited market demand. In addition, for most developers, security is a secondary consideration. It is not so easy to sell a point scheme that addresses a small part of security, when security isn't their primary goal. Perhaps if you could sell a comprehensive suite that solves all of their potential security problems, developers might become a lot more interested -- but that's not what Interpolique was.

Contributing to this perception that security is not a big deal is that many application developers don't think they have a problem and don't perceive a need for a solution (many of them are probably fooling themselves, but so it goes). And of the developers who are aware of XSS, many think they can avoid it by just being careful (that's a dubious proposition, but hey, if they don't perceive the need for a scheme, they ain't gonna buy it).

In any case, it's not clear how you'd make money off something like Interpolique. It doesn't fall into any of the usual categories. It's not a tool. It's not a service. It's not a comprehensive solution to security. Rather, it is an idea for how to improve various web frameworks. To get Interpolique into customers' hands, you'd have to integrate it into those frameworks. However, those frameworks are usually free. So I don't see how you'd make money off of it. How do you monetize a library or an idea? Where's the business model? Perhaps the Interpolique folks had a way to make money, but I wonder if part of the lack of success of Interpolique is because there was no clear way to make money off the idea.

Interpolique's contribution: XSS. Let's come back to what Interpolique is good for. Personally, I believe that its primary contribution is helping to defend against XSS (possibly other injection attacks as well, but most prominently XSS). It is a pretty clever technique for that -- though it still requires developers to understand how to use it properly, and to remember to use Interpolique everywhere that they interpolate untrusted data into HTML. Compare to context-sensitive auto-escaping, which happens automatically, doesn't require developer attention, and isn't susceptible to errors of omission where developers forgot to call the escaping function.

Kaminsky's talk does discuss how to use Interpolique to prevent SQL injection, but I suspect that is primarily a pedagogical device. Interpolique is a general defense against injection attacks, of which SQL injection and XSS are two examples. SQL injection is simpler to understand than XSS, so it was probably easier to explain the ideas behind Interpolique in the context of SQL injection than XSS.

Interpolique and SQL injection. As others have said, prepared statements are a "good-enough" solution to SQL injection for practical purposes (there are indeed some cases that are not handled by prepared statements, but they can probably be handled by escaping/validation plus careful code audits), so that's not where Interpolique is most useful.

I realize that prepared statements do have some limitations. One limitation, which Dan Kaminsky explains well, is that the syntax is less than ideal if you are interpolating multiple dynamic values into a long template: it is hard to keep track of which parameter goes where. However, in practice, this doesn't seem to be a showstopper: developers seem to cope. Another limitation of prepared statements is that there are rare cases where they cannot be applied, such as dynamic ORDER BY or LIMIT clauses. However, neither of those limitations seems to be a serious problem with prepared statements in practice, so I just don't see Interpolique as being that much better than prepared statements. And prepared statements have the benefit of being simple, already supported in almost every framework, and already evangelized well to developers.
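As a rough illustration of how developers cope with both limitations, here is a small sketch using Python's sqlite3 module. Named placeholders keep track of which value goes where, and the dynamic ORDER BY column, which cannot be bound as a parameter, is checked against an allowlist. The table layout, the function `list_users`, and the allowlist are assumptions made up for this example.

```python
import sqlite3

# Assumed allowlist: the only column names that may reach the SQL text.
ALLOWED_SORT_COLUMNS = {"name", "created"}

def list_users(conn, prefix, since, sort_by):
    # Identifiers cannot be bound as parameters, so validate them instead.
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    sql = (
        "SELECT name FROM users "
        "WHERE name LIKE :pattern AND created >= :since "
        f"ORDER BY {sort_by}"
    )
    # Named placeholders make it clear which value goes where.
    params = {"pattern": prefix + "%", "since": since}
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("bob", 1), ("alice", 2), ("al", 3)],
)

by_name = list_users(conn, "a", 0, "name")
by_created = list_users(conn, "a", 0, "created")
```

The values travel separately from the query text, and the only interpolated piece is one of two fixed strings.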

Why was Kaminsky criticizing escaping? Here's what's going on. If you want to escape untrusted data before interpolating it into some string, you need to know how the recipient of the string will be interpreting it, to determine what characters need to be escaped. If you think the recipient will interpret it as ASCII, but actually the recipient will be treating it as UTF-8, you've got a problem: you probably won't be escaping everything you need to.

This sort of thing has introduced vulnerabilities in PHP/MySQL web applications in the past (here's another example), when they tried to escape untrusted data before interpolating it into a SQL query but didn't realize that the database was using a different character encoding than expected. This is especially relevant to PHP/MySQL web applications, because there is no integration between the PHP app and the MySQL database: there's no easy way for the PHP code to tell what character encoding the MySQL database will be using on this connection, so there's no easy way to tell what escaping function needs to be used.

So manual escaping is potentially error-prone, since developers sometimes use the wrong escaping function, introducing a security hole.
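The mismatch can be reproduced in a few lines of Python. This is a sketch of the general failure mode, not of any particular PHP or MySQL function: `naive_escape` is a hypothetical byte-wise escaper that assumes a single-byte encoding, and GBK stands in for the multi-byte encoding the database actually uses.

```python
def naive_escape(raw: bytes) -> bytes:
    # Hypothetical escaper that works byte by byte, as if the text were
    # ASCII: put a backslash in front of every backslash and single quote.
    return raw.replace(b"\\", b"\\\\").replace(b"'", b"\\'")

# 0xBF is a valid lead byte in GBK; the attacker follows it with a quote.
attacker_input = b"\xbf' OR 1=1 -- "
escaped = naive_escape(attacker_input)

# What the application believes it sent: the quote has a backslash before it.
as_latin1 = escaped.decode("latin-1")

# What a GBK-speaking recipient sees: 0xBF 0x5C fuse into one two-byte
# character, which swallows the backslash and leaves the quote bare.
as_gbk = escaped.decode("gbk")

backslash_survives = "\\" in as_gbk
```

In the single-byte reading the quote is escaped; in the GBK reading the backslash has disappeared into a multi-byte character and the quote terminates the string literal.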

On the other hand, for XSS prevention, these issues can be addressed given sufficient integration into the web framework. The framework can tell what character encoding the HTML document is in (because it sends the Content-Type: header, whose charset parameter declares the encoding, and also because it can parse the HTML document itself), and given sufficient smarts, it can tell what HTML context the untrusted content is being interpolated into (inside an HTML tag? inside an attribute? a URL? etc.). Consequently, given a sufficiently intelligent web framework, the framework can automatically determine what escaping function needs to be used and automatically apply it. This is the premise behind context-sensitive auto-escaping, and that's how it addresses Kaminsky's criticisms.
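Here is a toy sketch of the "pick the escaper from the context" step, using only the Python standard library. A real auto-escaping template engine infers the context by parsing the template; in this sketch the contexts are labelled by hand, and the context names, the `ESCAPERS` table, and `render` are all assumptions invented for illustration.

```python
import html
from urllib.parse import quote

# Assumed context names mapped to standard-library escaping functions.
ESCAPERS = {
    "html_text": lambda s: html.escape(s, quote=False),
    "html_attr": lambda s: html.escape(s, quote=True),
    "url_param": lambda s: quote(s, safe=""),
}

def render(parts):
    # parts is a list of (context, text) pairs; a context of None marks
    # trusted template text, anything else is untrusted data.
    out = []
    for context, text in parts:
        out.append(text if context is None else ESCAPERS[context](text))
    return "".join(out)

user = '"><script>alert(1)</script>'
page = render([
    (None, '<a href="/search?q='),
    ("url_param", user),
    (None, '" title="'),
    ("html_attr", user),
    (None, '">'),
    ("html_text", user),
    (None, "</a>"),
])
```

The same untrusted string is encoded three different ways depending on where it lands, which is exactly the decision a context-sensitive framework makes on the developer's behalf.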

The future of this area. The security technology in this space that has seen the most take-up is context-aware auto-escaping, such as that implemented in Google's ctemplate. I think this is the most exciting and promising direction for the future. I don't see much of a business opportunity to make money -- but I do think it could make an appreciable difference towards making web applications more secure and helping developers avoid security woes.

Context-sensitive auto-escaping involves framework support for defending against XSS automatically, without any extra steps or awareness from the developer. That goes a step beyond what Interpolique provides. Consequently, context-sensitive auto-escaping is even better for developers than Interpolique, because in most situations it requires no additional effort from developers: escaping is done automatically for them. It is better for security, too: escaping is done by default, so you end up with a system that is secure by default.

You could of course combine Interpolique's ideas with context-sensitive auto-escaping (using Interpolique's base64 encoding stuff instead of standard escaping), though the benefits of doing so aren't entirely clear to me.

You can find Interpolique and similar technologies discussed in the following questions: Whitelisting DOM elements to defeat XSS, Escaping JavaScript constants.

Conclusion. I believe that Interpolique was a beautiful, elegant idea, and one that I suspect may have continuing influence on the future of web application development -- even if it is not adopted directly. At a minimum, it can be understood within the broader context of the move towards context-sensitive auto-escaping and other techniques to help developers avoid common vulnerabilities.

Awesome response, but there is one caveat that the posted slides skip over: Kaminsky claims that Unicode's millions of characters and best-fit mapping scheme make character escaping impossible. What then? I guess just wish the language would deal with this itself : P — Indolering, Mar 08 '12 at 09:39
@Indolering I think that only applies if part of your pipeline decays the text into a legacy encoding along the way. I'd expect any modern database to talk in unicode natively. — CodesInChaos, Mar 08 '12 at 09:47

score 3 · Answer 2 · answered Mar 08 '12 at 05:07

Oh great, someone discovered a way to prevent all vulnerabilities, now every application has to be rewritten from scratch... I'd be wary of such a claim.

Let's pick this one apart...

So really, what is base64? It is a way to represent data in a binary-safe way. You are representing the data in a limited character set in order to avoid control characters such as ', ", \ and 0x00. So in order to benefit from this, any time the string is used within a sink, like a SQL query, it has to be encoded. But it doesn't make sense to treat all strings as base64, otherwise your code would be full of nonsense and wasteful of resources: if($str==base64_encode('start')){...}. Okay, so why not use a sanitizing function like addslashes()? This is also a way to represent data in a binary-safe way. One problem: both of these functions are a horrible approach to SQL injection, because parametrized queries entirely solve the problem of SQL injection.

Okay, so what about XSS? Well, in order to display a message to the user, it can't be base64! So this method cannot prevent XSS. No single function can prevent all XSS, because XSS is an output problem and is highly dependent on where the user output is within the HTML.

That being said, base64 can be a useful tool for encoding user input to be used in a wide variety of sink functions, including but not limited to file operations, the command line and SQL queries. This is one tool of many that people use on a regular basis to patch flaws in their application. base64 is almost never used to prevent XSS; I have never seen base64 used this way in the wild.
