Still solving string injection: first impressions of Kaminsky’s Interpolique

This past week Dan Kaminsky announced Interpolique, a technology for dealing with string injection problems in web applications. The basic idea is pretty sharp: instead of writing (say) PHP code like

$conn->query('insert into posts values($_POST[author] , $_POST[content] );');

we write

$conn->query(eval(b('insert into posts values(^^_POST[author] , ^^_POST[content] );')));

The b function is provided by interpolique. It essentially translates the input string into some PHP code (which is then reified using eval) that base64 encodes the user-input and wraps that encoding up in a call to the MySQL function for base64 decoding.

The idea is that the resulting query is given to MySQL in a format where the user input is base64 encoded. As Dan points out, there aren’t any known injection techniques that can escape the MySQL base64 decoder, and the decoder won’t try to evaluate the resulting string as a SQL expression, so no injection is possible.

I have mixed feelings about this approach. On the one hand, it’s really just another form of escaping (instead of inserting a bunch of \‘s into the string, we’re base64 encoding it), and escaping is an error-prone thing. After all, there’s nothing preventing a tired developer from accidentally mixing some $ in with their ^^, nor could there be — if the developer writes $ instead of ^^, PHP will interpolate the string before passing it off to the b function, so no run-time check will be able to save the day.

(I’m not elated about the use of eval, but (a) I see no way around it if the plan is to use a syntactic approach, as is currently the case, and (b) the only vector I can see for attacking it requires a programmer to leave out the call to b, which is something they’d likely catch during development unless they also used $ instead of ^^, and that double-accident seems unlikely, barring a stupid refactoring snafu.)

On the other hand, if this technique is applied correctly, it seems likely to be robust (peer review should weigh in on this pretty quickly).

When I look at interpolique, I see the next generation of escaping: if you forget to do it you’re screwed, but if you do it correctly you’re safe. interpolique’s contribution is that its style of escaping is much simpler than trying to scan strings for dangerous characters, hence less likely to contain silly errors and edge cases, and that it is cross-language ready, in that base64 encoding isn’t target-language-specific (unlike escaping, which certainly is).

interpolique does not improve upon escaping biggest failure, though: if you’ve got 50,000 lines of PHP, the only way to know that interpolique (or escaping) is being used throughout is to look through the code. This is a PHP shortcoming, of course. We could certainly produce some static analysis tool to check for this design pattern, but then again, if writing tools that understand strings in PHP were easy we wouldn’t have the code injection mess in the first place.

Future direction

interpolique does provide a novel improvement for how to move data across the language barrier. This makes the core idea useful even in situations where programmers aren’t using PHP (or other half-brained-but-common-anyway languages).

In the long term, however, we still need to address the fact that we’re abusing the String type. User-input should be its own distinct type. The fact that this isn’t the case in .Net and Java completely explains why those type-safe languages don’t fare any better than PHP in terms of code injection.

Following the interpolique idea, the only function from UserString to String could be a base64 encoder. Languages could provide syntactic sugar to allow things like

$conn->query('insert into posts values($_POST[author] , $_POST[content] );');

to implicitly denote the interpolique style, thereby preserving type-safety (in this case, separation of user-input from SQL code) without compromising string interpolation style.

(Of course, both of these ideas are already possible in Haskell using algebraic data types and Template Haskell, but this is of little comfort to the vast majority of programmers since (a) most haven’t heard of Haskell and (b) Haskell is still in its web-development-language infancy.)

Moving forward I am interested in seeing whether interpolique passes peer review (probably will), becomes a common practice and reduces the incidence of code injection. Regardless of how these questions fare, the core idea is elegant, doesn’t seem to have a performance penalty, and can likely be carried forward fruitfully in future technologies.