Still solving string injection: first impressions of Kaminsky’s Interpolique

This past week Dan Kaminsky announced Interpolique, a technology for dealing with string injection problems in web applications. The basic idea is pretty sharp: instead of writing (say) PHP code like

$conn->query("insert into posts values($_POST[author] , $_POST[content] );");


we write

$conn->query(eval(b('insert into posts values(^^_POST[author] , ^^_POST[content] );')));


The b function is provided by interpolique. It essentially translates the input string into some PHP code (which is then reified using eval) that base64 encodes the user input and wraps that encoding in a call to a MySQL function for base64 decoding.

The idea is that the resulting query is given to MySQL in a format where the user input is base64 encoded. As Dan points out, there aren’t any known injection techniques that can escape the MySQL base64 decoder, and the decoder won’t try to evaluate the resulting string as a SQL expression, so no injection is possible.
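
To make the shape of the resulting query concrete, here is a minimal sketch in Java rather than PHP. It is not interpolique's code: the names Base64Query, encodeForSql and insertPost are made up for illustration, and I am assuming MySQL's built-in FROM_BASE64 as the decoding function, which may not be the function interpolique itself calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Query {
    // Encode a user-supplied value so that only base64 characters
    // (A-Z, a-z, 0-9, +, /, =) ever appear in the SQL text.
    // FROM_BASE64 is an assumption: it is MySQL's built-in decoder,
    // not necessarily what interpolique uses.
    static String encodeForSql(String userInput) {
        String encoded = Base64.getEncoder()
                .encodeToString(userInput.getBytes(StandardCharsets.UTF_8));
        return "FROM_BASE64('" + encoded + "')";
    }

    // Build the example query from the post with both values encoded.
    static String insertPost(String author, String content) {
        return "insert into posts values("
                + encodeForSql(author) + " , " + encodeForSql(content) + " );";
    }

    public static void main(String[] args) {
        // The quote and semicolon in the second argument never reach the SQL text.
        System.out.println(insertPost("bob", "x'); drop table posts; --"));
    }
}
```

For the author "bob" the encoded fragment is FROM_BASE64('Ym9i'). Whatever the user types, the only quotes in the query are the two that the code itself wraps around the encoding.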

I have mixed feelings about this approach. On the one hand, it’s really just another form of escaping (instead of inserting a bunch of backslashes into the string, we’re base64 encoding it), and escaping is an error-prone thing. After all, there’s nothing preventing a tired developer from accidentally mixing some $ in with their ^^, nor could there be: if the developer writes $ instead of ^^ inside a double-quoted string, PHP will interpolate the string before passing it off to the b function, so no run-time check will be able to save the day.

(I’m not elated about the use of eval, but (a) I see no way around it if the plan is to use a syntactic approach, as is currently the case, and (b) the only vector I can see for attacking it requires a programmer to leave out the call to b, which is something they’d likely catch during development unless they also used $ instead of ^^, and that double-accident seems unlikely, barring a stupid refactoring snafu.)

On the other hand, if this technique is applied correctly, it seems likely to be robust (peer review should weigh in on this pretty quickly).

When I look at interpolique, I see the next generation of escaping: if you forget to do it you’re screwed, but if you do it correctly you’re safe. interpolique’s contribution is that its style of escaping is much simpler than trying to scan strings for dangerous characters, hence less likely to contain silly errors and edge cases, and that it is cross-language ready, in that base64 encoding isn’t target-language-specific (unlike escaping, which certainly is).

interpolique does not improve upon escaping’s biggest failure, though: if you’ve got 50,000 lines of PHP, the only way to know that interpolique (or escaping) is being used throughout is to look through the code. This is a PHP shortcoming, of course. We could certainly produce some static analysis tool to check for this design pattern, but then again, if writing tools that understand strings in PHP were easy we wouldn’t have the code injection mess in the first place.

Future direction

interpolique does provide a novel improvement for how to move data across the language barrier. This makes the core idea useful even in situations where programmers aren’t using PHP (or other half-brained-but-common-anyway languages).

In the long term, however, we still need to address the fact that we’re abusing the String type. User input should be its own distinct type. The fact that this isn’t the case in .NET and Java goes a long way toward explaining why those type-safe languages don’t fare any better than PHP in terms of code injection.

Following the interpolique idea, the only function from UserString to String could be a base64 encoder. Languages could provide syntactic sugar to allow things like

$conn->query('insert into posts values($_POST[author] , $_POST[content] );');


to implicitly denote the interpolique style, thereby preserving type-safety (in this case, separation of user-input from SQL code) without compromising string interpolation style.
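
Here is a rough sketch of what the UserString idea might look like in Java today, without any new syntax. The names Sql, fromRequest, code and value are hypothetical, and FROM_BASE64 is again assumed as the decoder. The point is only that the type offers no way to get the raw text back out, so the base64 route is the only one into a query.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UserStringDemo {
    /** Hypothetical type for untrusted input; deliberately not a String. */
    public static final class UserString {
        private final String raw;

        private UserString(String raw) { this.raw = raw; }

        /** Meant to be called only where input enters the program. */
        public static UserString fromRequest(String raw) {
            return new UserString(raw);
        }

        /** The only conversion to String: a base64 encoding. */
        public String toBase64() {
            return Base64.getEncoder()
                    .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Hypothetical query builder that keeps SQL code and user values apart. */
    public static final class Sql {
        private final StringBuilder text = new StringBuilder();

        /** Programmer-written SQL text. */
        public Sql code(String trusted) {
            text.append(trusted);
            return this;
        }

        /** User input, wrapped in the (assumed) MySQL decoder call. */
        public Sql value(UserString input) {
            text.append("FROM_BASE64('").append(input.toBase64()).append("')");
            return this;
        }

        public String render() { return text.toString(); }
    }

    public static void main(String[] args) {
        UserString author = UserString.fromRequest("bob");
        System.out.println(
                new Sql().code("insert into posts values(").value(author).code(");").render());
    }
}
```

This only helps if request data arrives as UserString in the first place. Nothing here stops a programmer who already holds a plain String from handing it to code(), which is why the language-level version of the idea is more appealing.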

(Of course, both of these ideas are already possible in Haskell using algebraic data types and Template Haskell, but this is of little comfort to the vast majority of programmers since (a) most haven’t heard of Haskell and (b) Haskell is still in its web-development-language infancy.)

Moving forward, I am interested in seeing whether interpolique passes peer review (it probably will), becomes a common practice, and reduces the incidence of code injection. However these questions turn out, the core idea is elegant, doesn’t seem to have a performance penalty, and can likely be carried forward fruitfully in future technologies.

Opal Kelly XEM3001

Today my Opal Kelly XEM3001 arrived. I’m hoping to use it for some high-speed signal processing, which I’ll write about when I have more to show for it. For the time being, I just wanted to mention that the FrontPanel software works on my MacBook Pro, more or less. Some of the samples try to find wave files in c:\windows, which is a cute bug, and luckily pretty harmless (these being just sample projects, after all). The sample code showed that it was possible to upload code to the FPGA (and fast!) as well as communicate with the device via USB.

Edit. Well, it works on my Mac, but not on Ubuntu. The supporting software (which is free as in beer) is shipped binary-only, and is built for Fedora 7. I found this thread where Opal Kelly responded to a user’s support question by trying to sell him a custom build.

Of course, I’d be elated if they’d just release the source, but I can’t really expect everyone to wake up to the 21st century. Still, if they insist on this silly binary-only distribution, it’d be nice if they’d support the same distribution that Xilinx supports (i.e., Red Hat). It’d also be nice if they could find a way to ship without dynamic linking dependencies. *sigh*

Compiz interferes with GLUT

While working on Armada in Ubuntu, I observed the following obnoxious behavior: the windows I was creating using GLUT.Window.createWindow didn’t have a border. To clear things up for search engines: the windows had no border, had no frame, had no title bar.

I’ve worked with GLUT in Haskell on Mac OS X and did not have this problem. After a bit of googling, I found a suggestion that the culprit was Compiz — the silly program which consumes hardware resources for the dubious purpose of making X-windows more eye-candy laden. Obviously Ubuntu uses it by default.

So here’s the bottom line: if you’re working with GLUT (createWindow in particular) and finding that your windows have no borders, then you should disable Compiz. I know for sure this is an issue with the Haskell bindings for GLUT, though a great many people working in C directly have complained about it too, suggesting that it’s not a Haskell problem. That much should be obvious, since Haskell is ready for prime time.

“Beautiful concurrency”

I just read Beautiful concurrency, an article by Simon Peyton Jones of MSR fame. MSR is interesting to me for a number of reasons — their fascination with Haskell among them.

This is a good introduction to software transactional memory (STM), presented in terms of the Haskell implementation. STM is an interesting model for concurrent programming. It is an alternative to lock-based concurrency that appears to have great promise for resolving many of the problems associated with locks.

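For instance, lock-based code is known not to be composable: if foo() and bar() each take a lock on the same variable, then calling bar() from inside foo()’s critical section can deadlock when the lock is not reentrant, and if they lock different variables, two threads that compose them in opposite orders can deadlock each other. STM, by contrast, allows for composition, and so allows for more modularity in concurrent code. This is a good thing.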
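
To show what the lock-based workaround tends to look like, here is a small Java sketch of the usual bank account example. The class and method names are my own, not the article’s. To compose withdraw and deposit into an atomic transfer, the caller has to reach into both accounts, hold both locks, and agree on a global lock order, which is exactly the kind of leaked detail that STM is meant to make unnecessary.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    static final class Account {
        final int id;                                   // used only to fix a global lock order
        final ReentrantLock lock = new ReentrantLock();
        private long balance;

        Account(int id, long balance) { this.id = id; this.balance = balance; }

        void deposit(long n)  { lock.lock(); try { balance += n; } finally { lock.unlock(); } }
        void withdraw(long n) { lock.lock(); try { balance -= n; } finally { lock.unlock(); } }
        long balance()        { lock.lock(); try { return balance; } finally { lock.unlock(); } }
    }

    // Locking in (from, to) order could deadlock when two threads transfer
    // in opposite directions, so always lock the lower id first.
    static void transfer(Account from, Account to, long n) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.withdraw(n);   // re-acquires a lock this thread already holds
                to.deposit(n);
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads transferring in opposite directions; returns the final balances.
    static long[] opposingTransfers(int rounds) throws InterruptedException {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return new long[] { a.balance(), b.balance() };
    }
}
```

Note that withdraw and deposit could not simply be called one after the other: without both locks held, another thread could observe the money missing from both accounts.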

STM fits quite naturally into Haskell’s strange-but-pretty type system. Go read the article, it’s quite good.

Programming in the Haskell Type System

The Monad.Reader Issue 8 contains a highly amusing article by Conrad Parker titled Type-Level Instant Insanity. From the abstract:

We illustrate some of the techniques used to perform computations in the Haskell Type System by presenting a complete type-level program. Programming at this level is often considered an obscure art with little practical value, but it need not be so. We tame this magic for the purpose of practical Haskell programming.

It’s a bold claim, and the article lives up to it.

The basic idea isn’t as bad as you might first think. Here are the major ideas:

  1. The Haskell Type System allows for types that contain information (for instance, parameterized data types contain information in their parameters).
  2. You can instantiate any type by using the Haskell undefined value, which is a member of every type, and can thus be used as a vehicle for moving typed data around.
  3. Functions can be used to cast undefined from one type to another, and can therefore be used to “act” on typing information.

Here’s an example from the article, which I present here in the hopes that it will motivate you to read the paper:

Begin by defining some empty types, which we will use to represent colors:

data R -- Red
data G -- Green
data B -- Blue
data W -- White

We now define a parameterized data type that will represent a 6-sided cube, where each face is colored:

data Cube u f r b l d

The cube’s type gives us a way to orient the cube. By convention, the u parameter gives us the color of the top of the cube, f is the front, r the right, b the back, l the left, and d the bottom.

Of course, we might want a function that can “rotate” the cube. Easy:

rot :: Cube u f r b l d -> Cube u r b l f d
rot c = undefined

Isn’t that cute?
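
For readers more at home in Java, roughly the same trick can be sketched with generics. This is my own transliteration, not something from the paper, and the names PhantomCube, Rt and Bk are made up. The type parameters carry the face colors, the class holds no data, and rot does nothing except change the type.

```java
public class PhantomCube {
    // Empty marker types standing in for colors; never implemented.
    interface R {}   // Red
    interface G {}   // Green
    interface B {}   // Blue
    interface W {}   // White

    // Six type parameters for up, front, right, back, left, down.
    static final class Cube<U, F, Rt, Bk, L, D> { }

    // Mirrors  rot :: Cube u f r b l d -> Cube u r b l f d
    static <U, F, Rt, Bk, L, D> Cube<U, Rt, Bk, L, F, D> rot(Cube<U, F, Rt, Bk, L, D> c) {
        return new Cube<>();
    }

    public static void main(String[] args) {
        Cube<R, G, B, W, G, B> c = new Cube<>();
        Cube<R, B, W, G, G, B> turned = rot(c);      // accepted: the faces moved
        // Cube<R, G, B, W, G, B> wrong = rot(c);    // rejected by the compiler
    }
}
```

Java has no direct counterpart to undefined, so the sketch returns an empty object instead; the compiler still does all the interesting work.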

The paper goes on to do some very cool things. A very good read for people who want to practice the art of thinking in Haskell.

Unreasonable bugs: type-based exceptions

When it comes to type systems, I don’t have hard and fast rules for what I like and what I don’t like. I enjoy some dynamic languages, some static languages. I pick which language to use (and consequently which typing system) based on the nature of the project I’m working on.

Usually, if I pick a statically typed language, it’s because I’m working on the type of program where I think type errors might be serious and complicated, and therefore I believe that static checking will help me out. Of course, all of this only applies if the language actually catches the type errors at compile time.

“But wait,” I can hear you saying, “don’t static languages always catch all type errors at compile time? Isn’t that the whole point?”

Oh yes, as far as I’m concerned it most certainly is the point. The single greatest advantage to working in a statically-typed language is that, in principle, type errors can be detected at compile time. Unfortunately, for two major languages, there exist type errors that are not caught until runtime. (And I’m not talking about C++ here!)

The problem goes something like this: in both C# and Java, if A is-a B, then A[] is-a B[]. In other words, if we have a function like

void messWithArray(B[] someArg) {...}

we are permitted to invoke it like so:

messWithArray(new A[10])

Now, if messWithArray only intends to read from the array, nothing will go wrong. Unfortunately, this can cause massive problems if messWithArray wants to assign new elements in our array.

To prevent this from being an issue, the Java Virtual Machine performs run-time type checks on array assignments. Yup!

Here’s some code in Java that demonstrates this problem:

// CParent.java:
public class CParent {
}

// CChild1.java:
public class CChild1 extends CParent {
  public CChild1() { }
}

// CChild2.java:
public class CChild2 extends CParent {
  public CChild2() { }

  public void doStuff() { }
}

// typefoo.java:
public class typefoo {
  public static void main(String[] args) {
    System.out.println("Creating array of CChild2's");
    CChild2[] a = new CChild2[10];

    System.out.println("Calling setElements on our array");
    setElements(a);

    System.out.println("Calling useElements on our array");
    useElements(a);

    System.out.println("All done!");
  }

  // Accepts a CChild2[] because CChild2[] is-a CParent[];
  // the store below is only checked at run time.
  public static void setElements(CParent[] a) {
    a[0] = new CChild1();
  }

  public static void useElements(CChild2[] a) {
    a[0].doStuff();
  }
}

$ java typefoo
Creating array of CChild2's
Calling setElements on our array
Exception in thread "main" java.lang.ArrayStoreException: CChild1
  at typefoo.setElements(typefoo.java:14)
  at typefoo.main(typefoo.java:7)

Disgraceful.