Syntactic support for Kaminsky’s Interpolique in Haskell

When I recently wrote about my first impressions of Kaminsky’s Interpolique, I mentioned that the only thing I didn’t like was that PHP offers no way to protect against syntactic mistakes, such as a programmer mistakenly using $ instead of ^^.

Today we’ll look at how Interpolique can be implemented in Haskell in such a way that we force the developer to use Interpolique when creating a SQL query, precluding the possibility of the $/^^ mixup bug. In doing so we’ll see that we don’t need anything like PHP’s eval to get the job done.

All of the code for this post is on GitHub: InterpoliqueQQ.

Since version 6.10 of the Glorious Haskell Compiler, we have had the ability to essentially define new language syntax. This functionality, called quasi-quotation, is useful for embedding mini-languages into Haskell in a type-safe way. Since Interpolique is basically a mini-language (its only operator is the ^^ interpolator), it is natural to use quasi-quotation when implementing Interpolique in Haskell.
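To make the mechanism concrete, here is a minimal quasi-quoter. It is a sketch, not code from InterpoliqueQQ, and the names `SqlQQ`, `sql` and `normalize` are hypothetical. A `QuasiQuoter` (from the template-haskell package that ships with GHC) is a record of functions, one per context in which the quoted text may appear; each receives the raw text between the brackets and returns Template Haskell syntax, so all of the processing happens at compile time.

```haskell
module SqlQQ (sql, normalize) where

import Language.Haskell.TH (stringE)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- | Collapse every run of whitespace to a single space. (A toy rule: it
-- would also squash spaces inside SQL string literals.)
normalize :: String -> String
normalize = unwords . words

-- | Hypothetical quasi-quoter: [sql| ... |] is replaced at compile time by
-- a String literal holding the normalized quoted text.
sql :: QuasiQuoter
sql = QuasiQuoter
  { quoteExp  = stringE . normalize
  , quotePat  = const (fail "sql: only usable as an expression")
  , quoteType = const (fail "sql: only usable as an expression")
  , quoteDec  = const (fail "sql: only usable as an expression")
  }
```

Because of GHC’s stage restriction, a quasi-quoter must be defined in its own module. Another module that enables `{-# LANGUAGE QuasiQuotes #-}` and imports `SqlQQ` can then write `[sql| select *   from posts |]`, which compiles to the literal `"select * from posts"`. (GHC releases of this post’s era spelled the brackets `[$sql| ... |]`, as in the example below.)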

Let’s look at an example of what this looks like when it’s being used. On the Interpolique site, an example of an attempted SQL injection is given. The crux of the example is the following code:

$conn->query(eval(b('insert into posts values(^^_POST[author] , ^^_POST[content] );')));


In InterpoliqueQQ (see Test.hs in the InterpoliqueQQ code), the same code can be written as

query = [$interpolique| insert into posts values(^^author , ^^content ); |]


If we load Test.hs into GHCi, GHC’s interactive mode, we can see the value of query:

*Test> query
InterpoliquedString " insert into posts values(b64d(\"Zm9v\"), b64d(\"JyBvciAxPTE7\")); "


Thus the run-time value of query is, in fact, an Interpolique’d SQL query.
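Producing that value starts with splitting the quoted text into literal SQL and ^^ variable references; each reference is then replaced by b64d("...") around the variable’s base64-encoded value. InterpoliqueQQ does the splitting with parsec. The sketch below is a hypothetical, dependency-free stand-in (`Chunk`, `Lit`, `Var` and `chunks` are illustrative names, not InterpoliqueQQ’s), and its rule that one space after a variable name is swallowed is inferred from the output above, not taken from InterpoliqueQQ’s source.

```haskell
import Data.Char (isAlphaNum)

-- | A parsed template: literal SQL text, or the name of a ^^-spliced variable.
data Chunk = Lit String | Var String
  deriving (Eq, Show)

-- | Split a template such as " values(^^author , ^^content ); " into chunks.
-- A single space directly after a variable name is dropped (assumption
-- inferred from the GHCi output), and a "^^" not followed by a name is
-- rejected rather than passed through as literal text.
chunks :: String -> Either String [Chunk]
chunks = go ""
  where
    -- acc is the literal text seen so far, in reverse order.
    go acc [] = Right (lit acc [])
    go acc ('^' : '^' : rest) =
      case span isIdentChar rest of
        ("", _)       -> Left "expected a variable name after ^^"
        (name, rest') -> (lit acc . (Var name :)) <$> go "" (dropOneSpace rest')
    go acc (c : rest) = go (c : acc) rest

    -- Emit the pending literal (if any) in front of the remaining chunks.
    lit acc more
      | null acc  = more
      | otherwise = Lit (reverse acc) : more

    dropOneSpace (' ' : r) = r
    dropOneSpace r         = r

    isIdentChar c = isAlphaNum c || c == '_' || c == '\''
```

In a quasi-quoter built on this, each `Var name` would become a reference to the Haskell variable `name` in the generated expression (for instance via Template Haskell’s `varE (mkName name)`), so a misspelled splice is a compile-time "not in scope" error rather than a run-time surprise.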

Why this is interesting

The important feature of InterpoliqueQQ is that this syntax offers protection in the form of static typing. If we inspect the type of query we get

*Test> :t query
query :: InterpoliqueQQ.InterpoliquedString


That is, this syntax creates a query whose type is InterpoliquedString. In this implementation of Interpolique, the only way to obtain a value of type InterpoliquedString is via this syntax. In other words, if a function is given a query of type InterpoliquedString, it can be completely certain that the query was generated using Interpolique. Since this syntax does not allow PHP-style string interpolation (that is, there is no analogue of 'insert into posts values($author, $content)'), there is no way for a developer to introduce a SQL injection bug due to a misused interpolation operator.
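In Haskell this kind of guarantee is usually obtained by exporting a type abstractly, without its data constructor. The module below is a hypothetical sketch (`SafeQuery`, `Piece`, `fromPieces` and `queryText` are not InterpoliqueQQ’s API) and, to stay self-contained, escapes values by doubling single quotes inside a SQL string literal instead of base64-encoding them.

```haskell
module SafeQuery
  ( InterpoliquedString  -- the type is exported, its constructor is not
  , Piece (..)
  , fromPieces
  , queryText
  ) where

-- | Outside this module nobody can write `InterpoliquedString "..."`, so
-- every value of this type was produced by fromPieces below.
newtype InterpoliquedString = InterpoliquedString String

-- | Template text versus a run-time value that must be escaped.
data Piece = Sql String | Param String

-- | The only exported way to build a query: every Param is escaped.
-- Stand-in escaping: wrap in single quotes and double any embedded quote.
fromPieces :: [Piece] -> InterpoliquedString
fromPieces = InterpoliquedString . concatMap render
  where
    render (Sql s)   = s
    render (Param v) = "'" ++ concatMap escapeChar v ++ "'"
    escapeChar '\'' = "''"
    escapeChar c    = [c]

-- | Read-only access for a database layer that accepts only this type.
queryText :: InterpoliquedString -> String
queryText (InterpoliquedString s) = s
```

A database function typed `InterpoliquedString -> IO ()` then rejects a plain `String` at compile time. Note that hiding the constructor only stops other modules from skipping `fromPieces`; they could still pass run-time data as a `Sql` piece, so the design relies on only the quasi-quoter’s generated code building `Sql` pieces, from template text fixed at compile time.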

(We can also note that InterpoliqueQQ does not use anything similar to PHP’s eval, thereby rendering any objection to the presence of eval moot.)

The implementation

InterpoliqueQQ is implemented in fewer than 75 lines of Haskell, relying on the powerful parsec parser-combinator library, GHC’s quasi-quotation support, and the base64 encoder from the dataenc library.
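The base64 step is what neutralizes hostile input: the generated SQL never contains the value’s own characters, only base64 digits (A–Z, a–z, 0–9, +, / and =), none of which can close a string literal; the database then needs a b64d function to decode them. InterpoliqueQQ gets its encoder from dataenc. To keep this sketch dependency-free the encoder is hand-written instead, and the `b64d("...")` wrapper format is copied from the GHCi output above; `base64`, `encode3` and `b64d` are illustrative names.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (ord)

-- | The standard base64 alphabet (RFC 4648).
alphabet :: String
alphabet = ['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "+/"

-- | Encode a list of bytes (0..255) as base64, with '=' padding.
base64 :: [Int] -> String
base64 (a : b : c : rest) = encode3 a b c ++ base64 rest
base64 [a, b]             = take 3 (encode3 a b 0) ++ "="
base64 [a]                = take 2 (encode3 a 0 0) ++ "=="
base64 []                 = ""

-- | Pack three bytes into 24 bits and emit four 6-bit alphabet indices.
encode3 :: Int -> Int -> Int -> String
encode3 a b c = [alphabet !! ((n `shiftR` s) .&. 63) | s <- [18, 12, 6, 0]]
  where
    n = (a `shiftL` 16) .|. (b `shiftL` 8) .|. c

-- | Wrap a value the way the GHCi output shows: b64d("...").
-- Assumption: every character is below U+0100, so it maps to one byte;
-- real code would UTF-8 encode the text first.
b64d :: String -> String
b64d v = "b64d(\"" ++ base64 (map toByte v) ++ "\")"
  where
    toByte c
      | ord c < 256 = ord c
      | otherwise   = error "b64d sketch: only handles characters below U+0100"
```

Applied to the two values from the earlier example, "foo" and "' or 1=1;", it yields b64d("Zm9v") and b64d("JyBvciAxPTE7"), matching the query shown above.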

This implementation is entirely proof-of-concept. In particular, it’s missing two things:

  1. Field-testing. Interpolique hasn’t (yet) been out long enough for peer review to have run its course, so nothing can be said with certainty about whether this particular implementation is secure.
  2. Library support. This implementation is built on the InterpoliquedString type. For it to be useful, there needs to be a SQL library that is ready to act on this type. For the time being, I’ve included a runQuery function in InterpoliqueQQ, which simply takes an InterpoliquedString and prints (to stdout) the corresponding SQL code, as in
    *Test> runQuery query
     insert into posts values(b64d("Zm9v"), b64d("JyBvciAxPTE7")); 
    

At present, Haskell is a decidedly non-standard language for web development. Examples like this, however, suggest that it could become a powerful tool in this domain.


5 Comments

  1. cstone

     24 June, 2010

    This is cool; a cleaner proof-of-concept than the PHP one.

    But I really feel that even at the proof-of-concept level, this would be hard to use for developers: you only get to use interpolique-protected parameters when they’re strings in certain parts of a query. Want to insert an ID or a float or decimal? Gotta do it elsewhere. Want to dynamically specify a table name or part of a WHERE clause? Gotta do it elsewhere. With SQL, I find it hard to envision a usable alternative that doesn’t involve either 1) parsing the entire query to determine the appropriate interpolations in a context-sensitive way; or 2) just allowing unsafe interpolations anyway. Using strong typing can work around some of these problems, but not all (particularly the table-name case).

    Also, I brought this up to Dan directly, but: why don’t you just drop base64 entirely and use prepared statements? There’s no added protection here: the base64 step seems useless. There’s no lack of safe encapsulation mechanisms when talking to modern SQL RDBMSes; there’s just a lack of people willing to use them.

  2. I hadn’t thought of your point about splicing in table names. As you mention, strong typing can work around the other problems (after reading your comment I wrote and posted a new branch that allows for type-sensitive escaping and can be extended with user-defined types). An example of type-based splicing:

    -- We can interpolique doubles, ints, and bools
    someDouble = 3.14159 :: Double
    someInt = 2 :: Int
    someBool = False
    someString = "foob4r"
    query3 = [$interpolique| insert into sometable
        values(^^someDouble , ^^someInt, ^^someBool, ^^someString ); |]
    
    -- Example where we've Interpolique'd a user-defined type
    data SomeCustomType = SomeCustomType Int
    instance InterpoliqueEscape SomeCustomType where
      escapique (SomeCustomType i) = show i
    
    someCustomData = SomeCustomType 7
    query4 = [$interpolique| insert into weirdCustomTypes values(^^someCustomData ); |]
    
    ...
    
    *Test> runQuery query3
     insert into sometable values(3.14159, 2, False, b64d(Zm9vYjRy)); 
    *Test> runQuery query4
     insert into weirdCustomTypes values(7); 
    

    Splicing in table names can certainly be done, but it's not obvious where we should draw the static-checking line. Since the number of tables is finite, we can define an enumerated type whose values may be spliced in, although we won't know until run-time whether the splice was written in a way that actually yields an executable query (we have this problem anyway, though, whenever we interpolate).

    The Haskell mailing list has seen discussion about including compile-time checks to ensure these enumerated types actually match the schema, and proof of concept code readily exists (using Template Haskell, no modifications to the toolchain required). (See MetaHDBC.)

    I agree that prepared statements are better, of course. But like you said, there's a lack of people willing to use them. That being the case, it's sensible to try to improve the safety of interpolation.

    Of course, it is as yet unclear whether there are people who will use this style but don't want to use prepared statements.

  3. cstone

     24 June, 2010

    Yep! That’s the right way to do it. Handling the various number types properly is a big step. I suspect you might need to extend the types a bit so that conversions for numbers with nontraditional precisions (think postgresql/oracle NUMERIC(p,s)) are done in an entirely predictable way. In a lot of cases this shouldn’t be a huge problem, because there are almost always conversion functions built in to the dbms to do everything you’d want. In other cases, the language might provide all you need already.

    Table names aren’t the only strings you shouldn’t escape, either. (And realistically, the set of valid table names isn’t always available a priori… people do some messed-up things.) Imagine the implementation of a simple search system with an arbitrary set of conditions chainable with boolean operators. A common way to do that is to just dynamically build up a WHERE clause based on user input; it should be possible to escape parameters in those substrings, too. Enumerations don’t really seem like the right solution.

    My comment re: prepared statements wasn’t a suggestion that people should just drop interpolation entirely and use the more cumbersome APIs. I really like the interpolique syntax approach. I’m trying to push you (& Dan) to implement this on top of prepared statements. Instead of using base64, generate a prepared statement with the variables already filled in. In some cases, you might even have to do this to get to certain complicated types like blobs and clobs. Databases already come with supported, safe systems for submitting queries while respecting database types and boundaries between query text & parameters. What are you gaining by using base64 and forcing people to install a module or new stored procedure to decode it?
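The suggestion in that last comment, keeping the interpolation syntax but emitting a prepared statement instead of base64-wrapped values, amounts to a small change in what the quasi-quoter generates. A hypothetical sketch, with an illustrative `Piece` type standing in for whatever the quasi-quoter’s parser produces:

```haskell
-- | Hypothetical: a template already split into literal SQL and values.
data Piece = Sql String | Param String
  deriving (Eq, Show)

-- | Turn the pieces into SQL text with a "?" placeholder per value, plus the
-- values in order, ready to be bound by a prepared-statement API instead
-- of being encoded into the SQL text itself.
toPrepared :: [Piece] -> (String, [String])
toPrepared pieces = (concatMap sqlPart pieces, [v | Param v <- pieces])
  where
    sqlPart (Sql s)   = s
    sqlPart (Param _) = "?"
```

The `?` placeholder style is the one HDBC uses (its `run` takes the SQL text and a list of `SqlValue`s to bind), as does SQLite; PostgreSQL’s native protocol numbers its placeholders `$1`, `$2` and so on, so a real backend would have to choose the right style.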

  1. Haskell features I’d like to see in other languages « Integer Overflow
  2. Quasi-quoting: ASCII art to define data structures « The Potential Programming Language
