Haskell features I’d like to see in other languages

When I read Ben Hutchison‘s OO/Imperative programmers: ‘Study Functional Programming or Be Ignorant’ I knew I had too much to say for the comments, so I figured I’d put in my 2 cents here.

Haskell is my go-to language, both for scripting, and for getting work done. This is not because of any particular allegiance to the language. Haskell and I have an open relationship, and the moment I find a language that out-Haskells Haskell, you can be sure I’ll move on.

Here I want to describe my favorite things about Haskell. You’ll note that they are all about the type-system. I don’t feel too strongly one way or the other about laziness, or about monads (though I won’t give them up without first finding something to take their place). I don’t even particularly care that it’s a functional language, in as much as I can have these features in a non-functional environment.

Some of these features are already available elsewhere. This is wonderful! If you know of any examples of this, please tell me in the comments.

This is a list of my favorite things:

(more…)

Advertisements

Crypto in the classroom: digital signatures for homework

If you don’t know, I’m a graduate student at the University of Utah, which means I make a living my teaching classes. Recently a student charged that I lost a good deal of her homework. We wound up in a “he-said/she-said” situation where ultimately the dean concluded that we need to raise her grade by a letter under the assumption that I really was up to shenanigans (we ultimately gave her 100% credit in the “homework” column in the grade book, raising her grade from F to D). Not a pleasant situation: aside from a track record of strong teaching evaluations, there was nothing to defend my reputation.

Experienced teachers know that claims of “lost” work are frequent. If we want to be objective about this (and we do), the claims need to be taken seriously, since lost things rarely leave a trail. All we have when analyzing such claims is the following:

  • The missing work never seems to turn up. Not after a week, a month, a semester, a year, or ever.
  • If a person rarely finds that they misplace his own belongings, it’s hard to accept that he is misplacing student work (assuming they treat student work with a reasonable amount of care, as we typically do, given how terrible it would be to lose it!)
  • These claims never seem to come from students who are doing well on exams; they tend to come from students who are backed into a corner, grade-wise.

Of course, it is entirely conceivable that these claims are occasionally correct, and it would be terrible to allow such mistakes — our mistakes — to adversely effect our students.

Last Spring was the only time a student has accused me of losing their work. It was a lousy situation that I have no intention of ever repeating. So when I was assigned to teach a half-term class this summer, I decided it was time to try something new. I’ve recently finished teaching that class; here’s what I did.

(more…)

Syntactic support for Kaminsky’s Interpolique in Haskell

When I recently wrote about my first impressions of Kaminsky’s Interpolique, I mentioned that the only thing I didn’t like is that PHP doesn’t offer any way to protect against syntactic mistakes, such as where the programmer mistakenly uses a $ instead of a ^^.

Today we’ll look at how Interpolique can be implemented in Haskell in such a way that we force the developer to use Interpolique when creating a SQL query, precluding the possibility of the $/^^ mixup bug. In doing so we’ll see that we don’t need anything like PHP’s eval to get the job done.

All of the code for this post is on github: InterpoliqueQQ.

(more…)

First impressions: Serving statically with Snap

(This post refers to Snap 0.2.6.)

There’s been a lot of buzz about the Snap framework, so I thought I’d give it a look. My personal website doesn’t have anything dynamic going on, so arguably Snap is “overkill,” but then again so is Apache, so what the heck. Snap is entirely experimental at this time: in their own words, “it is early-stage software,” so every single critique given here should be read with an implied expiration date.

So the question is: how does one host a static site on Snap? At present time there’s no tutorial for this, so I fumbled around until I got something working. Here’s my code:

main = do
    putStrLn "ninj4net online"
    quickServer config site

site :: Snap ()
site =
    route [ ("kinetic", static "kinetic")
          , ("math1010", static "math1010")
          , ("math1030", static "math1030")
          , ("math1100", static "math1100")
          , ("math1210", static "math1210")
          , ("", static "")
          ] 
    (writeBS "general error")

static d = do
    let html_file = "static/" ++ d ++ "/index.html"
        xml_file  = "static/" ++ d ++ "/index.xml"
    html <- liftIO $ doesFileExist html_file
    xml  <- liftIO $ doesFileExist xml_file
    ( (ifTop (fileServeSingle $ if html then html_file else xml_file)) 
      (fileServe $ "static/" ++ d) )

Discussion

Some of my directories use an index.html file, while others use an index.xml file. I need to use System.Directory.doesFileExist function to determine whether or not these files exist — trying ifTop (fileServeSingle "something that doesn't exist") will not switch to the alternative using <|>, so an explicit check is needed (it will whine about an exception being thrown, completely undermining the choice operator!). This is presumably a bug (I submitted it to their github).

I’m sure there is a much more elegant approach, but this was the best I could muster during lunch.

A thing Snap is missing

I came across a design decision that makes me nervous. As with any web framework, there are facilities for getting strings from the user. Unfortunately, Snap does not use types to distinguish between user-provided strings (dirty) and programmer-provided strings (clean).

Why does this matter? Segregating user input into its own type is a formidable defense against (say) SQL injection, since it obviates that "select * from myData where foo='" ++ userInput ++ "'" isn’t well-typed (presumably SQL code should be its own type, say, SQLString, and the function UserString -> SQLString should be some kind of escape routine). It would be nice to see framework support for this types-based defense.

The most obvious example of this is in getParam, which simply returns a Maybe ByteString.

Another example is provided by getSafePath and fileServeSingle. The former returns a FilePath provided by the user (a “safe” path, which — looking at the source — means that the “/../”‘s get removed), and the latter takes a FilePath and opens the corresponding local file. I suppose the idea is that the code

do p <- getSafePath
   fileServeSingle p


shouldn’t escape the implied sandbox of the file system. Of course, if the application has tighter requirements than this, the type system isn’t there to help out. (For instance, perhaps a path is considered “safe” if it excludes certain keywords in addition to the constraints imposed by getSafePath).

A natural work-around is to build an application-specific wrapper around Snap, and perhaps this is the better approach; I’m not yet sure.

Conclusions

I’m glad that Snap has been announced, as it has proven interesting to look at. Of course, Haskell already has (at least) two other web frameworks (yesod and happstack) and it’s not clear which will win-out in mindshare, nor is it obvious (to me, anyway) which one would be the best choice for someone wanting to sit down and make a site. Of course, it’s possible that some mix-and-match might be the best approach: the web server of one project, the HTML generation of another, and the persistent storage of the third. (This possibility deserves some consideration, especially as projects like BlazeHTML really take off.) Hopefully in the coming months we’ll see more high-powered applications of these frameworks, giving us a fountain of lessons we can capitalize on, and providing some compelling show cases of Haskell’s power as a web development language.

Still solving string injection: first impressions of Kaminsky’s Interpolique

This past week Dan Kaminsky announced Interpolique, a technology for dealing with string injection problems in web applications. The basic idea is pretty sharp: instead of writing (say) PHP code like

$conn->query('insert into posts values($_POST[author] , $_POST[content] );');


we write

$conn->query(eval(b('insert into posts values(^^_POST[author] , ^^_POST[content] );')));


The b function is provided by interpolique. It essentially translates the input string into some PHP code (which is then reified using eval) that base64 encodes the user-input and wraps that encoding up in a call to the MySQL function for base64 decoding.

The idea is that the resulting query is given to MySQL in a format where the user input is base64 encoded. As Dan points out, there aren’t any known injection techniques that can escape the MySQL base64 decoder, and the decoder won’t try to evaluate the resulting string as a SQL expression, so no injection is possible.

I have mixed feelings about this approach. On the one hand, it’s really just another form of escaping (instead of inserting a bunch of \‘s into the string, we’re base64 encoding it), and escaping is an error-prone thing. After all, there’s nothing preventing a tired developer from accidentally mixing some $ in with their ^^, nor could there be — if the developer writes $ instead of ^^, PHP will interpolate the string before passing it off to the b function, so no run-time check will be able to save the day.

(I’m not elated about the use of eval, but (a) I see no way around it if the plan is to use a syntactic approach, as is currently the case, and (b) the only vector I can see for attacking it requires a programmer to leave out the call to b, which is something they’d likely catch during development unless they also used $ instead of ^^, and that double-accident seems unlikely, barring a stupid refactoring snafu.)

On the other hand, if this technique is applied correctly, it seems likely to be robust (peer review should weigh in on this pretty quickly).

When I look at interpolique, I see the next generation of escaping: if you forget to do it you’re screwed, but if you do it correctly you’re safe. interpolique’s contribution is that its style of escaping is much simpler than trying to scan strings for dangerous characters, hence less likely to contain silly errors and edge cases, and that it is cross-language ready, in that base64 encoding isn’t target-language-specific (unlike escaping, which certainly is).

interpolique does not improve upon escaping biggest failure, though: if you’ve got 50,000 lines of PHP, the only way to know that interpolique (or escaping) is being used throughout is to look through the code. This is a PHP shortcoming, of course. We could certainly produce some static analysis tool to check for this design pattern, but then again, if writing tools that understand strings in PHP were easy we wouldn’t have the code injection mess in the first place.

Future direction

interpolique does provide a novel improvement for how to move data across the language barrier. This makes the core idea useful even in situations where programmers aren’t using PHP (or other half-brained-but-common-anyway languages).

In the long term, however, we still need to address the fact that we’re abusing the String type. User-input should be its own distinct type. The fact that this isn’t the case in .Net and Java completely explains why those type-safe languages don’t fare any better than PHP in terms of code injection.

Following the interpolique idea, the only function from UserString to String could be a base64 encoder. Languages could provide syntactic sugar to allow things like

$conn->query('insert into posts values($_POST[author] , $_POST[content] );');


to implicitly denote the interpolique style, thereby preserving type-safety (in this case, separation of user-input from SQL code) without compromising string interpolation style.

(Of course, both of these ideas are already possible in Haskell using algebraic data types and Template Haskell, but this is of little comfort to the vast majority of programmers since (a) most haven’t heard of Haskell and (b) Haskell is still in its web-development-language infancy.)

Moving forward I am interested in seeing whether interpolique passes peer review (probably will), becomes a common practice and reduces the incidence of code injection. Regardless of how these questions fare, the core idea is elegant, doesn’t seem to have a performance penalty, and can likely be carried forward fruitfully in future technologies.