Monday 29 July 2013

Writing a simple Lisp interpreter in Scala

A while ago, I came across Michael Nielsen's blog post "Lisp as the Maxwell's equations of software". In this post, Nielsen describes how the simple set of definitions that make up Lisp, short enough to fit on a single page, together define a model of computing from which all of software can be derived. 

To illustrate this point, he works through how to implement a simple Lisp interpreter in Python (itself derived from Peter Norvig's simple Lisp interpreter). Having produced this, Nielsen then asks: how can Lisp be implemented in Lisp itself? And he proceeds to implement an eval() function - that runs on the Lisp interpreter implemented in the first part! This is not a new exercise of course, this is how Lisp was originally described, and has been a common way to teach Lisp itself, for example in The Structure and Interpretation of Computer Programs (and Nielsen provides ample references to these original materials). But what I think Nielsen bring to the table is a uniquely readable description of the steps involved, taking the reader through this line of reasoning in a single (albeit long!) blog post. If you haven't read this already, I would urge you to go and read it as soon as you can, I really think this is an excellent essay on the nature of software.

I also think Nielsen hits the nail on the head when he highlights how useful this process is for learning a programming language, to actually write an implementation for it. Having tried this for myself, I can attest to the fact that this really works, it's been a real eye-opener. I had played around with Lisp before and thought I had a reasonable idea of how it worked, but writing an interpreter for it really throws up many subtleties that I hadn't picked up on. And after doing this, I find reading other material on Lisp a lot easier, I find myself thinking "OK, how does this actually work in the interpreter", or "what is actually happening in the interpreter for this code". It makes it a lot easier to absorb new material about Lisp I think.

This approach reminds me of the idea of "negative space" or "figure and ground", wonderfully illustrated by many of the drawings of Escher. You can learn a language by writing code in it and seeing lots of examples of its syntax and usage, by using its APIs and so on. But that's just one side of the coin - implementing the language is like flipping things around and seeing things from the opposite angle, the inverse.

One can also observe that for most programming languages, this exercise simply isn't feasible, as writing an interpreter is far too complex. It's only the incredibly elegant and minimalist definition of Lisp that makes this exercise possible to carry out in practice.

A Scala version

As I recently finished Martin Odersky's Coursera course on "Functional Programming in Scala" and have been looking for ways to improve my Scala skills as well as my understanding of functional programming, I decided it would be an interesting exercise to write a version of the Lisp interpreter in Scala instead of Python. The resulting interpreter is available on github for anyone who's interested in having a look.

Working through this did indeed prove a very educational experience and I'll discuss some of the lessons learned in this post. Also, I'd be very interested to hear feedback on better ways to do this in Scala. I'm already aware of parts of this solution that could be improved - even if I'm trying to resist the temptation to endlessly refactor this!

Lessons learned

The resulting Scala version comes out pretty compact and readable I think. I added a few extra features compared to Nielsen's Python version
  • Arithmetic and boolean operators take variable numbers of arguments
  • Error handling is more thorough and error messages more helpful.
Even so, the code is only around 200 lines of actual code - and I did try to resist the temptation to make the code even more compact where it would have made it less readable! (Note that I left out the parsing of expressions that have been split across multiple lines).

Certain features of Scala turned out to be really helpful in the implementation:

Pattern matching is extremely handy, for example in the implementation of eval(), allowing the handling of different Lisp expressions (including error checking) like this:

  def eval(expr: Expr, env: Env): Expr = expr match {
    // ...
    case "cons" :: arg1 :: arg2 :: Nil => eval(arg2, env) match {
      case l: List[Expr] => eval(arg1, env) :: l
      case arg => throw new IllegalArgumentException(s"Second argument to 'cons' should be list, was: '$arg'")
    case "cons" :: args => throw new IllegalArgumentException(s"Invalid arguments to 'cons': $args")
    case "null?" :: head :: Nil => eval(head, env) == List()
    case "null?" :: _ => throw new IllegalArgumentException("Expected exactly one argument for 'null?'")
    // ...

Lambdas were implemented using Scala's PartialFunction type. This proved a useful way to model operations that may or may not apply to the given parameters, and made handling errors of invalid input arguments quite neat.

The Try type also makes error handling neater and avoids lots of nested try-catch blocks for handling exceptions.

Scala streams proved a neat way for the REPL to read an unknown number of lines of input in a functional manner (not using mutable state).

Operator options

One area that caused a fair amount of headache is in the handling of predefined operations such as "+", "-", "<", "<=" etc. The Python version of this simply registers lambdas for these, that refer to the globally defined operators:

        {'+': operator.add,
         '-': operator.sub,
         '*': operator.mul,
         '/': operator.div,
         '<=': operator.le,
         '=': operator.eq

This relies on Python's "duck typing" i.e. its dynamic lookup of methods by name: when passed an object it will do its best to find a method with the right name, not worrying about predefined types in the process.

This doesn't work in Scala of course - it's a statically typed language, and methods such as "<" which exist on the Int and Float classes are not defined in a common base type. To solve this, I experimented with the new reflection API added in Scala 2.10  but the resulting code was ugly and didn't seem like natural Scala code - it felt like I was fighting the language instead of working with it. Instead, I went for an option that registers lambdas that consist of a chain of partial functions, each of which explicitly handle operations for combinations of types. For an example, the "<" operation is defined as:

   "<" -> booleanOpToLambda(chain(
      { case (arg1: Int, arg2: Int) => arg1 < arg2 },
      { case (arg1: Int, arg2: Float) => arg1 < arg2 },
      { case (arg1: Float, arg2: Float) => arg1 < arg2 },
      { case (arg1: Float, arg2: Int) => arg1 < arg2 }))

This is more verbose than I'd like, but it's simple to understand. Also, the allowed combinations of types for which operations are valid to be explicitly defined in the interpreter itself, rather than relying on the details of the underlying implementation language. In a "proper" Lisp implementation, one would in any case have to implement the subsystem for handling numerics and arithmetic oneself, instead of shortcutting this by passing it on to the implementation language.

Future steps

I think the initial version as it stands provides a really neat little test bed both for trying out language features of Scala as well as experimenting with more complex Lisp features. I can definitely see myself using this for future exercises and trials. Some specific examples include:

With respects to the Scala code, I'd quite like to see if a more object oriented solution for expressions would make things neater, more type safe, and avoid some of the type casting that is done. As it is, the representation of expressions is, as in the original Python version, just lists of Strings, Ints, Floats and nested Lists. A simple hierarchy of case classes for each of these may be better. Currently free-standing functions such as the toStr() method would fit naturally into this hierarchy.

Also, the current version uses a mutable hash maps for the environments in which definitions are stored. It would be interesting to use only immutable data structures here, to make the implementation purely functional. I guess that eval() would then have to return a (possibly different) environment after each invocation. I'd be interested to see if this comes at the cost of making the code less simple.

The parser part of the interpreter is (intentionally) extremely basic. At some point I'd like to try out Scala's built-in support for writing parsers, and this seems like a perfect little test bed for trialling this. 

On the Lisp side of things, I would be interested to see how hard it would be to add support for Lisp macros. That's a language feature I haven't got to grips with yet, and I can imagine it being very useful to both learn how to use it at the same type as trying to implement it.

Wrapping up

I've spent quite a bit of time studying Michael Nielsen's blog post and it has proven to be a really, really useful exercise for me. I'd love to hear any feedback on the Scala implementation I've come up with if anyone's interested in having a look, but that said, I don't think the details of my implementation is the most important thing here. Instead I'd recommend you do this exercise for yourself and see what you learn while doing it. If you would like to learn more about functional programming in general, Lisp in specific, and perhaps learning another language to implement code in, then I suspect you'll find this very useful indeed.


  1. Interesting stuff - very similar concepts to C++ template meta programming (also Turing complete) which I have been looking at again lately.

    So why Scala? What can it do better?

    1. Why Scala? That's a good question that I'm trying to answer as I'm learning about the language. There are lots of things I like about it, for example:

      - A strong type system. The compiler catches a lot of errors, and you can build richer class models (though the code I've posted here doesn't really take advantage of that).
      - It has some powerful features that leads to succinct, expressive code, especially compared to Java.
      - It supports functional programming with immutable data structures, which makes concurrent code safe without use of locks, mutexes etc.

      If by "do better" you mean "better than C++", then that's an interesting one. I haven't done C++ in many years now and haven't kept up with the latest developments. But I do really like that Scala code runs on the JVM, and hence can take advantage of the Java ecosystem with the vast range of tools and libraries that provides (and interoperability with legacy Java code too).


Real Time Web Analytics