April 3, 2008

It's time to learn Scheme

Author: Daryl Lee

Have you ever peeked into one of those bazillion .el files in your Emacs installation's lisp folder and wondered what it meant? Or have you ever looked at a GIMP script .scm file and scratched your head over all the parentheses? Lisp is one of the oldest programming languages still in common use, and Scheme is a streamlined dialect of Lisp. Many universities use Scheme as the language to introduce students to the Computer Science curriculum, and some of their teaching methods are based on the assumption that Scheme is the one language they can count on their students knowing. Even so, many active programmers and system administrators are unfamiliar with Scheme. This article will get you on your way to adding this tool to your developer or sysadmin toolkit.

One of the first questions people have about Lisp-family languages such as Scheme is "What's up with all the parentheses?" Scheme programs are just sequences of lists, and lists are delimited by parentheses. After all, LISP itself is an acronym for LISt Processor. Whereas other programs use a variety of punctuation for syntax, Scheme's syntax is trivial -- everything is expressed as lists -- and this simplicity manifests itself in lists of lists of lists, all showing up as nested parentheses. In C you might see something like

if (x == y)
z = x;

where you see syntax elements (), {}, and ;. A similar behavior in Scheme would be written as

(if (= x y) (set! z x))

The entire expression is a list with three components: the operator if and two more lists. The first of these two is a list with operator = and two variables; the second is the operator set! and two variables. There's no need to learn when to use braces or when the semicolon is required.

List manipulation

Where do lists come from and how are they taken apart for interpretation? There are three mystically named procedures, cons, car, and cdr. cons is an abbreviation for "construct" and constructs a pair from two elements. If the second of the two elements is a list, the resulting pair is a list. (There is also a non-list pair, but that is a topic for a longer article.) car returns the first element of a list, and cdr (pronounced could-er) returns the rest of the list, which might be empty, denoted (). The two operators car and cdr get their names from a couple of registers used on the very early computer that Lisp was first written for.

Normally, Scheme tries to evaluate any symbols it encounters in a list. For example, if presented with the list (a b c), Scheme will try to evaluate a, b, and c. To prevent this evaluation and tell Scheme to just use the list literally, a single quote mark precedes the symbol or a list, as '(a b c). The quote operator can also be used, as (quote (a b c)).

If ll is the list (a b c), evaluating (cons 'd ll) produces the list (d a b c). Each of the elements could itself be a list. Now let's go the opposite direction and extract elements of the list. (car ll) returns the value a and (cdr ll) returns the list (b c). How would you get the value b? A good guess would be (car (cdr ll)). But this is such a common task that Scheme provides a suite of abbreviations. The one-step way to get b is (cadr ll).

We have been flirting with the question of how Scheme processes lists. A complete discussion takes quite a bit of space, but a beginner-level explanation is that Scheme looks at the first item in the list and evaluates it first, then decides what to do with the rest of the list. In a simple case like (+ 13 aa), Scheme sees that + is the addition operator, and then proceeds to evaluate 13 and aa. 13 just evaluates to the number 13, and if aa evaluates to a number, the two are added. If aa does not evaluate to a number, Scheme reports an error and aborts.

Some operators, called forms, do not always result in all of the following elements being evaluated. For instance, if the first item in the list is or, following elements are evaluated in sequence only until one evaluates to #t (Scheme's representation of true). For example, in

(or (number? x) (display "NaN"))

the display does not happen if x is a number.

Defining procedures

We need to take a deeper look at one special operator, define, Scheme's operator for defining new procedures. The easiest way to grasp this operator is with the aid of a simple example. Let's define a procedure to compute the cube of a number. We'd like to end up with a procedure that will return the cube of, say, 32 when we say (cube 32). Most Scheme programmers will define this procedure as:

(define (cube x)
(* x x x))

Simply start with the operator define, write a pattern for how the procedure will be called, replacing a symbol for the parameter, and then an expression or a sequence of expressions for what the procedure does. The procedure's return value is just the value of the last expression evaluated. This example is trivial, but it contains the required elements.

Many procedures require local variables to be effective. Scheme offers two operators for initializing local variables, let and let*. The structure of both is similar:

(let ((var1 val1) (var2 val2) ...)

where any number of variables can be set to any number of values, and any number of expressions can be evaluated. The values may themselves be expressions. The difference between let and let* is that in let expressions, all the value expressions must not contain any of the variables defined in the let, whereas in the let*, the values may be expressions that use variables defined above.

The method described above is syntactic sugar for a method that is both more flexible and, arguably, more expressive of what's happening under the covers. You can write this:

(define cube
(lambda (x) (* x x x)))

The procedure lambda is an operator that produces an unnamed procedure. Procedural programming, which is well supported by Scheme and Lisp, makes heavy use of such procedures. The object returned by lambdais the procedure, and the define simply assigns a name to an otherwise anonymous procedure. If this seems a bit bizarre to you, don't get sidetracked by it; you can write a lot of effective Scheme code without ever typing l-a-m-b-d-a.

If you're an experienced Scheme programmer, you're probably sputtering something to the effect that there's a lot more to defining procedures than this. That's true, but the number of options and variations is beyond the scope of this introduction. For all the gory details, turn to the references in the sidebar.

Key differences between Scheme and Lisp

Comparing GIMP script-fu files and Emacs elisp files quickly raises a lot of questions about differences between the two. The first difference to get used to is the difference between define in Scheme and defun in Lisp. In the latter, the pattern is

(defun funcname (parm1 parm2)

It's an easy enough difference to remember: Scheme uses a pattern that looks like how a Scheme procedure is called, and Lisp uses a pattern that looks like a lambda definition.

The other key thing to remember is that it is customary to define tests in Scheme with a question mark at the end, such as list?, whereas in Lisp it is common to put a "p (for "predicate") at the end, such as listp.

Beyond these two differences, it's mostly a matter of learning procedure names by that old tried-and-true method: start writing some code while you have a browser opened onto a good reference.

A simple practical example

Ready to see Scheme tackle some real work? I spend my workdays writing C++ code, and I got tired of always having to type in the same boilerplate information into the .cpp and .h files I create for every C++ class I create, so I called on Scheme to do that task for me. Now, by entering the shell command NewClass MyClass, I get two files, myclass.h and myclass.cpp, both set up with just the right minimum of C++ content so I can go right to work on coding. You should be able to follow along now.

#! /usr/bin/guile -s
;; Note the unusual shebang lines above.
;; NewClass
;; Usage: NewClass ClassName
;; Produces classname.h, classname.cpp (all lowercase)
;; for C++ class ClassName (case preserved)

; Create header file containing class declaration
(define (make-cpp-decl className)
(let* ((fileName (format #f "~a.h" (string-downcase className)))
(outf (open-file fileName "w"))
(headerTag (format #f "~a_H" (string-upcase className))))
(display (format #f "#ifndef ~a\n" headerTag) outf)
(display (format #f "#define ~a\n\n" headerTag) outf)
(display (format #f "class ~a\n" className) outf)
(display "{\n" outf)
(display "public:\n\n" outf)
(display "protected:\n\n" outf)
(display "private:\n\n" outf)
(display "};\n\n" outf)
(display (format #f "#endif // ~a\n" headerTag) outf)))

; Create source file containing class definition
(define (make-cpp-def className)
(let* ((prefix (string-downcase className))
(srcName (format #f "~a.cpp" prefix))
(hdrName (format #f "~a.h" prefix))
(outf (open-file srcName "w")))
(display (format #f "#include \"~a\"\n" hdrName) outf)))

; Utility procedure for invoking both action procedures
(define (make-cpp-class className)
(make-cpp-decl className)
(make-cpp-def className)))

; Get second item on command line and invoke the utility procedure on it.
(make-cpp-class (cadr (command-line)))


This should be enough to get you started. If you're a GIMP or Emacs user, play around with a few scripts. If you're not, spend a little time with guile and experiment. Pick a common task that you'd like to automate and see if you can implement it in Scheme. Before long, you'll want to hunker down with one of the resources given above to expand your knowledge. Have fun!


  • Programming
Click Here!