.. include:: ../global.rst *************** Advanced Scheme *************** Much of what we've seen so far has been slightly different ways of expressing the same things as other languages. Can we go a bit deeper? Syntax Transformers =================== As suggested previously, this is the world of macros. The basic premise is that you can determine some minimalist or comfortable expression and a macro can re-write your code into proper :lname:`Scheme`. It's not so far from, say, :lname:`C` pre-processor macros except you don't have to figure out what the list and string manipulation limitations of the :lname:`C` pre-processor are: you have the full power of :lname:`Scheme` at your disposal. Including, of course, other macros. Of interest, :lname:`Python` is considering `Syntactic Macros `_. To be fair, raw macros are "a bit tricky" so people have gone a bit meta and written syntax transforming macros for you to use. These let you, in turn, create syntax transforming macros. It seems a bit redundant but the syntax transforming macros limit what you can do, the idea being to keep people away from the real deal when they don't need to be there. .. _`scheme-let`: let --- ``let`` is a macro; it is a syntax transformer. Here's how. Just to make you hate the parentheses a little bit more, our evaluation model for :lname:`Scheme` says that functions are invoked when they are the first element in a list and we've said that ``lambda`` is a function constructor. Let's combine the two. So the ``func`` in: .. parsed-literal:: (*func* *args*) can be replaced with a ``(lambda ...)`` construction: .. parsed-literal:: ((lambda *formals* *body+*) *args*) The number of arguments in ``args`` should clearly match the number of formal parameters in ``formals``. Can we work an example? How about a function that adds two numbers: .. code-block:: scheme (lambda (a b) (+ a b)) or, in a more common styling:

.. code-block:: scheme

   (lambda (a b)
     (+ a b))

This is constructing a function value which takes two formal parameters, ``a`` and ``b``, to be in scope for the extent of the body forms. It has a single body form, ``(+ a b)``, the result of which is what the function will return. As with a more obvious example, say, ``(+ 1 2)``: had this function been bound to the symbol ``add`` we would happily have invoked ``(add 1 2)``, where ``add`` is going to evaluate to a function value. A function value is just the thing we created with ``lambda``. So put it all together: .. code-block:: scheme ((lambda (a b) (+ a b)) 1 2) (I told you you wouldn't like it.) There's a lot of visual clutter there but what it is saying is: .. parsed-literal:: (*func* 1 2) and we haven't made the arguments complicated expressions at all! But that *is* what it is saying and so the function value, created on the hoof, is applied to the arguments and we get a result. (Hopefully, 3!) :socrates:`Where does this errant nonsense with on-the-fly function values get us?` ``let``, of course! (The clue was in the heading...) Consider a ``let`` statement: .. code-block:: scheme (let ((a 1) (b (* 2 3))) (+ a b)) In the ``bindings`` we said each binding was a list containing a symbol and an expression, so our symbols are the ``car``\ s of each binding, ``a`` and ``b``, and the expressions are the ``cadr``\ s, ``1`` and ``(* 2 3)``.
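As a quick sanity check of that dissection -- nothing ``let``-specific here, it's just list manipulation -- you can poke at a quoted copy of the bindings (the name ``bindings`` below is purely for illustration):

.. code-block:: scheme

   (define bindings '((a 1) (b (* 2 3))))

   (map car bindings)        ; => (a b)          -- the symbols
   (map cadr bindings)       ; => (1 (* 2 3))    -- the (unevaluated) expressions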
So a syntactic transformation of ``let`` could walk down the list of bindings putting the ``car`` of each binding into a list of symbols called, say, ``formals``, and the ``cadr`` of each binding into a list of expressions called, say, ``args``. The ``body+`` forms look like they're going to be handled the same way in a ``let`` as in a function so we can easily transform the ``let``: .. parsed-literal:: (let ((*formal1* *arg1*) (*formal2* *arg2*)) *body+*) first into a function definition: .. parsed-literal:: (lambda (*formal1* *formal2*) (begin *body+*)) and then into a function call form by putting the function definition as the first element and the arguments to be evaluated as the remaining elements: .. parsed-literal:: ((lambda (*formal1* *formal2*) (begin *body+*)) *arg1* *arg2*) And, lo! ``let`` has been transformed into an on-the-fly function call. This transformation also explains why ``let`` does not allow you to re-use a formal argument in the definition of another as ``arg1`` and ``arg2`` are presented as independent values to the function. ``arg2`` cannot *use* ``arg1`` because ``arg1`` exists only as a value not as a name bound to a value that you can reference. ``arg1`` is not usable as ``formal1`` until *inside* the function. No one expects to be able to write the following in :lname:`C`: .. code-block:: c int main (char **argv, int argc = length(argv)) .. sidebox:: Actually, I read when researching the :ref:`C-api` work, that the original definition should have been something like: .. code-block:: c int main (int argc, char *argv[argc]) as the size of the ``argv`` array *is* known. However, in :lname:`C`, arrays decay into pointers with attendant loss of information. where one formal parameter is *derived* from an earlier one. The formal parameters are independent and, so far as you can tell (from inside the function), created in parallel. Why are we doing this anyway? What was wrong with plain old ``let``? In principle, nothing; in practice it means we have to support two different ways of introducing variables into scope. With a syntax transformation like this we only have to support function calls. ``let`` vanishes. In some sense it is much like any transformation of an iterative loop into a tail recursive loop (or *vice versa*) which might be more convenient for your language to support. Let Variant ----------- ``let`` has another funky form: .. parsed-literal:: (let *name* *bindings* *body+*) which really needs an example to understand. .. code-block:: scheme (let foo ((f1 e1) (f2 e2)) body+) gets transformed into: .. code-block:: scheme (letrec ((foo (lambda (f1 f2) body+))) (foo e1 e2)) :socrates:`Eh? What's happened there and what's that doing?` Look carefully. We've done a similar trick with the bindings. Firstly, the symbols have become the formal arguments again but, rather than for an anonymous function, this time for a *named* function, ``foo``. Secondly, rather than apply the anonymous function to the arguments we've called the named function with those arguments. What we've done is declared a (usually self-recursive, hence ``letrec``) function called ``foo`` which takes formal parameters, ``f1`` and ``f2``, and we *automatically called* it with arguments, ``e1`` and ``e2``. So we've declared and invoked a function in one statement. It seems a bit... *exciting* ...although not a million miles removed from the standard ``let``. However, it's actually really useful.
You recall that :lname:`Scheme` prefers a recursive loop and there's any number of occasions when we want to loop over something, counting, filtering, whatever and this ``let`` form allows us to declare a loop function (that can self-recurse) and kick it off with some starting values. .. code-block:: scheme (define stuff '(1 "two" 3 'four)) (let numbers ((ls stuff) (r nil)) (if (null? ls) (reverse r) (numbers (cdr ls) (if (number? (car ls)) (cons (car ls) r) r)))) The function ``numbers`` is defined as taking formal parameters ``ls`` (for the "current" value of the list) and ``r`` (for the accumulated result). ``numbers`` is initially called with ``stuff`` and ``nil``. The body of the function is a single ``if`` statement where it asks, is ``ls`` an empty list? If so return the reversed value of ``r`` (seems a bit odd). Otherwise, recursively call ``numbers`` again with the tail of the list (ah, clever) and the result of another ``if`` statement. This second ``if`` statement is checking to see if the head of the list is a number with ``number?``. If it is then prepend the head of the list to ``r`` otherwise just use ``r`` as it is. So, as we walk down the original list, ``stuff``, left to right, if the head of the list is a number we create a new ``r`` which has this most recent value prepended to it. Hence the need to reverse the final result. It is a different way of thinking. For what it's worth, the iterative variant would look something like: .. code-block:: scheme (define stuff '(1 "two" 3 'four)) (let ((ls stuff) (r nil)) (while (not (null? ls)) (if (number? (car ls)) (set! r (cons (car ls) r))) (set! ls (cdr ls))) (reverse r)) Horses for courses, I suppose, though the use of ``set!`` will upset the purists. Just to keep you on your toes, as :lname:`Scheme` *is* tail-call optimised then it makes sense to transform any iterative calls, like ``while``, into tail recursive calls as originally penned. Then we have the discomfort of re-working the user's ``set!`` calls inside our elegant recursive function. *Bah!* Macros ------ .. epigraph:: Thar be Dragons! --- *everyone* A macro, which is ostensibly a regular function, is created using a special form, ``define-macro`` that tags your function's name as a macro. This tagging function clearly has to have hooks into the backend because, having looked out for all the special forms, the :lname:`Scheme` engine will check to see if the function name for a form is tagged as a macro and if so it does not evaluate the arguments but passes them onto the function/macro as is (atoms, lists). The macro can then poke about with its arguments as it sees fit and return some value. The trick with macros is that the returned value must be something that can be immediately re-evaluated -- so the return value must be an atom or a list. Macros, then, are *code generators*. A great example is the production of boilerplate code. Given a ``name`` we might want to produce a host of related functions that do ``name``-related stuff. Imagine some object-oriented system where you might want to define a class which has a parent class and some fields: .. parsed-literal:: (define-class *name* *parent* *field+*) Who, in their right minds, wants to type in the constructor: .. parsed-literal:: (define make-*name* (args) ... *parent* ...) and accessors for the arbitrary number of fields: .. parsed-literal:: (define *name*-*field1* (o) *field1* ) (define set-*name*-*field1*! (o v) (set! *field1* v) ) and a predicate: .. parsed-literal:: (define *name*? (o) ... 
) and whatever else is necessary for every class where all the ``...`` are tracts of code that are identical across all classes when you already have ``name``, ``parent`` and all the field names in your hands? The ``define-class`` call has been given everything it needs to know, the rest of it should get built automatically. To help create those chunks of code the Schemers have come up with :ref:`quasiquoting ` and, normally, a macro will return a quasiquoted expression: .. parsed-literal:: (define-macro (*name* *formals*) ...do any prep work... *quasiquoted-expression*) .. _quasiquoting: Quasiquoting ^^^^^^^^^^^^ Literally quoting things, ``'``, doesn't let us do much in the way of substituting in values, well, not without a struggle: .. code-block:: scheme (let ((a 1)) '(a 2 3)) returns ``(a 2 3)`` which is probably not the ``(1 2 3)`` you wanted. You could try: .. code-block:: scheme (let ((a 1)) (list a 2 3)) which is fine but will itself start to get rather long winded and you'll have to explicitly quote all of the other arguments that don't want evaluating. Instead we can *quasiquote* things where, like variable interpolation in strings in the shell, eg.: .. code-block:: bash echo "this is $0" will iterate through its argument expression looking for *unquoting* expressions otherwise leave things quoted just like ``quote``: .. code-block:: scheme (let ((a 1)) (quasiquote ((unquote a) 2 3))) Here, ``quasiquote``’s argument, ``((unquote a) 2 3)``, is a list of three elements, the first is an unquote expression so ``quasiquote`` will evaluate the argument to ``unquote`` and replace the unquote expression with the result of the evaluation. The other two arguments remain quoted. Evaluating ``a`` results in the number value *1* hence the result of the quasiquote expression is ``(1 2 3)``. Like ``quote``, ``quasiquote``’s result will **not** be evaluated. :socrates:`Eh? Isn't that the point?` Not quite. What we want to do is return what the user might have typed, the :lname:`Scheme` engine can then choose to evaluate it or not. ``quasiquote`` and ``unquote`` have reader macros similar to ``quote``'s ``'`` which are :literal:`\`` (backquote/backtick) and ``,`` (comma) respectively. We could have written the above example as: .. code-block:: scheme (let ((a 1)) `(,a 2 3)) So ``unquote``/``,`` is very much like variable interpolation in the shell, with ``quote``/``quasiquote`` akin to the use of single quotes or double quotes in the shell. One important difference, though, is that it is an *expression* that is evaluated not simply evaluating a variable: .. code-block:: scheme (let ((a 1)) `(,(a+ 5) 2 3)) will return ``(7 2 3)`` (assuming we still have the previous definition for ``a+`` floating about). There is another unquoting expression, ``unquote-splicing`` (with a reader macro of ``,@``) which expects its argument expression, when evaluated, to be a list then it will splice the elements of that list into the quasiquote expression whereas ``unquote`` would have simply inserted the list as a single entity. Given: .. code-block:: scheme (define b '(1 2 3)) Then: .. code-block:: scheme `(a ,b c) returns ``(a (1 2 3) c)`` whereas: .. code-block:: scheme `(a ,@b c) returns ``(a 1 2 3 c)``. It goes without saying that real macros are not trivial examples like this. In practice the ``a`` in .. code-block:: scheme `(,a 2 3) is likely to be an expression and surprisingly commonly it is a ``map`` where each time round the loop you will be generating another quasiquoted expression. 
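As a small aside, this machinery is enough to make good on the earlier claim that ``let`` is just a syntax transformer. A toy version might look like the following sketch -- hedged: ``define-macro`` is an implementation-specific extension and a real ``let`` needs rather more care than this:

.. code-block:: scheme

   ;; a sketch only -- no error checking and no named-let variant
   (define-macro (my-let bindings . body)
     `((lambda ,(map car bindings)      ; the formals, eg. (a b)
         ,@body)                        ; the body forms, spliced in
       ,@(map cadr bindings)))          ; the argument expressions, spliced in

   (my-let ((a 1)
            (b (* 2 3)))
     (+ a b))
   ;; expands to ((lambda (a b) (+ a b)) 1 (* 2 3)) which evaluates to 7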
Think about the ``field+`` from the class suggestion above. The engine doesn't know how many field names you're passing in so it must use a loop. Of course, ``,`` isn't ideal for handling the list of results from ``map``; we probably want ``,@`` to splice the results into the output of the macro: .. code-block:: scheme (define-macro (define-class name parent . field+) ...class prep work... `( ...quasiquote with ,name ,parent etc... ,@(map (lambda (field) ...field prep work... `(...quasiquote with ,field ,name etc...)) field+) ... )) (The prep work is likely to be creating symbols from combining the symbols for the *name* and *field* variables and so on.) Probably the most subtle misstep people make is forgetting that macros are expanded "at compile time" (whatever that is but you get the impression that it's not an ideal time). Which, in turn, should really say that macros are expanded (recursively and in a separate evaluation-space) before the resultant program is evaluated. In fact, it's even more complicated than that: if a macro defines and uses macros itself then those macro-macros live in a meta-evaluation space, and another level again if those macro-macros define and use macros.... Macro Issues ^^^^^^^^^^^^ .. aside:: Yes, I'm looking at myself. The three primary problems with macros are: #. you forget a macro's arguments are not evaluated -- so you can't pass a *variable* you've calculated #. you forget it is run at compile time -- your variable doesn't *exist* anyway #. you're a dirty hacker The last point (probably well made) is about *hygiene*. If your macro is off generating code then you're likely as not going to be using some variables. Variables, of course, are names which, unless you've washed your hands carefully, are going to get in the way of the real program's variables. Suppose we wanted to define some logical-or functionality where you would evaluate a pair of forms in sequence until one of them returned a non-false result and then you would return that result. As we're not going to evaluate all the arguments (forms) before invoking the function we must define a macro: .. code-block:: scheme (define-macro (my-or e1 e2) `(let ((tmp ,e1)) (if tmp tmp ,e2))) which looks good. We've even been smart enough to use a temporary variable, ``tmp``, so that ``e1`` is only evaluated once in the enclosed ``if`` statement. Less smart users would have written: .. code-block:: scheme (define-macro (my-or e1 e2) `(if ,e1 ,e1 ,e2)) and if ``e1`` had been ``(honk-horn)`` then their macro would have expanded out to: .. code-block:: scheme (if (honk-horn) (honk-horn) ...) and they'd have heard "*parp!* *parp!*". Hah, suckers! In the meanwhile our sophisticated winning solution for: .. code-block:: scheme (my-or "this" "that") will expand to: .. code-block:: scheme (let ((tmp "this")) (if tmp tmp "that")) which wins all the plaudits until someone types: .. code-block:: scheme (let ((tmp "foo")) (my-or #f tmp)) The macro is expanded thusly: .. code-block:: scheme (let ((tmp "foo")) (let ((tmp #f)) (if tmp tmp tmp))) Yikes! ``(if tmp tmp tmp)``? Which ``tmp`` is which? This problem is solved by using *hygienic* macros -- actually, macros are still macros, it's rather that the macro writer uses a little code trick to inject a unique variable name in the prep work: .. code-block:: scheme (define-macro (my-or e1 e2) (let ((tmp (gensym))) `(let ((,tmp ,e1)) (if ,tmp ,tmp ,e2)))) Here, we used ``gensym`` (generate symbol!) to come up with a unique symbol that cannot conflict with anything else.
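If you've not met it before, ``gensym`` simply mints a fresh symbol on every call, one guaranteed not to clash with any symbol already in use. The printed names below are purely illustrative -- every implementation has its own naming scheme:

.. code-block:: scheme

   (gensym)                  ; => G707   (a fresh symbol -- name made up for illustration)
   (gensym)                  ; => G708   (a different one every time)
   (eq? (gensym) (gensym))   ; => #f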
``tmp``, now a regular locally-introduced symbol during the evaluation of ``my-or`` (albeit in "compile-time macro-space" -- do keep up at the back!), is bound to that unique symbol value and, having been defined in the outer ``let``, can be unquoted and evaluated (becoming the unique symbol) inside the ``quasiquote`` expression. So, ``tmp`` will have the symbol value ``G707``, say, and the result of the macro, the last form, the ``quasiquote`` expression will look like: .. code-block:: scheme (let ((G707 e1)) (if G707 G707 e2)) If we called ``my-or`` again, ``tmp`` would be bound to a new (unique) symbol value, ``G713``, say, and we'll end up with a similar expansion but one that did not conflict -- which means we could have ``my-or``\ s inside ``my-or``\ s without the per-invocation local ``tmp`` variables colliding (as they never appear in the expanded code). However, macro writers are not perfect and hygiene is difficult to ensure so much work has been made to remove risk from macro writing. There have been two thrusts: ``syntax-rules`` and the more advanced ``syntax-case`` both of which have pattern matching at their hearts. Mind you, the latter is so complicated that it requires a 30,000 line pre-built version of itself to bootstrap! The broad idea of both is to re-imagine macros as syntax transformers (which they always were...) within strict limitations (that's the key). Elements of the macro calls are tagged and the tags are managed so that no unhygienic variables are introduced. (Unless you really really want to!) Closures ======== Let's try a more dynamic variant: .. code-block:: scheme (define a 1) (let ((ab+ (let ((b 3)) (lambda (n) (+ a b n))))) (ab+ 5)) :socrates:`Cripes! What's happening here?` The inner ``let``'s body is a function constructor (``lambda``) and so the inner ``let`` is returning a function *value*. The function is using ``n``, a formal parameter, ``b``, a variable introduced by the inner ``let``, and ``a`` from the top level. The outer ``let`` is binding that function value to the symbol ``ab+`` which is called in the outer ``let``'s body. The result (from the outer ``let``) should be 9, the sum of ``a``, ``b`` and ``n``. But wait! ``b`` isn't in scope in the outer ``let``! No it isn't, but the function value in the inner ``let`` was closed over ``b`` trapping it, if you like, within its body. We can't access ``b`` in any sense from ``ab+``, other than the fact that it is *used* inside ``ab+``, so there's no risk of ``b`` being used *directly* outside of its bound scope. We can't *change* it, it is stuck forever bound to the value 3. So we have here the idea of enclosing the access to a variable inside a function. If the code was different, could we *modify* such a closed variable? Yes, of course. This idea of, in effect, creating a *private* variable is as common in :lname:`Scheme` as anywhere else. You might want to keep a counter such that every time your closure was called it incremented the private variable and returned the result. You might want to maintain a list of *stuff*. The critical part of this trick is that *only this function* has access to this variable. There is no other way for anything to use it or modify it. It is completely private. You can extend this idea to enclosing a suite of functions over a common set of variables which the functions co-operatively maintain. Clearly you can't return multiple function definitions from a single ``let`` but you can play a similar trick to ``letrec``: .. 
code-block:: scheme (define this #f) (define that #f) (let ((private 0)) (set! this (lambda ...)) (set! that (lambda ...))) Both of the functions bound to ``this`` and ``that`` have global scope (because the names were declared at top level with ``define``) and both of the function values have access to ``private`` (because they were created in the scope of ``private``). No-one else has access to ``private`` other than through the interfaces, ``this`` and ``that``. .. aside:: Well, when I say that... This is about as complicated as it gets! Ports ===== :lname:`Scheme` uses *ports* to describe things that you can read from and write to. The obvious use case is files. In the pernickety way of :lname:`Scheme` there are separate functions to open files for reading and writing: ``open-input-file`` and ``open-output-file``. After that you can read, write, check for end-of-file, close, "seek", "tell" and so on. All the usual things. On top of that :lname:`Scheme` has the notion of the *current* input, output and error ports. Obviously, maybe, those start off being the usual *stdin*, *stdout* and *stderr* that we're used to. However, there are common idioms to temporarily switch those noting that many functions are well aware that they might be switched and will rigorously ask for the *current* input, output or error port. This little snippet from :ref-title:`S9fES` (:cite:`S9fES`) combines several techniques to create a function that takes a file name and a :ref:`thunk ` then switches the current input port to the newly opened file, runs the thunk, saving its result, closes the file, resets the input port and returns the saved result: .. code-block:: scheme (define with-input-from-file (let ((set-input-port! set-input-port!)) (lambda (file thunk) (let ((outer-port (current-input-port)) (new-port (open-input-file file))) (set-input-port! new-port) (let ((input (thunk))) (close-input-port new-port) (set-input-port! outer-port) input))))) Most programming languages have the capability to manipulate files in a similar fashion. We're not breaking new ground, here. .. _`string ports`: String Ports ------------ This is a bit more interesting. Instead of ports meaning an interface to files, how about an interface to *strings*\ ? :socrates:`Eh?` For an *input* string port you need to be able to: - *open* one -- that means wrappering an existing string with a port construct to give it the *port* interfaces - *read* from one -- we can probably get characters from a string, right? All we have to do is maintain a pointer to where we've read so far in the string. - check for *EOF* -- are we at the end of the string yet? - *close* one -- stop using it? - *seek* and *tell*\ ? Well those sound like jumping about to different indexes in the string, setting or returning the current pointer into the string. Can't be that hard. As it is an *input* port then all write operations should fail on principle. For an *output* string port we need a little more trickery under the hood: - *open* -- requires an output port construct where the underlying string is extensible. - *write* -- every time we write to the output string port we *extend* the underlying dynamic string. - check for *EOF* -- is that meaningful? - *close* -- stop using it? Maybe set a flag. - *seek* and *tell*\ ? Well those sound like jumping about to different indexes in the string -- so we should have been maintaining a current pointer into the string like above. Can't be that hard. As it is an *output* port then all read operations are invalid. 
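To make the input side concrete before we worry about how to get anything back *out* of an output string port, a session might look something like this -- a sketch assuming the common SRFI-6 style names (``open-input-string`` and friends), which your implementation may spell differently:

.. code-block:: scheme

   (define sp (open-input-string "(1 2 3) hello"))

   (read sp)                 ; => (1 2 3)   -- a whole form read from the string
   (read-char sp)            ; => #\space
   (read sp)                 ; => hello
   (eof-object? (read sp))   ; => #t        -- we've run off the end of the string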
However, unlike files, where we have an (externally) examinable result -- one we can call ``open-input-file`` on, if nothing else -- this output string port is hiding the underlying string in memory. We must be able to retrieve our carefully crafted musings, hence, ``get-output-string`` which returns the (underlying) string from the string port. Maybe, at this point, string ports seem a little quixotic but, if nothing else, you can recognise that if all kinds of ports are maintaining the same interfaces then they should be interchangeable. Where, previously, you might have been writing to a temporary file, you can now be writing to a temporary string and you wouldn't know. *Whoa!* To be fair, that does rely on the code doing the writing to be scrupulously honest and ask for the current output port if it is going to write anything. Which is why (most) :lname:`Scheme` functions do. Error Handling ============== .. sidebox:: I guess the name comes from the program state being in a particular condition, I'm not sure. :lname:`Lisp`\ s do things a little differently, here. Rather than have "errors" or "exceptions" they have *conditions*. Which have a sort of class hierarchy feel to them as the different types of conditions are (*looks for a better word but fails*) sub-typed from one another. *A* difference is that they are not restricted to errors but users can create their own condition type (hierarchies) and such a condition can be *signalled* (ie. raised) whenever the user deems that a particular state of processing has occurred. Mostly errors and exceptions, though. A second, somewhat more profound difference, is where in the evaluation the condition handler is run. For most languages that have some kind of exception mechanism, say :lname:`Python`'s ``try`` mechanism: .. code-block:: python try: risky_command () except Exception as e: print ("ERROR: risky_command() said: {0}".format (e)) you don't really know, and probably don't care, quite what ``risky_command`` was doing when it ``raise``\ ed the exception. What you do know is that, however deep into its evaluation stack it has gone (think: how deeply nested the function calls in the ``risky`` library got), the evaluation stack is truncated back to here -- all those function calls from ``risky_command`` through to the failure are discarded, we call ``print`` and move on with our lives. :lname:`Lisp`\ s don't go in for that, rather, the condition handler is run *instead of* the failed function (that raised the condition). It can choose how to proceed: return an appropriate value on behalf of the failed function; call the next condition handler up; raise a different condition. Sometimes you do want to truncate processing in which case you need continuations. Nested Handlers --------------- Your handler can be buggy (I know, unlikely) in which case who handles that? Whenever a condition handler is installed it sits first in the queue, if you like, of handlers to be consulted when a condition is raised. Correspondingly, when a handler is run it is run in the context of the next handler out. In other words, you don't handle your own bugs! Obviously this cascades back out to some system installed handlers -- which, you would like to think, are bug-free and can take anything in their stride. Restarts -------- In concert with conditions, :lname:`Scheme` supports the idea of *restarts*. 
The idea is that the flow of the code gathers up a set of labelled blocks of code called restarts -- in one sense a bit like picking up variables defined in the source that you normally wouldn't use. When a condition is raised the condition handler is at liberty to determine the set of restarts collected and make a decision about whether to call one of them or not or revert to a higher (outer) handler which might have a better view (or a better restart) available. If chosen, control is handed to the restart code block and the program continues. The trick being that, having repaired whatever mess had been caused, it continues back into the code that raised the error in the first place, this time, hopefully, continuing without error. Consider a process managing backup files where the number of backup files is less important than the amount of disk space they are using. Eventually you will fill the disk which requires operator intervention. On the other hand, the "disk full" handler might identify there is a "throw out the old" restart which we picked up before entering the section to create backup files. The restart can be called, have the old backup files cleared out and then carry on back into the section to create backups. I, like many people, have written plenty of code to pro-actively (although often, post-actively) trim the set of backup files but in neither case is that code reactive. Whether that's the right way to approach problem solving is another question. .. _continuations: Continuations ============= .. epigraph:: Timey-wimey stuff. -- The Doctor Continuations are the greatest idea in computing... that you've never heard of -- despite using continuations all day every day! They are also, possibly, the last thing you should be allowed to touch. So, noting we are curious but not cats, let's dive in! I think I have an example that explains the idea of a continuation to a non-continuation audience -- if not, it's going to be tricky. Consider a little snippet of :lname:`C`: .. code-block:: c ... int i = a + b * c; ... There are (arguably) four continuations there, *four*\ ! Let's break it down. As a starter for ten, we know from :lname:`C`'s operator precedence rules that ``a + b * c`` is really ``a + (b * c)`` that is, the calculation of ``b * c`` is performed first and the result passed to the next sub-expression, ``a + []`` -- with ``[]`` standing in for the result of the previous sub-expression, ``b * c``. In turn, the initialisation of ``i`` is waiting for the result of the addition, so, in this style, it looks like ``int i = []``. If we were to rewrite our original snippet it might look like: .. parsed-literal:: ... b * c a + *[]* int i = *[]* ... This is now looking like a little chain of sub-expressions where each sub-expression generates a (single!) value ready for the next sub-expression to use in turn. From the perspective of the ``b * c`` sub-expression, it will calculate its value then *continue* (aha!) onto ``a + []``, therefore ``a + []`` is described as being the continuation of ``b * c``. ``a + []``, in turn will calculate its value and continue onto ``int i = []``. ``int i = []`` is ``a + []``'s continuation. :socrates:`You said four continuations?` Yes. The continuation of the line *before* ``b * c`` has a continuation too: it is ``b * c`` except ``b * c`` chooses not to do anything with the value provided by the line before. Similarly, ``int i = []`` has a continuation as well, the line after it. 
We can't see from this snippet whether the line after chooses to use the value that ``int i = []`` provides. (Which is itself an interesting question, what is the *result* of an assignment? Discuss.) Each of these sub-expressions is a bit of code that is expecting an argument then calls another bit of code, like a chain of (continuation) functions. We can re-fashion our snippet as (the vastly uglier): .. parsed-literal:: ... k_l(v) { k_m(b * c); } k_m(v) { k_n(a + v); } k_n(v) { k_o(int i = v); } ... Again, exactly the same thing is happening, ``b * c`` is now encoded in ``k_l()`` and ignores the parameter, ``v``. It calls ``k_m()`` with its calculated value. ``a + []`` is now ``k_m()``, it does use the parameter ``v`` and calls ``k_n()`` with the result. ``int i = []`` is now ``k_n()``, assigns the parameter ``v`` to ``i`` and (assuming you've figured out what the result of an assignment is) passes its result to ``k_o()`` and the chain continues. There's a variation called `continuation-passing style`_ (CPS) which might look a bit like: .. parsed-literal:: ... k_l(v, k) { k(b * c, *k_n*); } k_m(v, k) { k(a + v, *k_o*); } k_n(v, k) { k(int i = v, *k_p*); } ... where the continuation that you should pass your result to is passed to you and you have to tell your continuation whom to call in turn. That's looks "difficult" to create but it turns out to be easier than you think. .. sidebox:: At one point I went through the process of transforming my :lname:`C` implementation of :ref-author:`Bill Hails`' :lname:`Perl` interpreter and transformed it into a CPS version. *Phew!* You don't want to do that *too* often. You can even go through a process of transforming your, say, :lname:`C` code into a CPS program which, with your :lname:`C` hat on, looks like you are in a never-ending chain of callbacks. Which is *exactly* what you are in! Of course, you can't *be* in a never ending chain of callbacks because your program will clearly(?) be in a tight loop. At the end of the chain is a NULL pointer (or some other sentinel value) which tells you to stop. How you start and how you stop are another matter. In order to generate the chain, the way I like to think about it is you start with your end-of-chain sentinel value. Each splodge of code you add is like a balloon inflating between where you are now and the sentinel, with the new code scribbled over the (arbitrarily expandable) surface. For the next bit of code you again squeeze it in before the sentinel but with the addition that the last code balloon no longer points at the sentinel as its continuation but now points at the start of the new balloon. When you're done processing the source code you can call the first of your balloons with some dummy value and the chain kicks off and will trundle through, function calling function calling function. OK, back on course. So now we have a seemingly pointless rewrite of our code into the tiniest sub-expressions each of which are doomed to call the next sub-expression. :socrates:`What have we achieved?` To be honest? Not much. However, and this is the trick. These continuation functions, taking a value at a time, they look quite a lot like regular functions, taking a value at a time. Suppose I could ask for one of these continuation functions then I could call it with some value of my choosing, right? Let's take, ``a + []``. 
If I had access to ``k_m()`` then I could call ``k_m(37)`` and then the program would continue from that point onwards -- because everything is chained together and the links in the chain cannot be reordered -- with ``k_m()`` calculating ``a + 37`` and passing that result onto ``k_n()``. I have dived into the middle of the chain and inserted a value that ignores the (immediately) previous calculation? What happened to ``b * c``? *Pfft!* Don't care, we're continuing with 37. Apart from that immediately previous calculation everything else is the same. The code still accesses the same lexical and global variables, will call the same functions in due course, it's all hard-coded into the program. It has the same prior *state* every time you (re-)call it. *Meh!* Except not *quite* the same state if you have modified one of those (global) variables in the meanwhile. .. sidebox:: I can understand if you don't quite *grok* this. It takes a while. *Whoa!* That's a bit weird. So weird, in fact, that most languages don't give you access to them. Although, to be honest, it's more likely because they can (and probably will) create *causal* havoc. Once you've got a continuation (function) you can keep calling it. *Wooo!* Although that will get boring after a while. Except when it's really cool. Think about generators_ where you want to go back and ask for the next value from a sequence without having to have pre-created all values in advance. Those require that the generating function stop processing and yield a value and then when you next call it the generator continues (hint, hint) from where it left off. The greater use of continuations is not, however, repeatedly calling the same thing but rather for being able to jump to another point in the program -- usually a local point but not always. Indeed, continuations have been called *programmatic gotos* with all the baggage that anything that is called "goto" incurs. Jumping to another point in the code sounds awful but talk to me again about: - ``return`` in a function - ``next``/``last`` or ``break``/``continue`` in ``while`` and ``for`` loops - ``try``/``except`` -- remember the ``risky_command`` example where all that stack of functions calls was discarded? How did that happen? Hint: non-local goto. where you mess about with the natural control flow of the code? Of course, you might be so attuned to something like ``return`` that it has never occurred to you that you are prematurely jumping out of the flow of the code. But ``return`` and ``next``/``last`` are only jumping about in the loop/function? Sure, but they're still jumping about, something has to make that happen. What we're saying here with continuations is that that ability to decide on some kind of control flow jump is no longer the preserve of the language implementers (for example, granting you ``return`` out of the decency of their hearts) but instead is exposed to the user for them to create their own. In fact, continuations are capable of creating *any* control flow structure. *Any*. :socrates:`OK, it sounds like a dangerous free-for-all, there must be some constraints?` Yes, thankfully, you have to be able to get hold of a continuation (function). call/cc ------- You can't get any old continuation -- that would be madness (though there's probably some programming languages where you can). In :lname:`Scheme` you have to have reached the statement before the continuation you want to capture -- that clearly places a severe limitation on being able to jump about. 
On top of that it is caught with a slightly confusing function, ``call-with-current-continuation`` or ``call/cc`` (a relief to everyone's keyboard, if nothing else). ``call/cc`` does exactly what it says on the tin. It figures out *its own* continuation and passes that to the function you supplied. It has called your function with the current continuation of the call to ``call/cc``. Can't fault the name! ``call/cc`` calls the function with the continuation and, sort of like a single form'ed body of a function, will return the value that the function returns. Confusing, eh? Actually, that's even more annoying than at first blush. What on earth can I do with a continuation if it is just a parameter in a function? That's the least of your worries. Your first problem is *which* continuation are you capturing? Remember our snippet of :lname:`C` where we had four continuations, albeit just the two in our single line of code? .. parsed-literal:: ... b * c a + *[]* int i = *[]* ... If we want to capture ``k_m()`` -- so we can jump in at ``a + []`` with a value -- we need to back-track a little and ask, whose continuation is ``k_m()``? Here, as we know, ``k_m()`` is the continuation of ``b * c``. Now, this is where it gets tricky. We said ``call/cc`` will return its own continuation to the supplied function and we want it to be the continuation of ``b * c`` so we need to insert ``call/cc`` where ``b * c`` currently is. OK, so what happens to ``b * c``? Recall that ``call/cc`` will call the supplied function with the continuation and return the value the function returns. Aha! So we can put ``b * c`` in the body of the supplied function and that calculation will be returned by ``call/cc`` to its continuation which is ``k_m()``, ie. ``a + []``. That sounds like a mess! It is but it's a well-structured mess -- and you get used to it. Let's see if we can visualise that: .. parsed-literal:: func(k) { b * c; } ... call/cc (func) a + *[]* int i = *[]* ... I've got a unary function (one-argument!), ``func``, whose body is the calculation ``b * c`` so that's the value ``func`` should return. We've also slipped in ``call/cc`` into the sub-expression stream *in place of* ``b * c`` so that ``call/cc``'s continuation is ``k_m()``/``a + []``. ``call/cc`` should now call ``func`` with the argument ``k_m()``. ``func`` will calculate ``b * c`` and return it to ``call/cc`` which will return it in turn to... ``k_m()``. Yay! ... :socrates:`Yay? Have we done anything there?` Hmm, fortunately, no. But that's the idea, we've managed to slip in some continuation capturing code and we haven't changed a thing. Definitely a big *wooo!* there, believe you me! :socrates:`But I thought we were going to capture` ``k_m()`` `?` Well, technically we did, in ``func``, except we didn't do anything with it. It's not like we can do anything useful with it *in* ``func`` either. Imagine if you called ``k`` in ``func``: .. parsed-literal:: func(k) { k (37); b * c; } .. sidebox:: Go straight to Jail. Do not pass GO. Do not collect £200. Cool! You will immediately *goto* ``k_m()`` with the value 37. Not only that, you will only *ever* call ``k_m(37)`` because continuations are *control flow* operators. It happens right there, right now. ``b * c`` is *never* going to be called. The only *useful* thing you can do with a continuation inside the function supplied to ``call/cc`` is save it in a variable with wider scope. Wide enough scope that you can call it sometime later. 
A variable with global scope is considered a bad place to store it as it means that anyone at any time can jump straight back in with ``gvar(37)`` (or whatever you've called it). Instead, you're probably going to be storing it in a variable with the least wide scope you can get away with to solve your problem (doubtless there *will* be situations where global scope is appropriate). Think about the private variables discussed earlier. Finally, we've been playing with the sub-expressions, what do we do in the real code? .. parsed-literal:: func(k) { private_k = k; b * c; } ... int i = a + call/cc (func) ... Sweet! Naturally, Schemers don't like named functions because it's a one-use function and we don't need to clutter the namespace -- and no need when you can throw in a ``lambda`` expression and get those parentheses going! .. code-block:: scheme (let ((private-k #f)) (let ((i (+ a (call/cc (lambda (k) (set! private-k k) (* b c)))))) (private-k 37))) Which looks cool (we'll gloss over the ``set!``) but will loop forever. The trouble is, the continuation that we've captured is part of the lead up to the call to ``(private-k 37)``, the invocation of which means we instantly appear to return from the ``call/cc`` function with the value 37 and work our way through to the ``(private-k 37)`` line whereon we instantly appear to return from the ``call/cc`` function with the value 37 and work our way... Oops. Continuations, here, with a popular narrative of "call once, return many times..." need some careful thought. Another annoying/entertaining example, whilst we're befuddling ourselves, is the innocent capture of the current continuation into a top level variable: .. code-block:: scheme (define top-level-k (call/cc (lambda (k) k))) which certainly sets ``top-level-k`` to *a* continuation. We can call that continuation with ``(top-level-k 37)`` whereon we return (again) from ``call/cc`` with the value, uh, 37 which gets assigned to ``top-level-k``. Oopsies, we've just trashed our saved continuation on its first use. I might have mentioned there are "issues" with poorly chosen continuations. Corner Cases ^^^^^^^^^^^^ Choosing to capture a continuation in the middle of a sequence isn't quite fair. What about our two "hidden" continuations, the one for the line before and the one for the line after? The line before's continuation is quite easy. We know -- because we're looking at the code -- that ``b * c`` doesn't use the value passed from the previous expression so if we want to capture this continuation then we can simply insert a dummy ``call/cc`` before our line. Dummy in the sense that its only job is to capture the continuation but not provide any useful calculation. In fact, it *will* produce a result, whatever the result of the act of saving ``k`` is (you *have* figured out the result of assignment, right?): .. code-block:: scheme (call/cc (lambda (k) ...save k...)) (set! i (a + (b * c))) ``k``, if invoked with any argument, will then start processing ``b * c`` (which will ignore the value passed) and we continue cleanly. The line after's continuation is more of a mixed bag. From looking at the code we might determine that the next statement does nothing with the value from the previous statement in which case we can insert a similar dummy ``call/cc`` or we realise that our ``(set! i (a + (b * c)))`` statement is in the middle of something else which is reliant on the result in which case we need to wrapper the whole ``set!`` form in ``call/cc``: ..
code-block:: scheme (call/cc (lambda (k) ...save k... (set! i (a + (b * c))))) where the value returned by ``call/cc`` (the value returned by ``set!``) is passed to whomever wants it. ``k``, now, is in the position of giving whomever a different value than whatever you've spent all that time figuring out what the result of an assignment is. Escapes ------- We're familiar with (or have at least heard of!) various exception handling memes: - :lname:`Python`'s ``try``/``except``/``else``/``finally`` and proactive ``raise`` - :lname:`Java`'s ``try``/``catch``/``finally`` and ``throw`` But those are far from any complete set. :lname:`Lisp`\ ers have thrown a few more hats into the ring: - ``(catch label form+)`` allowing during the execution of ``form+`` a ``(throw label form)`` which requires a *dynamic escape environment* (as those ``label``\ s are evaluated and then associated with continuations) - ``(block label form+)`` and the corresponding ``(return-from label form)`` where ``label`` is *not* evaluated (ie. is a symbol/identifier) but still bound to a continuation but the combination is only available in a *lexical escape environment*. - ``let/cc`` (and :lname:`Dylan`'s ``bind-exit``) support assigning a continuation to a dynamic variable - delimited, composable continuations or *prompts* * ``shift``/``reset`` None of those support the idea of a ``finally`` clause. The headline solution to that is :lname:`Scheme`'s ``unwind-protect`` which is a derivative of ``dynamic-wind`` which, despite being in the RnRS, is probably the most subtle and hackiest thing I have seen. So we *will* implement it. The implementation, though, requires several layers to support it which we will get to in due course. Continuations in C ------------------ Naturally, these abstract computer science notions are too high brow for :lname:`C`. Until you realise that :lname:`C` has had the same tools since forever in :manpage:`setjmp(3)` and :manpage:`longjmp(3)` and their signal-safe cousins :manpage:`sigsetjmp(3)` and :manpage:`siglongjmp(3)` (probably all on the same man page!). Indeed, there's a reasonable likelihood that :lname:`Scheme`'s continuations are implemented using :lname:`C`'s ``sigsetjmp``/``siglongjmp``. The Linux man page notes: In summary, nonlocal gotos can make programs harder to understand and maintain, and an alternative should be used if possible. Wise words. Compound Data Types =================== There is a tendency for these to be implementation-specific but broadly you can use the usual compound data types of strings, arrays (or vectors), hash tables and structures. Strings, perhaps unusually for :lname:`Scheme`, allow you to modify them (*poor form!*). I guess this is a commonly expected functionality. ``string-ref`` and ``string-set!`` are accessors. Arrays and hash tables work much as you would expect with the element accessor functions following a similar style: ``array-ref``/``array-set!`` and ``hash-ref``/``hash-set!``. Structures ---------- Various :lname:`Scheme` implementations introduce the idea of *records* (and have normalised the interfaces in various SRFIs). It is very much like the suggestions been given previously for creating classes in that a structure has a type, defining its name and fields (and possibly parent!) and then instances of that structure can be created and passed around. 
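As a sketch of what that can look like in practice, here's the SRFI-9 flavour (the directive name, and exactly what gets generated, varies between implementations and SRFIs):

.. code-block:: scheme

   ;; SRFI-9 style -- one of several record interfaces in the wild
   (define-record-type point
     (make-point x y)               ; constructor
     point?                         ; predicate
     (x point-x set-point-x!)       ; field x: accessor and mutator
     (y point-y set-point-y!))      ; field y: accessor and mutator

   (define p (make-point 1 2))
   (point-x p)                      ; => 1
   (set-point-y! p 5)
   (point-y p)                      ; => 5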
Accessor functions for the fields are created when the record type is created giving you same style of names as above: commonly ``name-field1`` (omitting the ``-ref``) and ``set-name-field1!`` and the like. Object Orientation ================== Object-oriented programming is often bound tightly to the concept of message passing. In the original :strike:`Klingon` Smalltalk_: .. code-block:: smalltalk object <- message arguments which you might imagine, in a :lname:`Scheme`-ly way, to look like: .. code-block:: scheme (object message arguments) which would be perfectly fine so long as we allow an object in functional position -- remember that the first element in a list is going to be applied to the remaining elements in the list. So you might think that should be: .. code-block:: scheme (message object arguments) although we've only just said we can call ``k(37)`` where ``k`` is a continuation not a "real" function. Quite what *is* allowed to be in functional position in a :lname:`Scheme` form is becoming a little greyer. :ref-title:`EPLAiP` (:cite:`EPLA`) p.135, implements the former (in *Perl*) in a straightforward manner. If the evaluated value in functional position is an object then the first argument is assumed to be an object-specific function, a method, and there's a mechanism for looking up a method within the hierarchy of the class. The method is then applied to the arguments with the object itself bound to the variable ``this`` (cf. ``self``) for the duration of the method. On the whole, though, that's not very :lname:`Scheme`-ly which prefers functions in functional position. There's a second argument in favour of not using the message passing idiom in that if there are multiple objects within the arguments then only the *first* argument can be used to distinguish the call. For example, if we wanted to add two numbers together where we might have a class hierarchy of ``Integer`` is a ``Real`` is a ``Number``, then: .. code-block:: scheme (add Number1 Number2) we only get to make decisions about which actual function to dispatch to based on ``Number1``: .. code-block:: scheme (case (class-of Number1) ((Integer) (add-integer Number1 Number2)) ((Real) (add-real Number1 Number2)) ((Number) (add-number Number1 Number2))) which means the implementation functions, ``add-integer``, ``add-real`` and ``add-number``, must look at the class of ``Number2`` to really determine what to do: .. code-block:: scheme (define (add-integer n1 n2) (if (not (eq (class-of n2) 'Integer)) (add-real n1 n2) (make-Integer (+ (Integer-value n1) (Integer-value n2))))) Using multiple arguments to determine the correct implementation method is called *multiple dispatch* and for that we need :ref:`generic functions `. :ref-title:`LiSP` (:cite:`LiSP`) p.87 introduces a basic object system will allow us to define a class: .. parsed-literal:: (define-class *classname* *superclass* (*field+*)) which, as a side-effect, creates a flurry of class-oriented functions based on the identifiers used above. A constructor: .. parsed-literal:: (make-*classname* *arguments*) field accessors for an object of that type: .. parsed-literal:: (*classname*-*field1* *obj*) field mutators: .. parsed-literal:: (set-*classname*-*field1*! *obj* *value*) (note the trailing ``!``) and a predicate: .. parsed-literal:: (*classname*? *obj*) .. 
_`generic functions`: Generic Functions ----------------- Again, from :ref-title:`LiSP` (:cite:`LiSP`) p.88. Generic functions are a mechanism for declaring the preferred behaviour based on the class of (at least) one of the arguments (although single dispatch is the norm, multiple dispatch is possible, albeit with exponentially increased complexity, which is why it isn't common). The point being you can choose which of your arguments should be the one to distinguish behaviour rather than being forced to choose the first. :ref-author:`Queinnec`'s generic function is introduced with: .. parsed-literal:: (define-generic (*name* *arguments*) *body*) Where ``body`` will commonly be a call to an error function to catch instances where the programmer has forgotten to define a class-specific method. One of the arguments must be distinguished (not necessarily the first argument!). That is done by making it a list (of itself): .. code-block:: scheme (define-generic (foo arg1 (arg2) arg3) (error "bad args")) Here, we've made the second argument, ``arg2``, the distinguishing one. Remember, ``define-generic`` is a syntax transformer so its arguments are passed to it directly. There is no danger of ``(arg2)`` being evaluated. Not by the :lname:`Scheme` engine, anyway. A generic method (ie. a class-specific method) is introduced with: .. parsed-literal:: (define-method (*name* *arguments*) *body*) notably, syntactically identical to the ``define-generic`` above. Here the distinguished argument must again be identified as a list but this time of itself and the name of the class this method is specific to. Our ``foo`` example might look like: .. code-block:: scheme (define-method (foo arg1 (arg2 ) arg3) ...) .. sidebox:: Here we really do mean class ```` as we allow angle brackets in symbol names. where this ``foo`` method is specific to calls where ``arg2`` is an object of class ````. Our number example looks like: .. code-block:: scheme (define-generic (add (n1) n2) (wrong "no method defined for" n1 n2)) (define-method (add (n1 Integer) n2) (add-integer n1 n2)) (define-method (add (n1 Real) n2) (add-real n1 n2)) CLOS ---- :ref-author:`Queinnec` doesn't need to head down the road of multiple-dispatch (where more than one argument has a class distinguisher) for his purposes. The *Common Lisp Object System* (:term:`CLOS`) is multiple-dispatch and has a very complicated *Meta Object Protocol* (:term:`MOP`) which is far too overbearing for our needs. Subsequently, :ref-author:`Gregor Kiczales` developed *Tiny-CLOS* (:term:`Tiny-CLOS`) at Xerox PARC in 1992 which is a much simpler, but still multiple-dispatch, variation on CLOS with a simple MOP. *Tiny-CLOS* has been re-implemented in many :lname:`Scheme`\ s as well as other languages. Type Inference ============== That said, whilst :lname:`Scheme` won't help you, there is opportunity for compiler writers to introduce *type inference* where, in principle, if only the basic functions have any type information stored about them, everything else can be derived: .. code-block:: scheme (define (strlen s) (string-length s)) (define i 3) (strlen i) A type-inferencing compiler, knowing only that ``string-length`` takes a *string* as an argument, would deduce that ``s``, the first argument of ``strlen``, must be a *string*; that ``i`` is a *number*; and that the function call ``(strlen i)`` is therefore a type error. .. include:: ../commit.rst