Bard

I've made some substantial changes to the design of Categories for the 0.3 release. The changes are rather sweeping and incompatible; luckily, the community of users other than me is small, and I've cleared the changes with him.

Domains are out. So is the c3 domain and inheritance, for the moment; I'm willing to admit that there may still be a place for them, but I want to try an iteration without them, and see what happens.

In 0.3, protocols become first-class objects, and categories take on the central role in the object system that its name suggests. So far, the results are very gratifying. It seems like I may have hit upon the right organization of concepts, one that clearly reifies the ideas I expressed so many years ago in the "Protocols" article.

But this post actually isn't about Categories. It's about the programming language that Categories is a part of.

I wrote the first implementation of Categories to address things I wanted and couldn't get in Clojure. However, I've been working on my own design for a Lisp dialect since around 2002, a dialect that I've been calling "Bard". As Categories took shape, I realized that it was the last missing piece of the Bard design, and that, after several years of tentative implementations, I finally had all the pieces I needed for the language.

The latest version of Categories seems to seal the deal, and now I find that I'm ready to talk about Bard publicly.

I didn't set out to design a new language, originally. I just wanted a modernized implementation of the old Dylan language, from about 1992, before it lost its Lisp syntax. I had the incredibly good luck to spend 1992 through part of 1994 working on an operating system at Apple that was written in Dylan, and I've never had so much fun with a programming language before or since. I missed it; I wanted it back. I couldn't get it anymore, so I sort of decided to make one of my own in my spare time.

Along the way, a small handful of people have followed along and provided ideas and encouragement. I should particularly mention Joe Marshall, Michael Maynard, and most helpful of all, Philip McBride, all of whom gave me comments that either removed obstacles to my thinking, or provided encouragement when I needed it.

Bard owes a lot to Dylan, but it isn't Dylan. It isn't even old, circa-1992 Dylan anymore. I've been strongly influenced in its design by ideas from several other languages, especially Python, Clojure, Haskell, and ML.

Bard is definitely a Lisp—very much so. At the risk of drawing flames, I'd say that in some ways it's more a Lisp than even Scheme is. My reasoning there is that, technically, the source code of a Scheme program is text, and I think that's a misstep. Historically, and in most other Lisp dialiects, source code is explicitly or implicitly not text; it is s-expressions. Text is just a print representation of the s-expressions. Bard's design consciously adopts that convention, because I think it's an important feature.

Like Clojure, and like Dylan before it, Bard makes no attempt to be compatible with any previous Lisp. I'll a lot of grief for that, I know. There is a cadre of programming moralists who will wail and gnash their teeth that the world doesn't need Yet Another Object System, Yet Another Lisp, or Yet Another Language. All I can say in reply is, "So what? I couldn't care less what The World needs. Designing and implementing Bard is entirely about what I need." If you aren't interested in another Lisp, you are quite free to move right aong. There's nothing to C here.

Bard bears a resemblance to Clojure in some respects, although nearly every part of its design predates Clojure's. I did lift the architectural idea of distinguishing value from identity straight from Rich Hickey, and I'm grateful to him for it. It neatly solves some nagging problems I'd been having with how best to integrate features based on Common Lisp SERIES.

Whew! After all that, I suppose I should actually provide a little bit of description of the language itself.

Here are some examples of primitive values in Bard, using their standard lexical syntax:

  void     ; the "nothing" value
  10       ; an integer
  12.34    ; a float
  2/3      ; a ratio
  \space   ; a character
  \A       ; another character
  baz      ; a symbol.  Bard is case-sensitive; baz and Baz are
           ; different symbols.
  :Foo     ; a keyword.
  Bar:     ; also a keyword. :bar and bar: are different ways 
           ; to write the same keyword
  true     ; a Boolean
  false    ; the other Boolean
  ()       ; the empty sequence
  []       ; another way to write the empty sequence
  (sequence 0 1 2 3 4 5)
           ; an expression that returns the sequence 0,1,2,3,4,5
  [0 1 2 3 4 5]
           ; another way to return the sequence 0,1,2,3,4,5. 
           ; In Bard, as in other Lisps, "(foo bar baz)" applies 
           ; the function foo to the arguments bar and baz. 
           ; Written this way: "[foo bar baz]", the expression 
           ; denotes the same sequence, but also tells Bard to 
           ; treat it as a literal sequence, and not as a function 
           ; call. It's syntactic sugar for the "(sequence...)" 
           ; expression.
  ""       ; a text object
  "Hello!" ; another text object
  (0,1)    ; a pair
  (,)      ; the empty pair
  {}       ; the empty map
  { greeting: "Hello!",
    length: 5}
           ; a map containing two entries
Bard's object system is based on Categories. Protocols, (generic) functions, methods, representations, and categories are all central, first-class concepts in Bard. Types in general are represented by categories, which are purely abstract and defined by protocols. You never explicitly define a category; you define protocols, and they implicitly define categories.

This design falls out naturally from separating representation, protocol, and taxonomy, as I've discussed in previous posts about Categories. Let me define the terms a little more explicitly, and then show some more code.

Bard (and Categories) defines values in terms of representations, protocols, and categories.

A representation is a layout of data in memory.

A protocol is a definition of the operations that apply to some set of values.

A category is an abstraction over a set of values that play a role in a protocol.

For example, a 64-bit, unsigned, twos-complement integer is a representation. Here's a simple protocol:

(protocol Addable (add <Number> <Number>))

This expression defines a protocol named "Addable", a function named "add", and a category named "<Number>". It's a very simple protocol, with just one function; in most cases, protocols will define many functions and categories.

The protocol, function, and category are all first-class objects. They are also abstract; we don't yet have any thing we can call, nor any concrete values we can use. Well, we have the protocol, function, and category, but we can't add together any numbers with those, yet.

So what exactly is a <Number>? We don't know yet. We don't have any examples of a <Number>. We'll have to make one:

(define-method add ((x <integer>)(y <integer>))
   (+ x y))
Okay, now we have an example of an actual method for add that performs a concrete operation on two values whose representations are <integer>. Now we know that <integer> is an example of a <Number>, and that (+ x y) is an example of the function add.

After the above expression is evaluated, there is a method object associated with the function add, and if we ask Bard whether <integer> is a <Number>, it will say that it is:

  ? (isa? <integer> <Number>)
  true
Suppose instead that we ask this question:

  ? (isa? <float> <Number>)
  false
See, we haven't provided any examples that show <float> to be an example of <Number>, so Bard says it isn't one. Actually, I'm lying. In current running Bard systems, the answer would be true, because the standard library does, in fact, provide such examples.

This is what I meant when I said you don't explicitly define categories. Categories are defined by protocols. That was the central idea of the "Protocols" idea: that taxonomy is defined by protocol. The type of a particular representation is defined by the set of operations you can perform on it.

By the way, the functions and categories defined in one protocol can be freely reused in other protocols, as long as you don't try to define anything that contradicts already-existing protocols.

Defining a protocol creates a name for a class of operations, and for any abstract categories of values that they may operate on. You can ask Bard for a list of all the defined protocols, and you can ask each protocol for a list of all the categories it defines. At any given moment you can find out which representations belong to which categories.

You can extend any protocol by defining new methods for its functions. Bard provides functions for defining new representations, and you can add those new representations to any categories you need by defining methods.

That brings us back to the values I listed above. Let's look at the types of those values again. The convention in Bard is that representations are written like this: "<unsigned-32-bit-integer>", and categories are written like this: "<Number>"

  void     ; <void>
  10       ; <Integer>
  12.34    ; <Float>
  2/3      ; <Ratio>
  \space   ; <Character>
  \A       ; <Character>
  baz      ; <Symbol>
  :Foo     ; <Keyword>
  Bar:     ; <Keyword>
  true     ; <true>
  false    ; <false>
  ()       ; <empty-sequence>
  []       ; <empty-sequence>
  [0 1 2 3 4 5]
           ; <Sequence>
  ""       ; <empty-text>
  "Hello!" ; <Text>
  (0,1)    ; <Pair>
  (,)      ; <empty-pair>
  {}       ; <empty-map>
  { greeting: "Hello!",
    length: 5}
           ; <Map>

Notice that most of the types given are categories. Wherever it makes sense, Bard treats the types of values as categories, meaning that it doesn't promise to give you any particular representation (unless you specifically ask for one). That leaves the compiler free to use whatever representation it thinks will work best in a given context, while also providing ways for you to say that you want a specific representation when you know better than the compiler. In most cases, it doesn't matter which representation the compiler picks for a value, as long as it obeys the right protocol—and, because of the design of Categories, the value picked always obeys the right protocol.

Notable is the fact that

  (foo bar baz)
does not necessarily denote a traditional Lisp list. It might; a chain of cons cells is a perfectly valid thing for this expression to mean, as long as there are methods defined that make that representation a valid instance of <Sequence>. The above expression denotes a <Sequence>; in the absence of further explicit directions, the Bard compiler is free to represent the <Sequence> any way it chooses; semantically, it makes no difference. The point of the above list of types is that Bard treats most values and types this way. <Text>, for example, is Bard's equivalent of the string type in most languages, but <Text> is a category, not a representation. Using a <Text> value doesn't tie you down to any particular representation of text; you can use ASCII or Unicode, or Martian Hyper-Quantum Ropes, it doesn't matter, as long as the protocols defining <Text> are implemented for whatever representation you decide to use.

Like Clojure, ML, and Haskell, Bard primarily uses immutable data structures. Like Clojure and ML, it provides mutable containers for values, for those cases where you really need to assign a value to something. Bard calls these value containers cells, and they are instances of <Cell>.

By default, if you ask for a <Cell>, you'll get a <Versioned-Cell>. Both <Cell> and <Versioned-Cell> allow you to replace a value stored in the cell. <Versioned-Cell> remembers what the previous value was. You can ask a <Versioned-Cell> how many values it has held, and you can ask it for a particular one. In effect, a <Versioned-Cell> acts like a memoized (possibly infinite) stream of values, and also like an integer-indexed sequence.

Naturally, using <Versioned-Cell> involves some overhead; if you know for sure that you won't ever want to know a cell's previous values, you can explicitly ask for a non-versioned cell, and then the only overhead you'll pay for using it is Bard's thread-safety protections—and not even those, if your program is single-threaded.

The majority of Bard's data structures, which are immutable, are intended to be implemented using representations with good amortized algorithmic complexity. I have several partial implementations running. The Common Lisp one uses the FSet library for its aggregate data structures. The Haskell-based virtual machine uses nice data structures provided by several Haskell libraries. Clojure would make a reasonable substrate for a JVM-based implementation. You could reasonably expect to implement the language on Erlang, or, with the right data-structure libraries, on Gambit-C.

Bard uses a module system based on Dylan's. A module is a first-class object that contains a set of defined <Symbol> objects. When you refer to a symbol in Bard, you are referring to it in the context of a module. The syntax for module-qualified symbols looks like this:

  bard.lang.<Number>

That expression reads as "a reference to the <Symbol> <Number> in the module bard.lang". There is a single global namespace of names for modules, so that a module name is always unambiguous. A module is not a synonym for a source file; as in Dylan, I presume that someone might like an implementation that doesn't use discrete source files. I also presume, though, that most people will use plain old source files, and that at some point I'll define some straightforward way to map modules onto the files that define them.

Like Dylan modules, Bard modules manage only symbols, and by default, Bard makes symbols private. If you want other modules to be able to use a symbol, you export it. You can also import a symbol:

  (in-module user)
  (import (bard.lang.<Number>))
After the above expressions, a reference to <Number> in the user module means the same thing as bard.lang.<Number>. This only works, of course, if <Number> was previously exported from bard.lang.

As in Dylan, modules and files are orthogonal: all the code after an in-module expression uses the named module, until a subsequent in-module expression changes it.

There are a lot more mundane details, of course. Bard supports concurrency primitives generally similar to those in Erlang and Clojure; the exact set that will be available at first release is still under discussion. Most of the defining constructs of Lisp—including macros—are available in Bard. A reasonably complete library of functions for the above-described data structures will be available.

I said I have several implementations in the works. The most important ones are probably the one written in Clozure Common Lisp and the one written in Haskell. The Categories work in Gambit-C Scheme and in Ikarus Scheme could potentially evolve into Bard implementations, as well. I sort of hope they do; for one thing, I'd like to use Bard for some product work I'm currently doing in Gambit.

The CCL version is important because it's the farthest along. It's possible to write simple programs in it now, though it's not close to being ready for release. The roadmap for it is very clear, though; it's a simple matter of programming. It's worth keeping in mind, though, that Bard is a hobby project, and it takes lower priority than my paid work and the product work that I fund. So, although the amount of work to get it ready for a release is not really large at this point, it may take a while, anyway.

Other important features of the CCL implementation are that we may reasonably expect it to run on Mac OS X, Linux, and Windows once it's ready for release; and we may also reasonably expect it to have some sort of foreign-function interface. It will also be compiled to native code (though not necessarily very good native code in early releases).

It already supports interactive, incremental compilation, and can save and load images. These nice features are no credit to me; I'm just piggybacking on what CCL can do. They're important, though; image-based and incremental development are crucial features in my mind. I'd like Bard to eventually have an interactive environment that is reminiscent of SK8 and Smalltalk.

The interesting thing about the other main implementation is that it involves writing a rather quirky and maybe-interesting virtual machine in Haskell. I freely admit that I don't know how well it will work, but I'm keen to find out. There are certain aspects of Bard that are reminiscent of Haskell, and I'm basically curious to see what happens when I try making a VM for it in that langauge. I'm also using that implementation as an excuse to try writing a very high-level VM, one whose design is very close to the source language, instead of very close to the hardware beneath it. I've written several low-level ones before, using tricks like exploiting GCC's computed labels for speed in dispatch, and so forth; I wanted to try something really different and see what happened.

I'm going to talk to the tclisper's group on February 22nd about designing and implementing Lisp (by telepresence; I'm not physically going to be there). I imagine that talk will involve some more stuff about Bard. Who knows? I might even be able to run some example code for them.

If you couldn't care less about Bard, but are very interested in Categories, don't despair. Work on Categories 0.3 is progressing as well. In fact, they are in part the same thing: Categories is central to Bard's object model.

Really, though, work on Categories is actually work on Bard. Bard's the real goal.


Comments

plumenator (unauthenticated)
Feb 10, 2010

Is there an implementation I can try?

Tim Daly (unauthenticated)
Feb 10, 2010

Have you looked at Axiom? http://axiom-developer.org

mikel evins
Feb 10, 2010

> Is there an implementation I can try?

Not yet. As I said in the post, I can write simple programs in the CCL-based implementation, but it's not ready for other people yet. Think of it as a car with half the engine stretched out across the workbench. I can start it up and run it, and I can work all the controls and they do the right things, but it's not really drivable yet. Soon. I'm not sure exactly how long, because paid work, and work I'm investing my own money in have to come first.

mikel evins
Feb 10, 2010

> Have you looked at Axiom? http://axiom-developer.org

I hadn't. Thanks for the link!

Martin Kühl (unauthenticated)
Feb 11, 2010

Bard sounds very interesting; thanks for going public with it.

Jason Baker (unauthenticated)
Feb 27, 2010

> but it's not ready for other people yet.

Allow me to play devil's advocate. Have you seen the video "The Myth of the Genius Programmer"? It certainly influenced me a lot on these kinds of issues.

http://code.google.com/events/io/2009/sessions/MythGeniusProgrammer.html

mikel evins
Mar 1, 2010

I'm not hiding Bard away because the unwashed masses are unready for it. There's simply no point in making a language implementation available before you can write programs with it. It's probably not terribly long before I pass that hurdle. I showed the Twin City Lispers the compiler tests that were working last week, and the core logic of the VM. (Lest I confuse anyone, I'm talking about two totally different implementations here; the compiler that is passing tests is implemented on top of Clozure Common Lisp and compiles to native code by way of Lisp; the VM is a separate implementation in Haskell, done for the pure pleasure of seeing what it's like to write very low-level code in a very high-level language.)

When I can write a simple program with it that runs, I'll make it available, after I write a little documentation to explain it.