Wise authors in many fields have extolled the virtues of simplicity, and the difficulty of achieving them. There is Einstein's razor, often paraphrased as "Make everything as simple as possible, but not simpler." The Revised^4 Report on the Algorithmic Language Scheme admonishes, "Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary." Pascal once wrote to a friend that he had "made this longer because I have not had the time to make it shorter."
One advantage of that approach is that an expression that would be a list made of cons cells in an older Lisp may instead be an instance of some other sequence type that has convenient ways to store metadata about the expression. This happy quality addresses an issue pointed out in several places by David Moon: conventional s-expressions don't have a convenient place to store such metadata, which complicates building development tools. With older Lisps, you have to hang a bag on the side of your syntax trees to keep track of things like where the source code resides and so forth. Bard is free to choose a representation for source code that has storage for those sorts of things.
(x y z) means "apply the value of x to the values of y and z." This is the familiar Lisp notation for a function call, or, more specifically, the one used by Scheme, where the operator x is evaluated like any other expression.
[x y z] means "construct a sequence consisting of the values of x, y, and z in that order." I always regretted that there wasn't a simpler notation for constructing a list that you don't intend to treat as a function call.
Everything in Bard is an atom or a collection. An atom is a single non-composite value, such as an integer or a Boolean value.
Many atoms are familiar: integers and floats; true and false; the null value (which is presently named "nothing" in Bard). Cells are less familiar. A cell is an atom that contains another value. You can retrieve the cell's value by calling the value function. Cells are usually read-only, but you can create mutable cells. If a cell is mutable, you can change its value using the set! function.
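A rough model in Python (not Bard) may make cells more concrete. This is only a sketch under assumptions: the class name Cell, the function set_bang (standing in for set!), and the choice to raise an error on an attempt to change a read-only cell are all invented for illustration and say nothing about how Bard implements cells.

```python
class Cell:
    """Model of a Bard cell: an atom that contains one other value."""

    def __init__(self, contents, mutable=False):
        self._contents = contents
        self._mutable = mutable  # cells are read-only unless asked otherwise


def value(cell):
    """Stand-in for Bard's value function: retrieve the cell's contents."""
    return cell._contents


def set_bang(cell, new_contents):
    """Stand-in for Bard's set! function; only mutable cells accept it.

    Raising TypeError for a read-only cell is an assumption of this sketch.
    """
    if not cell._mutable:
        raise TypeError("cannot set! a read-only cell")
    cell._contents = new_contents
    return new_contents


counter = Cell(0, mutable=True)
set_bang(counter, value(counter) + 1)  # counter now contains 1

constant = Cell(42)  # read-only by default
```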
A collection is a value that has component elements, like a sequence.
Collections map keys to values. You can extract a component of a collection by applying the collection to the appropriate key.
A sequence is a collection that maps the natural numbers to elements. Some collections map other keys to values; for example:
( { name: "Fred" age: 47 shape: 'round } 0) => [name: "Fred"]
( { name: "Fred" age: 47 shape: 'round } 1) => [age: 47]
Not all collections support both protocols, though.
Bard's collections include a few things that aren't usually thought of as collections. Functions are collections: they support the Mapping protocol. Modules are collections: they map names to cells.
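The idea that sequences, maps, and functions all answer to the same "apply it to a key" operation can be imitated in Python with callable objects. The sketch below is a model, not Bard: the class names Sequence and Map and the function double are hypothetical, and it covers only lookup by key, not the positional access to a map's entries shown in the examples above.

```python
class Sequence:
    """Model of a Bard sequence: maps natural numbers to elements."""

    def __init__(self, *elements):
        self._elements = list(elements)

    def __call__(self, index):
        return self._elements[index]


class Map:
    """Model of a Bard map: maps arbitrary keys to values."""

    def __init__(self, **entries):
        self._entries = dict(entries)

    def __call__(self, key):
        return self._entries[key]


def double(n):
    """An ordinary function: it also maps a key (its argument) to a value."""
    return n * 2


colors = Sequence("red", "green", "blue")
fred = Map(name="Fred", age=47, shape="round")

# All three are applied to a key with the same call syntax.
lookups = [colors(1), fred("age"), double(21)]
```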
Types
Bard's system of types occupies most of my time when I'm working on the language. It's the most interesting area to me, and the one with the toughest problems.
There are a bunch of straightforward datatypes: undefined, nothing, true and false, characters, integers and floats.
Sequences and maps are slightly more complicated, being aggregates, but the main work with them is just defining the Sequencing and Mapping protocols.
Then there are functions, modules, and streams. Streams are treated as a special kind of collection: basically, sequences of (potentially) infinite length.
Text is what most languages call strings. I call them text in Bard because "string" usually connotes some specific representation, and like most common types in Bard, text is an abstract type. It might have several different representations. A text is a sequence of characters.
Names are what older Lisps call symbols and keywords. A name is an atom that is denoted by a text, like foo or Kalamazoo. Names are collected into modules. There are three ways to write a name, depending on what module you want to refer to:
bard.lang:foo means "the name 'foo' in the 'bard.lang' module". Each module has a unique name. If you write a name by prepending the name of a module to it, the resulting atom means that name in the given module. The module namespace is global; in other words, 'bard.lang' will always refer to the same module.
:foo means "the name 'foo' in the anonymous module." The name of the anonymous module is the empty text. Names in the anonymous module are what some older Lisps call "keywords": they are names that always evaluate to themselves:
:foo => :foo
By the way, for the sake of aesthetics, you can also write ":foo" as "foo:", when the name is in the anonymous module. Sometimes it's nice to be able to do this when you're using a keyword to label something, as, for example, in a map:
{ name: "Fred" age: 101 }
foo means, "the name 'foo' in the current module." All Bard code has a current module that determines the namespace it's using. If you write a name without any qualification (a "bare" name) then Bard treats it as a name in the current module.
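The three ways of writing a name amount to a small resolution rule, which can be sketched in Python. This is an illustration only: the function parse_name and the module name "user" in the usage note are hypothetical, and the sketch represents the anonymous module by the empty string, following the statement that its name is the empty text.

```python
def parse_name(text, current_module):
    """Return (module_name, name) for a name as written in source code.

    Handles the three spellings described above, plus the "foo:" form
    that is an alternative way to write ":foo".
    """
    if text.endswith(":"):
        # "foo:" is the aesthetic spelling of ":foo" (anonymous module).
        return ("", text[:-1])
    if ":" in text:
        # "bard.lang:foo" names foo in bard.lang; ":foo" yields module "".
        module, _, name = text.partition(":")
        return (module, name)
    # A bare name belongs to the current module.
    return (current_module, text)
```

For example, with "user" as the current module, parse_name("bard.lang:foo", "user") gives ("bard.lang", "foo"), both parse_name(":foo", "user") and parse_name("foo:", "user") give ("", "foo"), and parse_name("foo", "user") gives ("user", "foo").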
Besides these straightforward datatypes, there are two more interesting areas: support for user-defined types, and support for functions that are polymorphic over the types of their arguments.
Bard provides four kinds of user-defined types:
Synonyms are simple new names for existing types. Synonyms are just a naming convenience. Often you can reuse an existing type, but your code will be clearer if you can give a name to the type that indicates its intended use. The type <bank-balance> is more informative than the type <float>.
Vectors are named sequences. You can optionally specify a type restriction on the elements that go in your vector type, and you can optionally set a minimum and maximum count.
Records are named maps. You can optionally restrict the types of a record's keys and values, and you can optionally specify that no keys may be added besides the ones enumerated in the definition.
Protocols are sets of operations that in turn define abstract types. Most of the work I've been doing on Bard is centered on making protocols the primary way to define types. Vectors and records are representations--that is, concrete ways to lay out data. Representations are not really types; they're just arrangements of storage. A real type is defined by what you can do with data, and that's what protocols are all about.
Here's a simple protocol:
(protocol JSONSerialization
[(toJSON <Value>) => <JSONString>]
[(fromJSON <JSONString>) => <Value>])
(Yes, protocols declare return types.)
The protocol expression creates a protocol object, two functions, and two categories.
The protocol object's purpose is primarily organizational and diagnostic. It's useful to be able to ask Bard what protocols exist, and what their functions and categories are. Bard can also warn you when protocols lack implementations, or when methods return the wrong types.
The functions are generic and abstract. They have no implementations, because we haven't given any yet. We use define-method for that.
Similarly, the categories are completely abstract types with no representations. We haven't told Bard what representations can be considered members of <Value> or <JSONString>.
The simplest way to tell Bard what representations are members of a category is to explicitly say it:
(categories <ascii-string> [<JSONString>])
The above expression asserts that the representation <ascii-string> is a member of the category <JSONString>. The categories special form has three variants:
(categories) returns a sequence of all known categories.
(categories x) returns a sequence of all categories of which x is a member. The returned sequence is stably-ordered--that is, it will always be returned in the same order, unless you explicitly do something to change it.
(categories x [y*]) asserts that x is a member of each category listed in [y*]. Furthermore, it asserts that the categories y* appear in the order given in the expression. Subsequent calls to (categories x) will return those categories in the same order.
Bard uses the results of the categories operator to determine how to dispatch polymorphic functions. Suppose, for example, that we call:
(foo "blue")
Now suppose that "blue" is an <ascii-string>, and suppose further that <ascii-string> is a member of <Name> and also a member of <Quality>. If there are methods for foo on both <Name> and <Quality>, how does Bard know which method to call? Simple: if (categories <ascii-string>) returns [<Name> <Quality>], then it chooses the <Name> method, because <Name> appears first in the returned sequence.
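The simple case just described can be modeled in a few lines of Python. Everything here is a stand-in chosen for illustration: the tables, the helper representation_of (which treats every Python string as an <ascii-string>), and the function dispatch are hypothetical, and the sketch handles only a flat, ordered list of categories.

```python
# representation -> ordered list of categories, as (categories x) would return
category_table = {"<ascii-string>": ["<Name>", "<Quality>"]}

# (function name, category) -> implementation
methods = {
    ("foo", "<Name>"): lambda x: "Name method: " + x,
    ("foo", "<Quality>"): lambda x: "Quality method: " + x,
}


def representation_of(value):
    """Hypothetical: every Python str counts as an <ascii-string>."""
    if isinstance(value, str):
        return "<ascii-string>"
    raise TypeError("no representation known for %r" % (value,))


def dispatch(function_name, value):
    """Call the method for the first category, in order, that has one."""
    for category in category_table[representation_of(value)]:
        method = methods.get((function_name, category))
        if method is not None:
            return method(value)
    raise LookupError("no applicable method for " + function_name)


result = dispatch("foo", "blue")  # the <Name> method wins
```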
More complicated sets of categories require a more complicated analysis, but the analysis is deterministic (Bard presently uses the C3 class-linearization method, because it has gotten good results in practice, and because I understand it).
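Python uses the same C3 linearization for its class method-resolution order, so the technique can be observed directly there. In this sketch Python classes stand in for categories, and the idea that <Name> and <Quality> both belong to a broader <Value> category is an assumption made only to give the linearization something to do; it is not taken from Bard.

```python
class Value:
    """Stand-in for a broad category (an assumption of this sketch)."""


class Name(Value):
    """Stand-in for the <Name> category."""


class Quality(Value):
    """Stand-in for the <Quality> category."""


class AsciiString(Name, Quality):
    """Stand-in for <ascii-string>, listing <Name> before <Quality>."""


# Python computes __mro__ with C3: local order is kept (Name before
# Quality) and the shared ancestor Value comes after both.
linearization = [cls.__name__ for cls in AsciiString.__mro__]
```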
Explicitly asserting membership in a category isn't the only way to establish category membership. define-method implicitly calls the categories operator:
(define-method toJSON ((x <ascii-string>))...)
This define-method form, besides giving an implementation of the method toJSON for the representation <ascii-string>, also has the effect of calling categories like this:
(categories <ascii-string> [<Value>])
In other words, it asserts that <ascii-string> is a member of <Value>.
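One way to picture this side effect is a define-method that consults the protocol's declared parameter category and records the membership as it registers the method. The Python below is a sketch under assumptions, not Bard's mechanism: the tables, assert_categories, and define_method are hypothetical names, and the toy toJSON body does no escaping.

```python
# Parameter category declared by the protocol for each function.
protocol_signatures = {"toJSON": "<Value>", "fromJSON": "<JSONString>"}

category_table = {}  # representation -> ordered list of categories
method_table = {}    # (function name, representation) -> implementation


def assert_categories(representation, new_categories):
    """Record that representation belongs to each listed category."""
    known = category_table.setdefault(representation, [])
    for category in new_categories:
        if category not in known:
            known.append(category)


def define_method(function_name, representation, implementation):
    """Register a method and, implicitly, the category membership."""
    method_table[(function_name, representation)] = implementation
    assert_categories(representation, [protocol_signatures[function_name]])


# Toy implementation: wraps the text in quotes, with no escaping.
define_method("toJSON", "<ascii-string>", lambda s: '"' + s + '"')
```

After the define_method call, category_table maps <ascii-string> to a list containing <Value>, even though no explicit membership assertion was written.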
I still have some implications to work out. This scheme provides convenient ways to define protocols and methods, and to bind representations to categories. It provides a deterministic way to order dispatch, with results that are mostly what a programmer would expect.
Calling categories more than once with the same first argument has a well-defined meaning:
(categories <ascii-string> [<Name>])
(categories <ascii-string> [<Quality>])
means the same thing as
(categories <ascii-string> [<Name> <Quality>])
and that means that separate define-method forms also have a well-defined and understandable meaning. In addition, the form
(categories x)
also gives you a way to reliably determine what the dispatch order is.
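The accumulating behavior of the three forms of categories can be modeled with a single Python function. This is a sketch that covers only the non-conflicting case described so far; the function signature and the internal table are hypothetical, and the zero-argument form here knows only about categories that have at least one member.

```python
_table = {}  # representation -> ordered list of categories


def categories(x=None, ys=None):
    """Model of the three forms of Bard's categories special form."""
    if x is None:
        # (categories): every known category, in first-seen order.
        seen = []
        for members in _table.values():
            for category in members:
                if category not in seen:
                    seen.append(category)
        return seen
    if ys is None:
        # (categories x): a stable, ordered copy of x's categories.
        return list(_table.get(x, []))
    # (categories x [y*]): append new memberships, keeping earlier order.
    known = _table.setdefault(x, [])
    for category in ys:
        if category not in known:
            known.append(category)
    return list(known)


categories("<ascii-string>", ["<Name>"])
categories("<ascii-string>", ["<Quality>"])
dispatch_order = categories("<ascii-string>")
```

Under this model the two separate assertions leave dispatch_order equal to the list a single combined assertion would produce, with <Name> ahead of <Quality>.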
The question that is currently bugging me is this: what does this mean?
(categories x [<Cat1> <Cat2> <Cat3>])
(categories x [<Cat2> <Cat1> <Cat4>])
Does the second expression completely replace the first? If so, it means that x is no longer a member of <Cat3>. That might be unexpected. But if not, then how do we resolve the fact that the two expressions disagree about the precedence of <Cat1> and <Cat2>?
I don't yet know the answer.
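The disagreement can at least be made concrete. The merge step used by C3 linearization accepts a set of ordered lists only if a single order exists that respects all of them, and it rejects the two lists above. The Python sketch below shows that behavior; the function name c3_merge is hypothetical, and treating the conflict as an error is just what this merge does, not a claim about what Bard should do.

```python
def c3_merge(sequences):
    """Merge ordered lists so that every list's internal order is kept.

    Raises ValueError when the lists disagree about relative order.
    """
    remaining = [list(s) for s in sequences if s]
    merged = []
    while remaining:
        for seq in remaining:
            head = seq[0]
            # A head is acceptable only if no list has it in its tail.
            if not any(head in other[1:] for other in remaining):
                break
        else:
            raise ValueError("inconsistent precedence: " + repr(remaining))
        merged.append(head)
        remaining = [[c for c in s if c != head] for s in remaining]
        remaining = [s for s in remaining if s]
    return merged


first = ["<Cat1>", "<Cat2>", "<Cat3>"]
second = ["<Cat2>", "<Cat1>", "<Cat4>"]
try:
    outcome = c3_merge([first, second])
except ValueError:
    outcome = "conflict"  # <Cat1> and <Cat2> each precede the other

# Lists that agree on relative order merge without trouble.
compatible = c3_merge([["<Cat1>", "<Cat2>"], ["<Cat2>", "<Cat4>"]])
```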