A (Revised^2) Report on SLAN: Scheme List-Atom Notation

CHANGES (2025-06-13): Since R1:

Removed restrictions of character encodings, in line with ELTN; any bytes outside the ASCII range remain unchanged.
Converted the formal grammar to official EBNF.
Removed rejected proposals.
Moved Java APIs to a new page.
Replaced Lua string escapes with SRFI 75.

CHANGES (2025-06-14): spell-check.

Introduction

SLAN is the Scheme List-Atom Notation¹, because I really wanted the acronym SLAN. Like XML, YAML, JSON, TOML, and my own ELTN it’s a language for describing data structures, not performing calculations. In SLAN everything is a list of “atoms” – strings, numbers, “symbols”, and booleans – and other lists.

SLAN somewhat resembles the programming language Scheme, but is even smaller: entire data types and constructions have been cut out to make parsing simpler.

History

Since the invention of LISP in 1960 and Scheme around 1975, programmers have used S-expressions – space-separated words surrounded by parentheses – as a simple file format. SXML, for example, writes XML using S-expressions.

While contemplating how to serialize what’s essentially a rules engine² as a Java .properties file, I realized I had other choices. The project I’m working on parses JSON, so to avoid infinite regress I decided on something simpler. Thus SLAN was born.

File Format

The SLAN file format consists entirely of ASCII characters representing Scheme-like lists. As a simple serialization format (not a mathematical programming language) it can represent only a few basic data types: strings, “symbols” (basically strings without quotes), simple decimal numbers, booleans, and of course lists.

Parsers should allow non-ASCII characters in a string constant or in comments, but not elsewhere in the file. Such characters will pass through the parser unchanged and uninterpreted. Parsers should also look for the Unicode “byte order mark” to verify they are parsing a byte encoding and not UTF-16 or UTF-32.
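As a rough sketch of the byte order mark check described above: the helper name `detect_bom` and its return labels are my own inventions for illustration, not part of SLAN. It only classifies the first bytes of the input; what a parser does with a UTF-16 or UTF-32 result (reject, transcode) is left to the implementation.

```python
import codecs

def detect_bom(data: bytes) -> str:
    """Classify the start of a byte stream by its byte order mark (hypothetical helper)."""
    # UTF-32 must be tested first: the UTF-32 LE mark begins with the UTF-16 LE mark.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No mark at all: treat the input as a plain byte encoding.
    return "bytes"
```

A parser could refuse anything labelled "utf-16" or "utf-32" and skip the three-byte mark for "utf-8-sig".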

The intent is that SLAN avoids the complexity of character encodings by staying compatible with 1960s technology (plain ASCII), while still letting newer byte encodings pass through string constants and comments untouched.

Comments

Following Scheme R6RS, SLAN regards the following as comments:

text outside a string that starts with ; and runs to the end of the line.
text outside a string starting with #| and ending with |#.

The parser will treat comment text as whitespace.
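One way to picture "comment text as whitespace" is a pre-pass that replaces each comment with a single space. This is only a sketch: `strip_comments` is a hypothetical name, and I assume block comments do not nest, which is how the grammar below reads (R6RS itself allows nesting).

```python
def strip_comments(text: str) -> str:
    """Replace every comment outside a string constant with one space (sketch)."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Copy a string constant verbatim, stepping over backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif ch == ";":
            # Line comment: skip up to, but not past, the end of the line.
            while i < n and text[i] not in "\r\n":
                i += 1
            out.append(" ")
        elif text.startswith("#|", i):
            # Block comment: skip to the first "|#" (assumed non-nesting).
            end = text.find("|#", i + 2)
            i = n if end < 0 else end + 2
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A real parser would more likely skip comments inline while scanning for whitespace, but the effect on the token stream should be the same.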

Data Types

The following are the only data types in SLAN.

List

Lists are sequences of other data types, including nested lists. SLAN doesn’t care whether one implements lists as linked cons cells, arrays, or an arbitrary data structure that’s simply serialized as nested lists.

Symbol

In SLAN symbols and strings are two ways of writing a sequence of characters, one more restrictive than the other.

Nevertheless, parsers should indicate whether a value is a symbol or a string, in case the application attaches semantic meaning to one or the other. Applications can use a “symbol” like a C or Java enum, the name of a datatype, the name of a (remote?) procedure call, or any other meaning programmers give to names. That said, the current specification has no mechanism to verify symbols against a list of “known” names; the application must do that.
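To make the symbol/string distinction concrete, here is a small sketch of how a parser's output and an application-side check might look. The `Symbol` class, `KNOWN_COLORS`, and `check_color` are all hypothetical names of mine; SLAN prescribes none of them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    """Hypothetical wrapper so a parsed symbol never compares equal to a plain string."""
    name: str

# Assumed application-level list of "known" names; SLAN itself defines no such list.
KNOWN_COLORS = {"red", "green", "blue"}

def check_color(value) -> str:
    """Accept only a Symbol whose name appears in KNOWN_COLORS."""
    if not isinstance(value, Symbol):
        raise TypeError("expected a symbol, not a string")
    if value.name not in KNOWN_COLORS:
        raise ValueError(f"unknown color: {value.name}")
    return value.name
```

With this representation the SLAN text `red` and the SLAN text `"red"` stay distinguishable after parsing, and the enum-like validation lives in the application where the specification says it belongs.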

String

After Lists, Strings are the main data type in SLAN. As SLAN exists to serialize and deserialize data structures, a String could denote an arbitrary character sequence, a specific data structure like a date, or even a complicated number.

Escape Sequences

SRFI 75 defines the following escape sequences for strings:

Escape	Byte(s)	Meaning
\a	0x07	bell
\b	0x08	backspace
\t	0x09	horizontal tab
\n	0x0a	newline
\v	0x0b	vertical tab
\f	0x0c	form feed
\r	0x0d	carriage return
\"	`"`	quotation mark / double quote
\'	`'`	apostrophe / single quote
\\	`\`	backslash
\␤		skip newline and following whitespace
\xXX	0xXX	byte value in hexadecimal digits
\uXXXX	`utf8(`XXXX`)`	UTF-8 bytes for code point 0xXXXX.
\UXXXXXXXX	`utf8(`XXXXXXXX`)`	UTF-8 bytes for code point 0xXXXXXXXX.

Escaping a newline skips over the newline and any whitespace to the next printable character:

"This is a string with a newline in it. \
        ... oh, wait, it's not."

becomes “This is a string with a newline in it. … oh, wait, it’s not.”

The Unicode escapes \uXXXX and \UXXXXXXXX include all valid Unicode code points, i.e. 0x01 through 0x10FFFF, excluding the surrogates 0xD800 through 0xDFFF.
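The table and rules above could be implemented roughly as follows. This is a sketch under my own assumptions: it works on the bytes between the quotes so non-ASCII bytes pass through untouched, the name `decode_string_body` is hypothetical, and raising `ValueError` on a bad escape is my choice rather than anything the specification requires.

```python
import re

# Single-character escapes from the table above.
SIMPLE = {b"a": b"\a", b"b": b"\b", b"t": b"\t", b"n": b"\n", b"v": b"\v",
          b"f": b"\f", b"r": b"\r", b'"': b'"', b"'": b"'", b"\\": b"\\"}

ESCAPE = re.compile(
    rb"\\(?:(?P<nl>\r\n|\r|\n)[ \t\r\n\f\v]*"   # escaped newline plus following whitespace
    rb"|x(?P<x>[0-9A-Fa-f]{2})"                 # \xXX
    rb"|u(?P<u4>[0-9A-Fa-f]{4})"                # \uXXXX
    rb"|U(?P<u8>[0-9A-Fa-f]{8})"                # \UXXXXXXXX
    rb"|(?P<c>.))",                             # any other escaped byte
    re.DOTALL,
)

def _replace(m: re.Match) -> bytes:
    if m.group("nl") is not None:
        return b""                               # skip newline and whitespace
    if m.group("x") is not None:
        return bytes([int(m.group("x"), 16)])    # raw byte value
    code = m.group("u4") or m.group("u8")
    if code is not None:
        cp = int(code, 16)
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point: {cp:#x}")
        return chr(cp).encode("utf-8")           # UTF-8 bytes for the code point
    try:
        return SIMPLE[m.group("c")]
    except KeyError:
        raise ValueError(f"unknown escape: {m.group(0)!r}") from None

def decode_string_body(body: bytes) -> bytes:
    """Decode the escapes in the bytes between a string's double quotes."""
    return ESCAPE.sub(_replace, body)
```

For instance, `decode_string_body(rb"\x41\u00e9")` yields the byte `A` followed by the two UTF-8 bytes for é.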

Number

Numbers in SLAN resemble those in JSON: a whole part, an optional fractional part, and an optional exponent. That’s it. Nothing like a Scheme number.

Boolean

A basic true-or-false value.

Empty List

In Scheme the Empty List has special status. Scheme implements lists as linked lists of “cons” cells, and the empty list is effectively a null pointer. In SLAN we continue the tradition of Empty List being the equivalent of nil or null in other languages. Apart from #f itself, it’s the only value equivalent to Boolean false; all others are true.

Syntax

Below is the formal syntax of SLAN. This is the notation in use:

lower_case: A reference to a grammar rule, defined with the symbol =.
SOME WORDS: A description of a rule in plain English.
"x": A literal character or sequence of characters.
"\x": An escape sequence denoting a non-printable or confusing character. A literal " is written as "\"; a literal \ is written as "\\". See Escape Sequences, above.
x , y: A sequence containing both x and y.
x | y: Either x or y.
[ x ]: Zero or one of x.
{ x }: Zero or more of x.
( x ): A group of items that are treated as a unit.

For example "y" , { "a" | "b" } , "z" means a sequence of zero or more “a”s or “b”s, starting with a “y” and ending with a “z”: “yz”, “yaz”, “ybz”, “yaaz”, “ybaz”, etc.

Streams, Lists, and Values

stream      = ws , list , { ws , list } , ws ;

list        = "(" , ws , value , { reqws , value } , ws , ")" ;

value       = list | emptylist | symbol | string | number | boolean ;

emptylist   = "(" , ws , ")" ;
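The four rules above map fairly directly onto a recursive-descent reader. The sketch below is deliberately loose and all its names are hypothetical: atoms come back as raw token strings, comments are assumed to have been removed already, and it is more lenient than the grammar (it does not insist on whitespace between values, and it accepts an Empty List at the top level).

```python
WHITESPACE = " \f\t\r\n\v"

def parse_stream(text: str) -> list:
    """Read every top-level list in text; atoms stay as raw token strings (sketch)."""
    pos = 0
    result = []

    def skip_ws() -> None:
        nonlocal pos
        while pos < len(text) and text[pos] in WHITESPACE:
            pos += 1

    def parse_value():
        nonlocal pos
        if text[pos] == "(":
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= len(text):
                    raise ValueError("unclosed list")
                if text[pos] == ")":
                    pos += 1
                    return items          # [] stands for the Empty List
                items.append(parse_value())
        if text[pos] == '"':
            start = pos
            pos += 1
            while pos < len(text) and text[pos] != '"':
                pos += 2 if text[pos] == "\\" else 1   # step over escapes
            if pos >= len(text):
                raise ValueError("unclosed string")
            pos += 1
            return text[start:pos]        # quotes kept so strings stay distinguishable
        start = pos
        while pos < len(text) and text[pos] not in WHITESPACE + '()"':
            pos += 1
        return text[start:pos]            # symbol, number, or boolean token

    skip_ws()
    while pos < len(text):
        if text[pos] != "(":
            raise ValueError("a stream contains only lists")
        result.append(parse_value())
        skip_ws()
    return result
```

Note how the list branch decides between "first element" and "closing parenthesis" only after skipping whitespace, which is what lets one code path serve both `list` and `emptylist`.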

Symbols

symbol      = ( ichar , { schar } ) | nchar ;

ichar       = letter | "!" | "$" | "%" | "&" | "*" | "/" | ":"
                 | "<" | "=" | ">" | "?" | "~" | "_" | "^" ;

nchar       = "." | "+" | "-" ;

schar       = ichar | digit | nchar ;

letter      = "A" THROUGH "Z" | "a" THROUGH "z" ;

digit       = "0" THROUGH "9" ;
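For a quick check of whether a token is a symbol, the rules above collapse into a single regular expression. This is a sketch; `is_symbol` and the constant names are hypothetical. Note that a symbol either starts with an `ichar` or is exactly one `nchar`, so `+` is a symbol but `+1` and `->x` are not.

```python
import re

# Characters allowed at the start of a symbol (ichar), written as a regex class body.
ICHAR = r"A-Za-z!$%&*/:<=>?~_^"

# symbol = ( ichar , { schar } ) | nchar, where schar adds digits and ". + -".
SYMBOL = re.compile(rf"[{ICHAR}][{ICHAR}0-9.+\-]*|[.+\-]")

def is_symbol(token: str) -> bool:
    """Return True if token matches the symbol rule in its entirety."""
    return SYMBOL.fullmatch(token) is not None
```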

Strings

string      = dquo , { char | "\\" , escape } , dquo ;

dquo        = """" ;    (* literal double quote *)

char        = ANY CHARACTER BUT A DOUBLE QUOTE OR BACKSLASH ;

escape      = "'" | dquo | "\\"
                 | "a" | "b" | "f" | "n" | "r" | "t" | "v" 
                 | newline , { whitespace }
                 | "x" , hexdigit , hexdigit 
                 | "u" , hexdigit , hexdigit , hexdigit , hexdigit
                 | "U" , hexdigit , hexdigit , hexdigit , hexdigit ,
                         hexdigit , hexdigit , hexdigit , hexdigit ;

hexdigit    = "0" THROUGH "9" | "A" THROUGH "F" | "a" THROUGH "f";

Numbers

number      = [ sign ] , whole , [ decimal ] , [ exponent ]
                | [ sign ] , [ whole ] , decimal , [ exponent ]
                | [ sign ] , [ whole ] , "/" , positive
                | "0/0" | "+1/0" | "-1/0" ;     (* NaN, +Infinity, -Infinity *)

sign        = "+" | "-" ;

whole       = "0" | positive ;

positive    = ( "1" THROUGH "9" ) , { digit } ;

decimal     = "." , digit , { digit } ;

exponent    = ( "e" | "E" ) , [ sign ] , digit , { digit } ;
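The number rules can likewise be transliterated into a regular expression. The sketch below follows the grammar as written, including the fraction form and the three special spellings; `is_number` and the constant names are hypothetical, and converting a matched token into an actual numeric value is left out.

```python
import re

WHOLE = r"(?:0|[1-9][0-9]*)"      # "0" | positive
POSITIVE = r"[1-9][0-9]*"
DECIMAL = r"\.[0-9]+"
EXPONENT = r"[eE][+-]?[0-9]+"

NUMBER = re.compile(
    rf"[+-]?(?:{WHOLE}(?:{DECIMAL})?|{DECIMAL})(?:{EXPONENT})?"  # first two alternatives
    rf"|[+-]?(?:{WHOLE})?/{POSITIVE}"                            # fraction form
    rf"|0/0|\+1/0|-1/0"                                          # NaN, +Infinity, -Infinity
)

def is_number(token: str) -> bool:
    """Return True if token matches the number rule in its entirety."""
    return NUMBER.fullmatch(token) is not None
```

Leading zeros ("007") and a bare trailing point ("1.") fall outside the grammar, so the sketch rejects them.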

Booleans

boolean     = "#t" | "#f" ;

Whitespace

reqws       = ( whitespace | comment ) , ws ;

ws          = { whitespace | comment } ;

whitespace  = " " | "\f" | "\t" | "\r" | "\n" | "\v" ;

comment     = ( ";" , NOT A NEWLINE , newline )
                | ( "#|" , NOT A CLOSE-COMMENT , "|#" ) ;

newline     = "\r" | "\n" | "\r\n" ;

Semantic Concerns

Because a List and the Empty List both begin with an opening parenthesis and optional whitespace, parsers must look ahead for either the first element or the closing parenthesis before deciding which one they have. This complicates parsing a bit, but it’s the only consistent way to treat empty lists.
#f and () are interpreted as “false”, if they’re interpreted. Everything else counts as “true”.
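In code, the truthiness rule is tiny. The sketch below assumes a hypothetical parser that maps #f to Python's `False` and the Empty List to `[]`; under that assumption, numbers like 0 and empty strings still count as true, unlike in Python itself.

```python
def is_truthy(value) -> bool:
    """SLAN truthiness: only #f (False) and the Empty List ([]) are false (sketch)."""
    return value is not False and value != []
```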

¹ Formerly the “Scheme-Like Abridged Notation”, but this sounds much better and more descriptive. JSON has Objects (and Arrays), ELTN has Tables, SLAN has Lists … and Atoms.
² Specifically how to choose among Java classes to wrap an object based on the signature(s) of the wrapper’s constructor(s). For extra credit, imagine the user offers up multiple arguments, e.g. (byte[], int, int). I think I implemented something like this for Rhino back in the day, but I can’t remember the details.