CHANGED 2025-05-11:
- Preserved old version in ELTN Past.
- Incremented version to 0.6
- Fixed missing text under Streaming ELTN.
- Removed “changed” notes in the text.
- Removed sections on rejected features.
- Removed links to “in-progress” implementations.
- Ran through spell-checker.
CHANGED 2025-05-13:
- Added line numbers to all examples in use cases.
- Added two missing fixed-length tokens.
CHANGED 2025-05-18:
- Renamed “var” to “def”, as in definition.
- Removed dependency on UTF-8. Parsers should handle any encoding based on ASCII, and pass any unfamiliar bytes in strings and comments straight through.
ELTN (Extended Lua Table Notation) is a structured text format to describe data structures. It fills similar niches to other text formats like XML (1997-2008), YAML (2004-2021), JSON (2006-2017), and TOML (2013-2021). Like JSON it’s a strict subset of a dynamically typed embedded programming language, Lua.
The name Extended Lua Table Notation reflects that the syntax does not simply include Lua tables but a sequence of key-value pairs similar to Lua global variable assignments. Thus one doesn’t have to group the whole document in curly brackets ('{' … '}').
The author believes ELTN fills a niche between the simplicity of JSON and the readability of YAML and TOML.
Use Cases
Configuration Files
Mainly we intended ELTN as a format for configuration files. This is a portion of my Hugo configuration as TOML:
     1  [markup]
     2    [markup.tableOfContents]
     3      startLevel = 2
     4      endLevel = 5
     5    [markup.highlight]
     6      style = "monokailight"
     7      tabWidth = 4
     8    [markup.goldmark]
     9      [markup.goldmark.renderer]
    10        unsafe = true
    11
    12  [taxonomies]
    13    tag = "tags"
Here’s the equivalent in YAML:
     1  markup:
     2    tableOfContents:
     3      startLevel: 2
     4      endLevel: 5
     5    highlight:
     6      style: "monokailight"
     7      tabWidth: 4
     8    goldmark:
     9      renderer:
    10        unsafe: true
    11
    12  taxonomies:
    13    tag: "tags"
Here’s what it might look like in JSON:
     1  {
     2  "markup": {
     3    "tableOfContents": { "startLevel": 2, "endLevel": 5 },
     4    "highlight": {
     5      "style": "monokailight",
     6      "tabWidth": 4
     7    },
     8    "goldmark": { "renderer": { "unsafe": true }}
     9  },
    10  "taxonomies": { "tag": "tags" }
    11  }
And here’s the equivalent in ELTN:
    1  markup = {
    2    tableOfContents = { startLevel = 2, endLevel = 5 };
    3    highlight = {
    4      style = "monokailight";
    5      tabWidth = 4;
    6    };
    7    goldmark = { renderer = { unsafe = true }};
    8  }
    9  taxonomies = { tag = "tags" }
The ELTN version makes better use of vertical space, and unlike the JSON version there’s no need to quote strings used as keys. Note also that commas separate the keys on line #2 but semicolons separate them thereafter. That’s mainly a stylistic issue: within a Lua table one can use either a comma or a semicolon as a separator, and Lua won’t complain if you end the last field with a separator. The top-level keys markup and taxonomies don’t need separators at all, although one can use a semicolon (not a comma!) if one wants.
Data Persistence
One could also have programs write out ELTN files to persist their data. They or a cooperating program could read them in later. The format is only a little more difficult to parse than JSON, owing to “definitions” outside of a table instance and the use of tables for both sequential and associative elements.
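As a rough illustration of the writing side, here is a minimal Python sketch of a serializer. The names `dump`, `dump_document`, `quote`, and `key_text` are hypothetical and not part of any ELTN implementation. The sketch assumes that Python lists map to sequential table fields, that dicts map to keyed fields with string or number keys, and that numbers are non-negative and finite, since the grammar in this document only defines unsigned numerals.

```python
import re

# Lua reserved words, which ELTN forbids as bare NAMEs.
RESERVED = frozenset(
    "and break do else elseif end false for function goto if in "
    "local nil not or repeat return then true until while".split()
)
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote(text):
    """Render a Python str as a double-quoted short literal string."""
    # The backslash must be escaped first so later escapes are not doubled.
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"),
                         ("\r", "\\r"), ("\t", "\\t")):
        text = text.replace(raw, escaped)
    return '"' + text + '"'


def key_text(key):
    """Use a bare NAME where possible, otherwise the [constant] form."""
    if isinstance(key, str) and NAME.fullmatch(key) and key not in RESERVED:
        return key
    return "[" + dump(key) + "]"  # assumes the key is a string or number


def dump(value, depth=0):
    """Render a Python value as ELTN text (hypothetical helper)."""
    pad = "  " * depth
    if value is None:
        return "nil"
    if isinstance(value, bool):  # must precede the number test: bool is an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        text = repr(value)
        if not text[0].isdigit():  # rejects "-1", "inf", and "nan"
            raise ValueError(f"cannot write {text} as an unsigned numeral")
        return text
    if isinstance(value, str):
        return quote(value)
    if isinstance(value, dict):
        fields = [f"{key_text(k)} = {dump(v, depth + 1)}" for k, v in value.items()]
    elif isinstance(value, (list, tuple)):
        fields = [dump(v, depth + 1) for v in value]
    else:
        raise TypeError(f"no ELTN form for {type(value).__name__}")
    if not fields:
        return "{}"
    body = "".join(f"{pad}  {field},\n" for field in fields)
    return "{\n" + body + pad + "}"


def dump_document(definitions):
    """Write a dict of top-level definitions, one `name = value` per entry."""
    return "".join(f"{name} = {dump(value)}\n" for name, value in definitions.items())
```

Keys that are reserved words or not valid identifiers fall back to the bracketed form, so `dump({"end": 1})` writes `["end"] = 1` rather than an invalid bare name.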
Here I’ll borrow an example from Programming in Lua, 3rd edition, by Roberto Ierusalimschy, adapted to ELTN syntax. Let’s say a group of professional programmers keeps a distributed lending library of the books on their shelves. [1] Each member uploads an ELTN document that lists the books they’re willing to lend out, which might look like this:
    {
      member = {
        number = 13,
        name = "Frank Mitchell",
        contact = {
          email = "frank.mitchell@nosuchplace.com",
          phone = false, -- no phone calls!
          pidgin_im = false; -- no IMs!
        }
      },
      books = {
        {
          author = "Donald E. Knuth",
          title = "Literate Programming",
          publisher = "CSLI",
          year = 1992
        },
        {
          author = "Jon Bentley",
          title = "More Programming Pearls",
          year = 1990,
          publisher = "Addison-Wesley",
        },
        --[[
        ... many, many more ...
        ]]
      }
    }
Since most of the members of this group are programmers, they write a system that collates all the data. It transforms the ELTN tree to index the data by author and title, counts how many total copies of each book are available, and persists the total library in a big ELTN file, or maybe a set of small ones indexed by the author’s name.
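The collating step might be sketched in Python as follows. This assumes each member document has already been parsed into Python dicts, with the `books` sequence represented as a list; the function name `collate` and the `copies` and `lenders` fields are invented for illustration.

```python
def collate(documents):
    """Index every listed book by author, then by title, counting copies."""
    index = {}
    for doc in documents:
        member = doc["member"]["number"]
        for book in doc["books"]:
            titles = index.setdefault(book["author"], {})
            entry = titles.setdefault(book["title"], {"copies": 0, "lenders": []})
            entry["copies"] += 1
            entry["lenders"].append(member)  # remember who can lend this title
    return index
```

The resulting nested structure contains only strings, numbers, and tables, so it could be written back out as one large ELTN document or split into one document per author.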
Data Transfer
ELTN can transfer data between processes with embedded Lua interpreters the same way JSON transfers data between a Web server and a browser. Neither side actually has to run Lua; both could parse ELTN.
Let’s continue the example from above. The ELTN request gets translated into entries in a SQL or NoSQL database, for reasons, but a Web site allows lenders to search for books. Having found one, our borrower – let’s call him Tom – clicks a button to notify the owner. The system notifies the unsociable member with too many books above (through email, not text). The borrower and lender then meet to exchange the book, and both confirm the exchange took place. The system acknowledges with some boilerplate text and the following bit at the bottom:
    {
      transaction = 982,
      requested = "2023-01-29 18:42 CST",
      completed = "2023-02-01 11:29 CST",
      lender = {
        number = 13,
        name = "Frank Mitchell",
        comments = "Please don't break the spine."
      },
      borrower = {
        number = 23,
        name = "Tom Morrow",
        comments = "Gonna read this on the plane to Tokyo!"
      },
      books = {
        {
          author = "Donald E. Knuth",
          title = "Literate Programming",
          publisher = "CSLI",
          year = 1992
        },
      }
    }
When Tom returns the book he replies to that e-mail with the ELTN part intact, and the unsociable bookworm e-mails his copy back. Ideally one would only need the transaction number, but by replying with all that information (and maybe more) both parties acknowledge what they’re lending or borrowing. Conversely, if the lender doesn’t get his book back, or gets it with a broken spine, he has a machine readable e-mail trail.
Syntax
ELTN syntax is a strict subset of Lua 5.4 syntax, specifically creating tables and assigning values to their keys.
Lexical Conventions
Lexical conventions follow the Lua Manual for 5.4.
A parser should treat the input as a sequence of ASCII bytes. Bytes outside the ASCII character set, i.e. any bytes above hex code 0x7F, are only allowed in strings and comments, but all parsers should simply pass them right through without further interpretation. This allows parsers to handle a wide range of character sets, including UTF-8, without dependencies on particular character sets.
Fixed-Length Tokens
The language requires the following fixed-length tokens.
Token | Rules | Meaning |
---|---|---|
`;` | fieldsep, deflist | separates table fields or top-level definitions |
`,` | fieldsep | separates table fields |
`=` | definition, field | assignment to a name or explicit table index |
`nil` | value | empty reference |
`false` | constant | Boolean false |
`true` | constant | Boolean true |
`{` | tableconstructor | creates a new table |
`}` | tableconstructor | marks the end of a table |
`[` | key | precedes a non-identifier key |
`]` | key | follows a non-identifier key |
Variable-Length Tokens
To summarize variable-length tokens in ELTN, as used in the formal syntax below:
- identifier (NAME): To quote the Lua 5.4 Manual, “any string of Latin letters, Arabic-Indic digits, and underscores, not beginning with a digit and not being a reserved word.” For backward compatibility, otherwise valid NAMEs that are not significant to ELTN but are meaningful in Lua are forbidden; see below for a list.
- literal string (STRING_LITERAL): Lua literal strings come in two forms. A short literal string is delimited by single (') or double quotes ("); a backslash (\) escapes quote marks within the string and forms C-like escape sequences, detailed below. A long literal string is delimited by the sequences [[ and ]]. Unlike a short literal, any characters within those sequences (save for a newline immediately after the [[) are considered part of the string, including any further [[ sequences; as in Lua 5.4, the string ends at the first ]]. (This specification will leave “long brackets” up to the implementer.)
- numeric constant (NUMERAL): Numbers follow conventions similar to other programming languages: a whole or decimal number, optionally followed by an e or E and a positive or negative integer exponent; or 0x or 0X followed by a whole or fractional hexadecimal number, optionally followed by a p or P and a positive or negative binary exponent written in decimal digits.
- whitespace (WS): an uninterrupted sequence of non-printable ASCII characters: space (' ' or 0x20), form feed (0x0c), newline (0x0a), carriage return (0x0d), horizontal tab (0x09), or vertical tab (0x0b). Whitespace only serves to separate other tokens and improve readability.
- newline: CR (0x0d), LF (0x0a), or a CR-LF (0x0d 0x0a) sequence.
- comments: Comments also come in two forms. A short comment starts with -- and runs until the first newline. A long comment starts with --[[ and runs to the first ]], much like long string literals. In both cases, a comment is essentially whitespace in this specification. A future specification may configure parsing behavior from comments.
Reserved Words
For backward compatibility with Lua, the following are reserved words and invalid as NAMEs:

    and       break     do        else      elseif    end
    false     for       function  goto      if        in
    local     nil       not       or        repeat    return
    then      true      until     while
Any non-alphanumeric, non-whitespace character not used in one of the productions above is also invalid, including nearly all those in Lua operators.
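A validity check for NAMEs could be sketched in Python as below. The function name `is_valid_name` is hypothetical; the pattern is an ASCII-only reading of the identifier rule quoted above, combined with the reserved word list.

```python
import re

# The reserved words listed above; none of them may be used as a NAME.
RESERVED = frozenset(
    "and break do else elseif end false for function goto if in "
    "local nil not or repeat return then true until while".split()
)

# ASCII letters, digits, and underscores, not beginning with a digit.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(text):
    """Return True if text may appear as a bare NAME in an ELTN document."""
    return NAME.fullmatch(text) is not None and text not in RESERVED
```

Under this check `tabWidth` and `pidgin_im` are accepted, while `end`, `2fast`, and `pidgin-im` are rejected and would need the bracketed `["..."]` key form instead.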
Escape Sequences
Escape | Byte(s) | Meaning |
---|---|---|
\a | 0x07 | bell |
\b | 0x08 | backspace |
\f | 0x0c | form feed |
\n | 0x0a | newline |
\r | 0x0d | carriage return |
\t | 0x09 | horizontal tab |
\v | 0x0b | vertical tab |
\\ | \ | backslash |
\" | " | quotation mark / double quote |
\' | ' | apostrophe / single quote |
\↩︎ | 0x0a | escaped newline [2] |
\z | | skip to the next non-whitespace character [3] |
\xXX | 0xXX | byte value in hexadecimal digits |
\DDD | DDD | byte value in decimal digits, as in Lua 5.4 [4] |
\u{XXXX} | utf8(XXXX) [5] | UTF-8 bytes for code point 0xXXXX [6] |
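To make the table concrete, here is a minimal Python sketch that decodes the body of a short literal string, that is, the bytes between the quotes. The function name `unescape` is hypothetical. It works on bytes so that anything above 0x7F passes straight through, reads `\DDD` as up to three decimal digits the way Lua 5.4 does, and relies on Python's own UTF-8 encoder, which only covers code points up to 0x10FFFF.

```python
import re

SIMPLE = {
    b"a": b"\a", b"b": b"\b", b"f": b"\f", b"n": b"\n", b"r": b"\r",
    b"t": b"\t", b"v": b"\v", b"\\": b"\\", b'"': b'"', b"'": b"'",
}

ESCAPE = re.compile(
    rb"\\(?:(?P<simple>[abfnrtv\\\"'])"      # single-character escapes
    rb"|x(?P<hex>[0-9a-fA-F]{2})"            # \xXX
    rb"|(?P<dec>[0-9]{1,3})"                 # \DDD, decimal as in Lua 5.4
    rb"|u\{(?P<uni>[0-9a-fA-F]+)\}"          # \u{XXXX}
    rb"|(?P<nl>\r\n|\n\r|\r|\n)"             # backslash followed by a line break
    rb"|z(?P<skip>[ \t\n\v\f\r]*)"           # \z and the whitespace it swallows
    rb"|(?P<bad>.?))",                       # anything else is invalid
    re.DOTALL,
)


def unescape(body):
    """Decode the escape sequences in the body of a short literal string."""
    def replace(match):
        if match.group("simple") is not None:
            return SIMPLE[match.group("simple")]
        if match.group("hex") is not None:
            return bytes([int(match.group("hex"), 16)])
        if match.group("dec") is not None:
            value = int(match.group("dec"))
            if value > 255:
                raise ValueError(f"decimal escape too large: {match.group()!r}")
            return bytes([value])
        if match.group("uni") is not None:
            return chr(int(match.group("uni"), 16)).encode("utf-8")
        if match.group("nl") is not None:
            return b"\n"                     # the line break becomes a newline
        if match.group("skip") is not None:
            return b""                       # \z contributes nothing itself
        raise ValueError(f"invalid escape sequence {match.group()!r}")

    return ESCAPE.sub(replace, body)
```

A single regular expression keeps the sketch short; a production lexer would more likely decode escapes while scanning the string, so it can report the line and column of a bad escape.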
Parsing
This BNF defines how the tokens above form a valid ELTN document, ignoring whitespace and comments.
document ::= tableconstructor | deflist
deflist ::= ( definition | ';' )*
definition ::= defname '=' value
defname ::= NAME
value ::= 'nil' | constant | tableconstructor
constant ::= 'false' | 'true' | NUMERAL | STRING_LITERAL
tableconstructor ::= '{' ( fieldlist )? '}'
fieldlist ::= field ( fieldsep field )* ( fieldsep )?
field ::= key '=' value | value
key ::= '[' constant ']' | NAME
fieldsep ::= ',' | ';'
This grammar uses the following notation:
- lowercase: A parser rule, defined elsewhere in the grammar. document is the top level rule.
- NAME, NUMERAL, and STRING_LITERAL: Lexical symbols described above.
- '…': A lexical symbol defined as the exact sequence between the single quotes.
- … | …: A separator between alternatives, e.g. key ::= '[' constant ']' | NAME means that a key may either be a constant between square brackets or a NAME (identifier).
- (…)?: Zero or one occurrence.
- (…)*: Zero or more occurrences.
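The grammar is small enough for a hand-written recursive-descent parser. The Python sketch below is one possible shape, not a reference implementation: the `Parser` class and `store` helper are invented names, tables become dicts with implicit fields stored under the integer keys 1, 2, and so on, and several lexical features are deliberately left out (hexadecimal numerals, long strings, long brackets, and decoding of escape sequences).

```python
import re

CONSTANTS = {"nil": None, "true": True, "false": False}
RESERVED = set(CONSTANTS) | set(
    "and break do else elseif end for function goto if in local "
    "not or repeat return then until while".split())
TOKEN = re.compile(r"""
    (?P<skip>  \s+ | --\[\[.*?\]\] | --[^\n]* )    # whitespace and comments
  | (?P<num>   \d+(?:\.\d*)?(?:[eE][+-]?\d+)? )    # decimal numerals only
  | (?P<name>  [A-Za-z_][A-Za-z0-9_]* )
  | (?P<str>   "(?:\\.|[^"\\\n])*" | '(?:\\.|[^'\\\n])*' )
  | (?P<punct> [{}\[\]=,;] )
  | (?P<bad>   . )                                 # anything else is an error
""", re.VERBOSE | re.DOTALL)
EOF = ("eof", "")


def store(table, key, value):
    """Enforce unique keys (note that Python conflates the keys True and 1)."""
    if key in table:
        raise SyntaxError(f"duplicate key {key!r}")
    table[key] = value


class Parser:
    def __init__(self, text):
        found = ((m.lastgroup, m.group()) for m in TOKEN.finditer(text))
        self.tokens = [t for t in found if t[0] != "skip"] + [EOF]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def take(self, punct=None):
        token = self.peek()
        if punct is not None and token != ("punct", punct):
            raise SyntaxError(f"expected {punct!r}, found {token[1]!r}")
        if token != EOF:
            self.pos += 1
        return token

    def name(self):
        kind, text = self.take()
        if kind != "name" or text in RESERVED:
            raise SyntaxError(f"expected a NAME, found {text!r}")
        return text

    def value(self, constant_only=False):
        kind, text = self.peek()
        if constant_only and (kind, text) in (("punct", "{"), ("name", "nil")):
            raise SyntaxError("a key must be a constant, not nil or a table")
        if (kind, text) == ("punct", "{"):
            return self.table()
        self.take()
        if kind == "name" and text in CONSTANTS:
            return CONSTANTS[text]
        if kind == "num":
            return int(text) if text.isdigit() else float(text)
        if kind == "str":
            return text[1:-1]  # escape sequences are left undecoded here
        raise SyntaxError(f"expected a value, found {text!r}")

    def table(self):
        self.take("{")
        result, index = {}, 1
        while self.peek() != ("punct", "}"):
            kind, text = self.peek()
            if (kind, text) == ("punct", "["):
                self.take()
                key = self.value(constant_only=True)
                self.take("]")
                self.take("=")
            elif kind == "name" and text not in CONSTANTS:
                key = self.name()
                self.take("=")
            else:
                key, index = index, index + 1  # implicit sequential key
            store(result, key, self.value())
            if self.peek() in (("punct", ","), ("punct", ";")):
                self.take()
            elif self.peek() != ("punct", "}"):
                raise SyntaxError(f"expected a separator or '}}', found {self.peek()[1]!r}")
        self.take("}")
        return result

    def document(self):
        if self.peek() == ("punct", "{"):
            result = self.table()
        else:
            result = {}
            while self.peek() != EOF:
                if self.peek() == ("punct", ";"):
                    self.take()
                else:
                    name = self.name()
                    self.take("=")
                    store(result, name, self.value())
        if self.peek() != EOF:
            raise SyntaxError(f"unexpected {self.peek()[1]!r} after the document")
        return result
```

Calling `Parser(text).document()` returns a dict for either document form: the table itself for a tableconstructor, or a dict of top-level definitions for a deflist. Each method corresponds to one grammar rule, which makes it straightforward to compare the code against the BNF.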
Streaming ELTN
ELTN could also communicate between processes as a wire protocol. In that case the top part of the grammar might change to this:
stream ::= ( tableconstructor | definition )* EOF
EOF ::= <stream-closed>
A tableconstructor sends an asynchronous message from client to server or vice-versa. A future spec might detail an ELTN-RPC analogous to JSON-RPC.
A definition might configure some session state or otherwise preserve values during the session. We recommend that the exchange of tables represent application-specific persistent changes, and that the effects of def = value statements not persist beyond the session.
In distributed computing, one process maintaining state for another causes
a lot of headaches, only partially solved by expiration dates, keep-alive
heartbeats, and session cookies. All that is far beyond the scope of this
document; suffice to say one doesn’t want to bring down a server because
someone decided to make it hold onto a lot of large tables.
Constraints
- Each def in a document, and each key in a tableconstructor, must be unique within that context. For example:

      { foo = 1, bar = 2, foo = 3 }

  is invalid, but this

      { foo = 1, { bar = 2, foo = 3 } }

  is valid. The outer table contains the nested table at index 1.
- This uniqueness constraint applies to implicit keys of values in sequence. So something like

      { "foo", "bar", [2] = "baz" }

  would also be illegal, since we’ve specified index 2 twice.
Semantics
The atomic values – nil, Boolean true and false, numbers, and strings – stand for themselves.
The tableconstructor rule has two ways to populate its structure:
- key = value, whether that key is a NAME or a value in square brackets ([ … ]), means to associate the key value with the given value.
- Without a key, the values are stored in sequential indexes: 1, 2, etc. Thus a value like:

      {"foo", "bar", "baz", [5] = "foobar"}

  is equivalent to

      { [1]="foo", [2]="bar", [3]="baz", [5]="foobar" }

If an ELTN Table doesn’t define a value for a key, the Table should return nil, or its equivalent in the host language.
It’s up to the implementer whether the resulting structure is immutable, and therefore easy to pass around multiple threads, or mutable, and therefore easy to transform or decorate.
The “statement” “;” does nothing. The Lua grammar allows one to add as many or as few semicolons as one wants, or none at all.
Type Systems
ELTN derives from a dynamically typed language (Lua) just like JSON (JavaScript). The same solutions for representing JSON in statically typed languages like Java apply to ELTN:
- Special data types that encapsulate the five types of data in an ELTN tree: nil, boolean, number, string, and table.
- Methods on each node that can convert the ELTN object to familiar host language objects.
- Converting ELTN tables to maps / hashes / hash-tables, lists, sets, bags, etc. will require special care, as most languages differentiate between sequences of objects and associative arrays, while Lua and ELTN do not. Two techniques come to mind:
  - Internally Lua stores “small” integer keys in a C array, not in the hash table structure. The details can be found in the Lua source code, but items listed in a sequence rather than as key-value pairs would ideally go in a sequential structure. Other languages can use the same optimization.
  - The implementation can also trawl through the contents of the ELTN Table to see whether it “looks” like a List or Set rather than a Map. E.g. are the keys all integers? Are the values the same as the keys, or Boolean true?
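The second technique might look something like the following Python sketch. It assumes a parsed table is held as a dict whose implicit fields sit under the integer keys 1, 2, and so on; the names `classify` and `to_host` are hypothetical, and the rules are only one plausible reading of what “looks like” a List or Set.

```python
def classify(table):
    """Guess whether a parsed table is best treated as a list, set, or map."""
    # A list has exactly the keys 1..n with no gaps. Using type() rather than
    # isinstance() keeps Python's True from being counted as the integer 1.
    if all(type(k) is int for k in table) and set(table) == set(range(1, len(table) + 1)):
        return "list"  # an empty table is ambiguous; this sketch calls it a list
    # A set maps every key either to true or to the key itself.
    if all(v is True or (type(v) is type(k) and v == k) for k, v in table.items()):
        return "set"
    return "map"


def to_host(table):
    """Convert a parsed table to the Python container it most resembles."""
    kind = classify(table)
    if kind == "list":
        return [table[i] for i in range(1, len(table) + 1)]
    if kind == "set":
        return set(table)
    return dict(table)
```

A heuristic like this can guess wrong, for instance on a map that happens to have the keys 1 and 2, so an implementation would probably also let the caller state the expected type explicitly.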
1. An idea I had when I had similarly bookish co-workers. Sadly something this formal would be overkill now.
2. That is, the newline after the backslash becomes part of the string.
3. To quote Ierusalimschy et al. verbatim, “The escape sequence ‘\z’ skips the following span of whitespace characters, including line breaks; it is particularly useful to break and indent a long literal string into multiple lines without adding the newlines and spaces into the string contents. A short literal string cannot contain unescaped line breaks nor escapes not forming a valid escape sequence.”
4. The sequence may have only one or two digits unless the following character is also a digit, to avoid ambiguity.
5. That is, a function that encodes a Unicode code point into a two- or three- (or four-) byte UTF-8 sequence.
6. The enclosing brackets are mandatory. The code can specify any number of hexadecimal digits, from one to four (or more!).