ELTN (Extended Lua Table Notation) is a structured text format to describe data structures. It fills similar niches to other text formats like XML (1997-2008), YAML (2004-2021), JSON (2006-2017), and TOML (2013-2021). Like JSON it’s a strict subset of a dynamically typed embedded programming language, Lua.
The name Extended Lua Table Notation reflects that the syntax does not simply include Lua tables but a sequence of key-value pairs similar to Lua global variable assignments. Thus one doesn’t have to group the whole document in curly brackets (’{’ … ‘}’).
The author believes ELTN fills a niche between the simplicity of JSON and the readability of YAML and TOML.
Use Cases
Configuration Files
Mainly we intended ELTN as a format for configuration files. This is a portion of my Hugo configuration as TOML:
[markup]
[markup.tableOfContents]
startLevel = 2
endLevel = 5
[markup.highlight]
style = "monokailight"
tabWidth = 4
[markup.goldmark]
[markup.goldmark.renderer]
unsafe = true
[taxonomies]
tag = "tags"
Here’s the equivalent in YAML:
markup:
  tableOfContents:
    startLevel: 2
    endLevel: 5
  highlight:
    style: "monokailight"
    tabWidth: 4
  goldmark:
    renderer:
      unsafe: true
taxonomies:
  tag: "tags"
Here’s what it might look like in JSON:
{
  "markup": {
    "tableOfContents": { "startLevel": 2, "endLevel": 5 },
    "highlight": {
      "style": "monokailight",
      "tabWidth": 4
    },
    "goldmark": { "renderer": { "unsafe": true }}
  },
  "taxonomies": { "tag": "tags" }
}
And here’s the equivalent in ELTN:
markup = {
  tableOfContents = { startLevel = 2, endLevel = 5 };
  highlight = {
    style = "monokailight";
    tabWidth = 4;
  };
  goldmark = { renderer = { unsafe = true }};
}
taxonomies = { tag = "tags" }
The ELTN version makes better use of vertical space, and unlike the JSON version there’s no need to quote strings used as keys. Note also that a comma separates the keys inside the tableOfContents table but semicolons separate the fields thereafter. That’s mainly a stylistic choice: within a Lua table one can use either a comma or a semicolon as a separator, and Lua won’t complain if a separator follows the last field. The top-level keys markup and taxonomies don’t need separators at all, although one can use a semicolon (not a comma!) if one wants.
Data Persistence
One could also have programs write out ELTN files to persist their data. They or a cooperating program could read them in later. The format is only a little more difficult to parse than JSON, owing to “variable” assignments outside of a table instance and the use of tables for both sequential and associative elements.
Here I’ll borrow an example from Programming in Lua, 3rd edition, by Roberto Ierusalimschy, adapted to ELTN syntax. Let’s say a group of professional programmers keeps a distributed lending library of the books on their shelves. [1] Each member uploads an ELTN document listing the books they’re willing to lend out, which might look like this:
{
  member = {
    number = 13,
    name = "Frank Mitchell",
    contact = {
      email = "frank.mitchell@nosuchplace.com",
      phone = false, -- no phone calls!
      pidgin_im = false; -- no IMs!
    }
  },
  books = {
    {
      author = "Donald E. Knuth",
      title = "Literate Programming",
      publisher = "CSLI",
      year = 1992
    },
    {
      author = "Jon Bentley",
      title = "More Programming Pearls",
      year = 1990,
      publisher = "Addison-Wesley",
    },
    --[[
      ... many, many more ...
    ]]
  }
}
Since most of the members of this group are programmers, they write a system that collates all the data. It transforms the ELTN tree to index the data by author and title, counts how many total copies of each book are available, and persists the total library in a big ELTN file, or maybe a set of small ones indexed by the author’s name.
Data Transfer
ELTN can transfer data between processes with embedded Lua interpreters the same way JSON transfers data between a Web server and a browser. Neither side actually has to run Lua; both could parse ELTN.
Let’s continue the example from above. The uploaded ELTN documents get translated into entries in a SQL or NoSQL database, for reasons, but a Web site allows borrowers to search for books. Having found one, our borrower – let’s call him Tom – clicks a button to notify the owner. The system notifies the unsociable member with too many books above (through email, not text). The borrower and lender then meet to exchange the book, and both confirm the exchange took place. The system acknowledges with some boilerplate text and the following bit at the bottom:
{
  transaction = 982,
  requested = "2023-01-29 18:42 CST",
  completed = "2023-02-01 11:29 CST",
  lender = {
    number = 13,
    name = "Frank Mitchell",
    comments = "Please don't break the spine."
  },
  borrower = {
    number = 23,
    name = "Tom Morrow",
    comments = "Gonna read this on the plane to Tokyo!"
  },
  books = {
    {
      author = "Donald E. Knuth",
      title = "Literate Programming",
      publisher = "CSLI",
      year = 1992
    },
  }
}
When Tom returns the book he replies to that e-mail with the ELTN part intact, and the unsociable bookworm e-mails his copy back. Ideally one would only need the transaction number, but by replying with all that information (and maybe more) both parties acknowledge what they’re lending or borrowing. Conversely, if the lender doesn’t get his book back, or gets it with a broken spine, he has a machine readable e-mail trail.
Syntax and Semantics
ELTN syntax is a strict subset of Lua 5.4 syntax, specifically creating tables and assigning values to their keys.
Lexical Conventions
Lexical conventions follow the Lua Manual for 5.4.
By default the lexer treats the input as a sequence of UTF-8 bytes. While parsing other encodings that support ASCII characters is valid, informing the lexer of an alternative encoding is outside the scope of this specification.
Fixed-Length Tokens
The language requires the following fixed-length tokens.
Token | Rules | Meaning |
---|---|---|
`;` | fieldsep, stat | separates table fields or top-level definitions |
`,` | fieldsep | separates table fields |
`=` | stat, field | assignment to a name or explicit table index |
`nil` | constant | empty reference [2] |
`false` | constant | Boolean false |
`true` | constant | Boolean true |
`{` | tableconstructor | creates a new table |
`}` | tableconstructor | marks the end of a table |
Variable-Length Tokens
To summarize variable-length tokens in ELTN, as used in the formal syntax below:
- identifier (`NAME`): To quote the Lua 5.4 Manual, “any string of Latin letters, Arabic-Indic digits, and underscores, not beginning with a digit and not being a reserved word.” For backward compatibility, otherwise valid `NAME`s that are not significant to ELTN but are meaningful in Lua are forbidden; see below for a list.
- literal string (`STRING_LITERAL`): Lua literal strings come in two forms. A short literal string is delimited by single (`'`) or double (`"`) quotes; a backslash (`\`) escapes quote marks within the string and forms C-like escape sequences, detailed below. A long literal string is delimited by the sequences `[[` and `]]`. Unlike a short literal, any characters between those sequences (save for a newline immediately after the `[[`) are part of the string, including any further `[[`; escape sequences are not interpreted, and in Lua 5.4 the string ends at the first `]]`. (This specification will leave leveled “long brackets” such as `[==[` … `]==]` up to the implementor.)
- numeric constant (`NUMERAL`): Numbers follow conventions similar to other programming languages: a whole or decimal number, optionally followed by an `e` or `E` and a positive or negative integer exponent; or `0x` or `0X` followed by a whole or fractional hexadecimal number, optionally followed by a `p` or `P` and a positive or negative exponent written in decimal (a power of two).
- whitespace (`WS`): An uninterrupted sequence of whitespace ASCII characters: space (0x20), form feed (0x0c), newline (0x0a), carriage return (0x0d), horizontal tab (0x09), or vertical tab (0x0b). Whitespace only serves to separate other tokens and improve readability.
- newline: CR (0x0d), LF (0x0a), or a CR-LF (0x0d 0x0a) sequence.
- comments: Comments also come in two forms. A short comment starts with `--` and runs until the first newline. A long comment starts with `--[[` and, like a long string literal in Lua 5.4, runs to the first `]]`. In both cases, a comment is essentially whitespace in this specification. A future specification may configure parsing behavior from comments.
Reserved Words
For backward compatibility with Lua, the following are reserved words and invalid as `NAME`s:
and break do else elseif end
false for function goto if in
local nil not or repeat return
then true until while
Any non-alphanumeric, non-whitespace character not used in one of the productions above is also invalid, including nearly all those in Lua operators.
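The identifier and reserved-word rules above can be sketched as a small check. This is only an illustration, not part of the proposed parser: the class name `EltnNames` and the method `isValidName` are hypothetical, and the sketch assumes “Latin letters” means ASCII `A`-`Z` and `a`-`z`.

```java
import java.util.Set;

/** Hypothetical helper: decides whether a string is a legal ELTN NAME. */
public final class EltnNames {
    // Reserved words copied from the list above.
    private static final Set<String> RESERVED = Set.of(
        "and", "break", "do", "else", "elseif", "end",
        "false", "for", "function", "goto", "if", "in",
        "local", "nil", "not", "or", "repeat", "return",
        "then", "true", "until", "while");

    private EltnNames() {}

    /** Returns true if s is non-empty, not reserved, and made of letters, digits, and underscores. */
    public static boolean isValidName(String s) {
        if (s == null || s.isEmpty() || RESERVED.contains(s)) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
            boolean digit = c >= '0' && c <= '9';
            // A digit is allowed anywhere except the first position.
            if (!letter && !(digit && i > 0)) {
                return false;
            }
        }
        return true;
    }
}
```

A lexer could call `EltnNames.isValidName("tabWidth")` before emitting a key, and report a syntax error for something like `end` or `2fast`.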
Escape Sequences
Escape | Byte(s) | Meaning |
---|---|---|
`\a` | 0x07 | bell |
`\b` | 0x08 | backspace |
`\f` | 0x0c | form feed |
`\n` | 0x0a | newline |
`\r` | 0x0d | carriage return |
`\t` | 0x09 | horizontal tab |
`\v` | 0x0b | vertical tab |
`\\` | `\` | backslash |
`\"` | `"` | quotation mark / double quote |
`\'` | `'` | apostrophe / single quote |
`\`↩︎ | 0x0a | escaped newline [3] |
`\z` | | skip to the next non-whitespace character [4] |
`\xXX` | 0xXX | byte value in two hexadecimal digits |
`\DDD` | DDD | byte value in up to three decimal digits, as in Lua [5] |
`\u{XXXX}` | utf8(XXXX) [6] | UTF-8 bytes for code point 0xXXXX [7] |
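The utf8() function mentioned in the last row of the table might look like the following sketch. The class name `EltnUtf8` and both method names are assumptions made for illustration; the sketch also assumes code points are limited to the Unicode range 0 to 0x10FFFF and does not reject surrogates, which a real implementation may want to handle differently.

```java
/** Hypothetical helper: turns the code point of a Unicode escape into UTF-8 bytes. */
public final class EltnUtf8 {
    private EltnUtf8() {}

    /** Encodes one code point (0..0x10FFFF) as a one- to four-byte UTF-8 sequence. */
    public static byte[] utf8(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("code point out of range: " + cp);
        }
        if (cp < 0x80) {
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    /** Decodes the hexadecimal digits found between the braces of a Unicode escape. */
    public static byte[] decodeUnicodeEscape(String hexDigits) {
        return utf8(Integer.parseInt(hexDigits, 16));
    }
}
```

For example, `EltnUtf8.decodeUnicodeEscape("20AC")` yields the three bytes 0xE2 0x82 0xAC, the UTF-8 encoding of the euro sign.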
Parsing
This BNF defines how the tokens above form a valid ELTN document, ignoring whitespace and comments.
document ::= tableconstructor | statlist
statlist ::= ( stat )*
stat ::= var '=' value | ';'
var ::= NAME
value ::= constant | tableconstructor
constant ::= 'nil' | 'false' | 'true' | NUMERAL | STRING_LITERAL
tableconstructor ::= '{' ( fieldlist )? '}'
fieldlist ::= field ( fieldsep field )* ( fieldsep )?
field ::= key '=' value | value
key ::= '[' constant ']' | NAME
fieldsep ::= ',' | ';'
This grammar uses the following notation:
- lowercase: A parser rule, defined elsewhere in the grammar. `document` is the top level rule.
- `NAME`, `NUMERAL`, and `STRING_LITERAL`: Lexical symbols described above.
- `'` … `'`: A lexical symbol defined as the exact sequence between the single quotes.
- … `|` …: A separator between alternatives, e.g. `key ::= '[' constant ']' | NAME` means that a key may either be a constant between square brackets or a NAME (identifier).
- `(` … `)?`: Zero or one occurrence.
- `(` … `)*`: Zero or more occurrences.
Streaming ELTN
ELTN could also communicate between processes as a wire protocol. In that case the top part of the grammar might change to this:
stream ::= ( tableconstructor | stat )* EOT
EOT ::=
A tableconstructor sends an asynchronous message from client to server or vice-versa. A future spec might detail an ELTN-RPC analogous to JSON-RPC.
A stat might configure some session state or otherwise preserve values during the session. We recommend that the exchange of tables represent application-specific persistent changes, and that the effects of `var = value` statements not persist beyond the session.
In distributed computing, one process maintaining state for another causes
a lot of headaches, only partially solved by expiration dates, keepalive
heartbeats, and session cookies. All that is far beyond the scope of this
document; suffice to say one doesn’t want to bring down a server because
someone decided to make it hold onto a lot of large tables.
Constraints
- Each var in a document, and each key in a tableconstructor, must be unique within that context. For example:
  { foo = 1, bar = 2, foo = 3 }
  is invalid, but this
  { foo = 1, { bar = 2, foo = 3 } }
  is valid. The outer table contains the nested table at index `1`.
- This uniqueness constraint applies to implicit keys of values in sequence. So something like
  { "foo", "bar", [2] = "baz" }
  would also be illegal, since we’ve specified index 2 twice.
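One way a parser might enforce both constraints is to keep a set of keys per table and to treat each positional value as an explicit key at the next index. The sketch below is an assumption about how that could be done, not the author’s implementation; `TableKeyChecker` and its methods are hypothetical names, and it assumes numeric keys with the same integral value (say, 2 and 2.0) count as the same key, as they do in Lua.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: one instance tracks the keys of one table (or of the top-level document). */
public final class TableKeyChecker {
    private final Set<Object> seen = new HashSet<>();
    private long nextIndex = 1;   // next implicit index for positional values

    /** Records an explicit key (a NAME or a bracketed constant); fails if it was already used. */
    public void addKey(Object key) {
        if (!seen.add(normalize(key))) {
            throw new IllegalStateException("duplicate key: " + key);
        }
    }

    /** Records a value written without a key, which occupies the next index in sequence. */
    public void addPositional() {
        addKey(nextIndex++);
    }

    // Assumption: integral numbers of any Java type are compared as long values.
    private static Object normalize(Object key) {
        if (key instanceof Number) {
            Number n = (Number) key;
            if (n.doubleValue() == n.longValue()) {
                return n.longValue();
            }
        }
        return key;
    }
}
```

Each nested table gets its own checker, so a key may repeat in an inner table without conflict, while a positional value followed by an explicit `[2]` in the same table is caught.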
Semantics
The atomic values – nil, Boolean true and false, numbers, and strings – stand for themselves.
The tableconstructor rule has two ways to populate its structure:
- `key = value`, whether that key is a `NAME` or a value in square brackets (`[` … `]`), means to associate the key with the given value.
- Without a key, the values are stored in sequential indexes: 1, 2, etc. Thus a value like:
  {"foo", "bar", "baz", [5] = "foobar"}
  is equivalent to
  { [1]="foo", [2]="bar", [3]="baz", [5]="foobar" }
If an ELTN Table doesn’t define a value for a key, the Table should return `nil` [2], or its equivalent in the host language.
It’s up to the implementer whether the resulting structure is immutable, and therefore easy to pass around multiple threads, or mutable, and therefore easy to transform or decorate.
The “statement” `;` does nothing. The Lua grammar allows one to add as many or as few semicolons as one wants, or none at all.
Type Systems
ELTN derives from a dynamically typed language (Lua) just as JSON does (JavaScript). The same solutions for representing JSON in statically typed languages like Java apply to ELTN:
- Special data types that encapsulate the five types of data in an ELTN tree: nil, boolean, number, string, and table.
- Methods on each node that can convert the ELTN object to familiar host language objects.
- Converting ELTN tables to maps / hashes / hashtables, lists, sets, bags, etc. will require special care, as most languages differentiate between sequences of objects and associative arrays, while Lua and ELTN do not. Two techniques come to mind:
  - Internally Lua stores “small” integer keys in a C array, not in the hashtable structure. The details can be found in the Lua source code, but items listed in a sequence rather than as key-value pairs would ideally go in a sequential structure. Other languages can use the same optimization.
  - The implementation can also trawl through the contents of the ELTN Table to see whether it “looks” like a List or Set rather than a Map. E.g. are the keys all integers? Are the values the same as the keys, or Boolean `true`?
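The second technique could be sketched as follows. This is a rough heuristic under stated assumptions, not a definitive rule: the class `TableShapes` and its methods are hypothetical, the table is assumed to be already loaded into a `java.util.Map`, and integer keys are assumed to be stored as `Long`.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical heuristics for guessing the "shape" of a parsed ELTN table. */
public final class TableShapes {
    private TableShapes() {}

    /** True if the keys are exactly the integers 1..n (an empty table also counts). */
    public static boolean looksLikeList(Map<?, ?> table) {
        int n = table.size();
        for (long i = 1; i <= n; i++) {
            // Assumption: integer keys were stored as Long by the parser.
            if (!table.containsKey(i)) {
                return false;
            }
        }
        return true;
    }

    /** True if the table is non-empty and every value is Boolean true or equal to its own key. */
    public static boolean looksLikeSet(Map<?, ?> table) {
        if (table.isEmpty()) {
            return false;
        }
        for (Map.Entry<?, ?> e : table.entrySet()) {
            Object v = e.getValue();
            if (!Boolean.TRUE.equals(v) && !Objects.equals(e.getKey(), v)) {
                return false;
            }
        }
        return true;
    }
}
```

A table with keys 1, 2, 3 passes `looksLikeList`, while the earlier example with keys 1, 2, 3, and 5 does not, so a converter would probably fall back to a Map for it.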
Rejected and Postponed Suggestions
Exclusion of nil
Lua does not distinguish between a table that lacks the requested key and one whose value for that key is `nil`. [8] JSON, however, does make such a distinction because JavaScript does. Lua’s JSON parsers must therefore include a unique `NULL` value for JSON’s `null`.
If the host language has similar semantics to Lua, then, a `nil` constant may not even be necessary. On the other hand, if its native associative arrays (hash tables, etc.) distinguish between `nil`/`null` and an undefined value, as JavaScript does, then the implementation must also do so.
On the third hand [9], since we intended people, not machines, to write ELTN documents, a `nil` indicates to the reader that the writer intentionally did not define a value for that key, even if the resulting data structure looks the same either way.
At this time the author is unsure which convention to adhere to. For now we’ll keep `nil` in ELTN, and let language idioms and implementers decide whether `nil` is a distinct value or just the result of a missing key.
Function Call syntax
An earlier version of the grammar included a limited version of Lua’s functioncall production:
stat ::= ';' | var '=' value | functioncall
…
value ::= constant | tableconstructor | functioncall
…
functioncall ::= funcname arg
funcname ::= NAME
arg ::= tableconstructor | STRING_LITERAL
The host program would define all “functions” and provide them to the parser. It was intended as an extension mechanism to allow the following:
- Converting a string or table to a more convenient internal representation, e.g. date/time strings to datetime objects.
- Implementing references, especially cyclic and forward references. (This might require extra info in the function prototype.)
- Configuring multiple instances of the same or similar things, like an old school Spring XML file but without the XML.
- Interpreting the same data file in different ways, depending on the application. An example from an early version of Programming in Lua:
  Entry{
    author = "Donald E. Knuth",
    title = "Literate Programming",
    publisher = "CSLI",
    year = 1992
  }
  Entry{
    author = "Jon Bentley",
    title = "More Programming Pearls",
    year = 1990,
    publisher = "Addison-Wesley",
  }
  Depending on the definition of `Entry`, the program loading the file could upload the entries to a database, index the data, count the occurrences of specific values, or something else, maybe all at once.
The file of `Entry` calls was a valid Lua program. As we’ve seen with AJAX, though, loading and executing arbitrary JavaScript is a huge security hole, which is why we have JSON.
Difficulties in writing the parser caused us to cut this syntax out. More importantly, though, the functioncall syntax presumes that the parser knows what the function is. Unlike the Tag syntax of YAML the semantics of the data file could depend entirely on what the function calls mean. Instead, for now, we recommend the following:
- Implementers should provide APIs to convert constants or tables into more specific data structures.
- Users should develop conventions to indicate forward, backward, and cyclic references if they’re needed.
- Users should place a sequence of tables within a larger “list-like” table to group similar things.
- Implementers and developers should work on a schema for validating ELTN documents, ideally written in ELTN, just like similar efforts for XML and JSON.
Variable Substitution
A small change to the grammar would enable variable substitutions:
value ::= constant | tableconstructor | var
Here `var` refers to a `NAME` defined in a `stat` production.
In a document one can resolve forward and backward references with little
difficulty. During a stream one can only use previous definitions.
While this would give the stat production new meaning in the Streaming protocol, this could also lead to cyclic dependencies which would make the value graph difficult to re-serialize in a way that keeps backward compatibility with Lua. One could limit substitutions to forbidding cyclic graphs, substituting by value instead of by reference, or excluding tables entirely, but that merely complicates parsing.
More seriously it raises the specter of how long a variable declaration lasts. During the lifetime of a session? (Or document.) Tied to a user account, like cookies on the Web? Where would the information reside? This little document shouldn’t introduce new implementation headaches beyond parsing and interpreting a simple alternative to JSON or TOML.
History
The original specification was on the G+ blog, repeated in a post here. After some initial progress on a pure Lua version detailed here the project was suspended.
Recently we’ve begun work on a new Java Pull Parser, detailed below.
Implementations
Work on a Java implementation is ongoing. This is the proposed interface for an ELTN Pull Parser.
See also the JSON Pull Parser.
Event
package com.frank_mitchell.eltnpp;
import java.io.IOException;
public enum EltnEvent {
/**
* Invalid ELTN syntax.
*/
SYNTAX_ERROR,
/**
* Before first ELTN element
*/
START_STREAM,
/**
* Start of ELTN Table (`{`)
*/
START_TABLE,
/**
* End of ELTN Table (`}`)
*/
END_TABLE,
/**
* Simple String key of ELTN object member (`=`)
*/
TABLE_KEY_NAME,
/**
* Start of non-string key in an ELTN table (`[`)
*/
TABLE_KEY_START,
/**
* End of non-string key in an ELTN table (`]=`)
*/
TABLE_KEY_END,
/**
* ELTN nil
*/
VALUE_NIL,
/**
* ELTN Boolean true
*/
VALUE_TRUE,
/**
* ELTN Boolean false
*/
VALUE_FALSE,
/**
* ELTN number
*/
VALUE_NUMBER,
/**
* ELTN string (`"`...`"`)
*/
VALUE_STRING,
/**
* After last ELTN element
*/
END_STREAM
}
Pull Parser
package com.frank_mitchell.eltnpp;
import java.io.IOException;
/**
* A pull parser for an ELTN (Extended Lua Table Notation) document.
*
* @author Frank Mitchell
*
* @see https://frank-mitchell.com/projects/eltn/
* @see https://lua.org/
*/
public interface EltnPullParser {
/**
* Checks whether the underlying stream has more ELTN elements.
*
* @return whether the stream has more ELTN elements.
* @throws IOException if the character source could not be read.
*/
public boolean hasNext() throws IOException;
/**
* Advances to the next significant ELTN element in the
* underlying stream.
*
* @throws IOException if the character source could not be read.
*/
public void next() throws IOException;
/**
* Get the event parsed by the most recent call to {@link #next()}.
*
* @return most recently parsed event.
*/
public EltnEvent getEvent();
/**
* Indicates if the enclosing value is an ELTN Table.
*
* If this object is currently processing the contents of an ELTN Table,
* this method will return {@code true}.
*
* @return {@code true} if the enclosing value is an ELTN Table.
*
* @see #isInKey()
*/
public boolean isInTable();
/**
* Indicates if the current value is a key in an ELTN table.
*
* If this object is currently processing the contents of a table key, this
* method will return {@code true}.
*
* @return {@code true} if the current value is part of a table key.
*
* @see #isInTable()
*/
public boolean isInKey();
/**
* Gets the value associated with the current event.
*
* On {@link EltnEvent#TABLE_KEY_NAME},
* the result is the ELTN string value for the key.
*
* On {@link EltnEvent#VALUE_STRING},
* the result is the ELTN string value with all escape sequences
* converted to their character values.
*
* On {@link EltnEvent#VALUE_NUMBER}, the result is the string value
* of the number in its original form (decimal or hexadecimal).
*
* On {@link EltnEvent#VALUE_TRUE},
* {@link EltnEvent#VALUE_FALSE},
* or {@link EltnEvent#VALUE_NIL},
* the result is "true", "false", or "nil", respectively.
*
* Otherwise the method throws an exception
*
* @return the string for the current value
*
* @throws IllegalStateException if the current event has no string value.
*/
public String getString();
/**
* Gets the {@link Number} value associated with the current event.
*
* If {@link #getEvent()} is {@link EltnEvent#VALUE_NUMBER},
* this method returns an unspecified subclass of Number.
* Otherwise this method throws an exception.
*
* @return the value of the current ELTN Number
*
* @throws IllegalStateException if the current event is not a number.
*/
public Number getNumber();
/**
* Gets the {@code double} value associated with the current event.
*
* If {@link #getEvent()} is {@link EltnEvent#VALUE_NUMBER},
* this method returns a {@code double} approximating the number.
* Otherwise this method throws an exception.
*
* @return the value of the current ELTN Number
*
* @throws IllegalStateException if the current event is not a number.
*/
default public double getDouble() throws IllegalStateException {
Number n = getNumber();
if (n == null) {
throw new IllegalStateException("!" + EltnEvent.VALUE_NUMBER);
}
return n.doubleValue();
}
/**
* Gets the {@code int} value associated with the current event.
*
* If {@link #getEvent()} is {@link EltnEvent#VALUE_NUMBER},
* this method returns an {@code int} approximating the number.
* Otherwise this method throws an exception.
*
* @return the value of the current ELTN Number
*
* @throws IllegalStateException if the current event is not a number.
*/
default public int getInt() throws IllegalStateException {
Number n = getNumber();
if (n == null) {
throw new IllegalStateException("!" + EltnEvent.VALUE_NUMBER);
}
return n.intValue();
}
/**
* Gets the {@code long} value associated with the current event.
*
* If {@link #getEvent()} is {@link EltnEvent#VALUE_NUMBER},
* this method returns a {@code long} approximating the number.
* Otherwise this method throws an exception.
*
* @return the value of the current ELTN Number
*
* @throws IllegalStateException if the current event is not a number.
*/
default public long getLong() throws IllegalStateException {
Number n = getNumber();
if (n == null) {
throw new IllegalStateException("!" + EltnEvent.VALUE_NUMBER);
}
return n.longValue();
}
/**
* Gets a {@code boolean} value for the current event.
*
* If {@link #getEvent()} is {@link EltnEvent#VALUE_FALSE} or
* {@link EltnEvent#VALUE_NIL}, this method returns false.
* Otherwise this method returns true.
* This method emulates the convention in Lua that in a Boolean test
* statement, a value of <em>nil</em> or <em>false</em> counts as false.
*
* @return the Boolean value of the current ELTN object
*/
default public boolean getBoolean() {
// Read the event once so both comparisons see the same value.
EltnEvent event = getEvent();
return event != EltnEvent.VALUE_NIL && event != EltnEvent.VALUE_FALSE;
}
}
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
[1] An idea I had when I had similarly bookish co-workers. Sadly something this formal would be overkill now.
[3] That is, the newline after the backslash becomes part of the string.
[4] To quote Ierusalimschy et al. verbatim, “The escape sequence ‘\z’ skips the following span of whitespace characters, including line breaks; it is particularly useful to break and indent a long literal string into multiple lines without adding the newlines and spaces into the string contents. A short literal string cannot contain unescaped line breaks nor escapes not forming a valid escape sequence.”
[5] The sequence may have only one or two decimal digits unless the following character(s) are also digits, to avoid ambiguity.
[6] That is, a function that encodes a Unicode code point into a two or three (or four) byte UTF-8 sequence.
[7] The enclosing brackets are mandatory. The code can specify any number of hexadecimal digits, from one to four (or more!).
[8] Internally it probably removes the key if the value is `nil`.