Objectives
Extended Lua Table Notation (ELTN) provides a text representation of data drawn from the Lua programming language in the same way JSON draws from JavaScript / ECMAScript. The author envisions ELTN as an alternative to TOML for configuration and manifest files, but ELTN could also serve as a data exchange format, as JSON does over HTTP or raw sockets.
Related Work
As a strict subset of the Lua programming language, ELTN owes a great deal to Lua’s authors, Roberto Ierusalimschy, Waldemar Celes, and Luiz Henrique de Figueiredo, as well as its many contributors and maintainers over decades.
ELTN also draws inspiration from LuaRocks, Lua’s semi-official package manager, and its rockspec format which is essentially Lua code. The author hopes to provide a more general solution for Lua-like data formats.
TOML, YAML, and JSON provide similar data representations to ELTN: a directed graph of sequences and associative arrays with leaf nodes representing text, numbers, boolean values, and other discrete data. The author regards ELTN as simpler for both humans and machines to read than YAML, but not quite as verbose as JSON. Unlike TOML, ELTN represents data hierarchies directly like YAML and JSON.
XML started the trend of standardized, simple data exchange formats. Over the years it acquired the same sort of standards bloat as binary protocols like CORBA, which led to the current generation of “lightweight” data exchange formats.
Character Set
ELTN is a text file format. Characters must be encoded using ASCII or a format consistent with ASCII (1967 onward).
Certain 7-bit characters play a crucial role in defining the ELTN format:
Bytes (hex) | Characters | Significance |
---|---|---|
0x09 | horizontal tab | whitespace |
0x0A | line feed | whitespace, end of a short comment |
0x0D | carriage return | whitespace, end of a short comment |
0x20 | space | whitespace |
0x22 | " | start/end of a quoted string |
0x27 | ' | start/end of a quoted string |
0x2B | + | part of a numeric constant |
0x2C | , | separates entries in a table |
0x2D | - | part of a numeric constant or comment |
0x30-0x39 | 0 … 9 | part of a numeric constant or identifier |
0x3B | ; | separates entries in a table or definition list |
0x3D | = | definition or table key assignment |
0x41-0x5A | A … Z | part of an identifier |
0x5B | [ | start of a table key, long string, or long comment |
0x5C | \ | escapes special characters in a quoted string |
0x5D | ] | end of a table key, long string, or long comment |
0x5F | _ | part of an identifier |
0x61-0x7A | a … z | part of an identifier or keyword |
0x7B | { | start of a table |
0x7D | } | end of a table |
Bytes with the 8th bit set (0x80-0xFF) must only appear in strings and comments. ELTN parsers should pass them through without interpretation.
Lexical Elements
The syntax of ELTN ultimately comes from the Lua language, as described in the Lua 5.4 manual. Any ambiguity in the author’s descriptions is solely the fault of the author. If any questions about implementing ELTN remain, use the Lua 5.4 interpreter as a guide to legal syntax.
Comment
A comment takes two forms, short and long.
A “short comment” runs from a -- to the end of the line.
name = "value" -- this is a short comment that ends here.
-- this is another short comment.
A long comment may run from the sequence --[[ to a matching ]], or --[=[ to ]=], or --[==[ to ]==], and so on.
Long comments use the same rules as Long Strings, below.
--[==[
This comment runs several lines, and may contain any characters.
It may even contain [[this]] or [===[this]===].
It ends only after the text includes these:
]==]
name = "value"
Parsers may skip comments as mere whitespace, or provide them to their caller as added information.
Fixed-Length Tokens
The language requires the following fixed-length tokens.
Token | Meaning |
---|---|
; | separates table fields or top-level definitions |
, | separates table fields |
= | assignment to an identifier or table key |
nil | a reference to nothing |
false | Boolean false |
true | Boolean true |
{ | creates a new table |
} | marks the end of a table |
[ | precedes a non-identifier key |
] | follows a non-identifier key |
Reserved Keywords
For compatibility with Lua, ELTN reserves the following keywords, which are not available as identifier names:
and break do else elseif
end for function goto if
in local not or repeat
return then until while
Identifier
In ELTN an identifier starts with an underscore or ASCII letter, continues with zero or more ASCII letters, ASCII digits, or underscores, and does not match the tokens nil, true, false, or any of the reserved keywords.
The following are valid identifiers:
foo
testing123
snake_case
Capital_Case
camelCase
ALL_CAPS
___ -- confusing and hard to read, but valid
The following are invalid identifiers:
123skidoo -- starts with a number
beta-test -- includes a '-' in the middle
+larry -- doesn't start with a letter or '_' character.
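The identifier rule can be checked mechanically. The following is a minimal Python sketch, not part of the specification; the names `RESERVED` and `is_identifier` are illustrative assumptions.

```python
import re

# Words that may not be used as identifiers: the reserved keywords
# plus the tokens nil, true, and false.
RESERVED = frozenset("""
    and break do else elseif end for function goto if in local
    not or repeat return then until while nil true false
""".split())

# An underscore or ASCII letter, then any number of ASCII letters,
# ASCII digits, or underscores.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Return True if text is a valid ELTN identifier."""
    return _NAME.fullmatch(text) is not None and text not in RESERVED
```

Applied to the examples above, `is_identifier("snake_case")` and `is_identifier("___")` hold, while `"123skidoo"`, `"beta-test"`, and `"+larry"` are rejected.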
Number Literal
Numbers in ELTN resemble those in other languages: an integer part, a fractional part, and an exponent. However, like Lua, both integers and decimal numbers may use hexadecimal notation by prefixing the constant with 0x or 0X.
The following are valid integers:
1234
0
-74
10294928
0x3e8 -- the hexadecimal value 0x3E8 or 1000
037 -- produces 37, not the octal value.
The following are valid decimal (floating-point) numbers:
100.000 -- 100 to three decimal places
3e8 -- 3 * 10^8, approximately the speed of light in m/s
0x3e8p8 -- 0x3e8 * 2^8, or 256000
The following are invalid numbers:
100,000 -- the comma is invalid ELTN syntax
1_000 -- so is the underscore (this isn't Python or Eiffel)
23d7 -- lacks a "0x" prefix, if this was meant as hexadecimal
If your application requires octal numbers, prettier decimal numbers, or anything else outside ELTN syntax, you can use a quoted string and do the conversion in your application itself.
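As an illustration of how a parser might turn a Number Literal into a native value, here is a minimal Python sketch. The function name `parse_number` and the choice of Python's `int` and `float` as target types are assumptions of this example, not requirements of ELTN.

```python
import re

_DEC_INT = re.compile(r"-?[0-9]+")
_HEX_INT = re.compile(r"-?0[xX][0-9A-Fa-f]+")
_DEC_FLOAT = re.compile(
    r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")
_HEX_FLOAT = re.compile(
    r"-?0[xX](?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?|\.[0-9A-Fa-f]+)"
    r"(?:[pP][+-]?[0-9]+)?")


def parse_number(text: str):
    """Convert an ELTN Number Literal to an int or a float."""
    if _DEC_INT.fullmatch(text):
        return int(text, 10)          # "037" yields 37, never octal
    if _HEX_INT.fullmatch(text):
        return int(text, 16)          # accepts the 0x / 0X prefix
    if _DEC_FLOAT.fullmatch(text):
        return float(text)
    if _HEX_FLOAT.fullmatch(text):
        return float.fromhex(text)    # handles the binary "p" exponent
    raise ValueError(f"not an ELTN number: {text!r}")
```

With this sketch, `parse_number("0x3e8")` gives the integer 1000 and `parse_number("0x3e8p8")` gives the float 256000.0, while `"100,000"`, `"1_000"`, and `"23d7"` raise `ValueError`.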
Quoted String
Quoted strings begin and end with the same type of quote, either a single quote (') or double quote ("). Most characters between the quotes stand for themselves, with a few exceptions:
- A string can include a quote mark of the same type that began the string if it is immediately preceded by a backslash (\).
- Any newlines between the quotes must be escaped with a backslash.
- A backslash also begins an escape sequence that produces whitespace or non-printable characters, among other uses.
- The sequence \z skips not only a newline but all the whitespace that follows it, up to the next non-whitespace character (which may be another escape sequence).
Escape Sequences
Within a quoted string the following escape sequences represent special characters or, in the case of \z, remove characters from the final string.
Escape | Byte(s) | Meaning |
---|---|---|
\a | 0x07 | bell |
\b | 0x08 | backspace |
\f | 0x0c | form feed |
\n | 0x0a | newline |
\r | 0x0d | carriage return |
\t | 0x09 | horizontal tab |
\v | 0x0b | vertical tab |
\\ | \ | backslash |
\" | " | quotation mark / double quote |
\' | ' | apostrophe / single quote |
\↩︎ | 0x0a | escaped newline |
\z | (none) | skip to the next non-whitespace character |
\xXX | 0xXX | byte value in hexadecimal digits |
\DDD | 0DDD | byte value in octal digits |
\u{XXXX} | utf8(XXXX) | UTF-8 bytes for code point 0xXXXX |
Most of these escape sequences are familiar to users of C, C++, C#, and Java. Most or all should be familiar to users of Lua. The unfamiliar ones need some explaining.
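To make the table concrete, the following Python sketch decodes the text between the quotes of a quoted string into bytes. It is one possible reading of the table, not a reference implementation: the name `decode_escapes` is hypothetical, `\DDD` is read as octal as this document states, and `\u{XXXX}` is limited here to code points that Python's `chr` accepts.

```python
SIMPLE = {"a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D,
          "t": 0x09, "v": 0x0B, "\\": 0x5C, '"': 0x22, "'": 0x27}
WHITESPACE = " \t\r\n"
HEX = "0123456789abcdefABCDEF"
OCT = "01234567"


def decode_escapes(body: str) -> bytes:
    """Decode the already-lexed body of a quoted string into bytes."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")      # ordinary character
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        i += 2
        if esc in SIMPLE:
            out.append(SIMPLE[esc])
        elif esc in "\r\n":
            # Escaped line break: stored as a single newline byte.
            if esc == "\r" and body[i:i + 1] == "\n":
                i += 1
            out.append(0x0A)
        elif esc == "z":
            # Skip all whitespace that follows, including line breaks.
            while i < n and body[i] in WHITESPACE:
                i += 1
        elif esc == "x":
            digits = body[i:i + 2]
            if len(digits) != 2 or any(d not in HEX for d in digits):
                raise ValueError("\\x needs two hexadecimal digits")
            out.append(int(digits, 16))
            i += 2
        elif esc in OCT:
            j = i
            while j < n and j < i + 2 and body[j] in OCT:
                j += 1                      # up to three digits in total
            value = int(esc + body[i:j], 8)
            if value > 0xFF:
                raise ValueError("octal escape exceeds one byte")
            out.append(value)
            i = j
        elif esc == "u":
            end = body.find("}", i)
            digits = body[i + 1:end] if end != -1 else ""
            if body[i:i + 1] != "{" or not digits or any(d not in HEX for d in digits):
                raise ValueError("malformed \\u{...} escape")
            out += chr(int(digits, 16)).encode("utf-8")
            i = end + 1
        else:
            raise ValueError(f"unknown escape: \\{esc}")
    return bytes(out)
```

Returning `bytes` rather than a native string keeps embedded zero bytes and the string length intact, which matters for callers written in C-like languages.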
Escaped Newline
The newline after the backslash becomes part of the string. So, for example:
s = "string begins, \
string continues"
is equivalent to:
s = "string begins, \nstring continues"
The “z” Escape
The \z disappears, along with all the whitespace that follows it. So, for example:
s = "string begins, \z
string continues"
is equivalent to:
s = "string begins, string continues"
To quote Ierusalimschy et al. verbatim from the Lua 5.4 manual,
[…] it is particularly useful to break and indent a long literal string into multiple lines without adding the newlines and spaces into the string contents.
Octal Digits
An octal escape may use only one or two digits, provided the character that follows it is not itself a digit. Otherwise, write all three digits to avoid ambiguity.
In the Lua interpreter, the string “\0” produces a string of length one containing a zero byte. Since C (and other languages) use a zero byte to terminate their strings, implementers of ELTN parsers should always provide not only the bytes of an interpreted string but also its length.
Unicode Sequence
This sequence should encode a Unicode code point into its UTF-8 byte sequence: one byte for ASCII, and two to four bytes for all other valid Unicode code points.
The enclosing braces are mandatory, and in many ways a really good idea. The code can specify any number of hexadecimal digits, from one to six (or more).
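For implementers whose language lacks a built-in encoder, a Unicode escape can be expanded by hand. The Python sketch below is an illustration under stated assumptions: the name `encode_utf8` is hypothetical, and the five- and six-byte forms follow the original UTF-8 scheme that Lua 5.4 uses for values below 2^31. A stricter parser could reject anything above 0x10FFFF.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode a code point as UTF-8 bytes (original scheme, up to 6 bytes)."""
    if code_point < 0:
        raise ValueError("negative code point")
    if code_point < 0x80:
        return bytes([code_point])            # plain ASCII, one byte
    # (total bytes, exclusive upper limit, marker bits of the lead byte)
    forms = ((2, 0x800, 0xC0), (3, 0x10000, 0xE0), (4, 0x200000, 0xF0),
             (5, 0x4000000, 0xF8), (6, 0x80000000, 0xFC))
    for nbytes, limit, lead in forms:
        if code_point < limit:
            out = [lead | (code_point >> (6 * (nbytes - 1)))]
            # Each continuation byte carries six bits, highest first.
            for shift in range(nbytes - 2, -1, -1):
                out.append(0x80 | ((code_point >> (6 * shift)) & 0x3F))
            return bytes(out)
    raise ValueError("code point too large")
```

For example, `encode_utf8(0x20AC)` (the euro sign) yields the three bytes E2 82 AC.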
Long String
Open and close quotes enclose quoted strings; long brackets enclose long strings. A sequence like one of these – [[, [=[, [==[, etc. – is an opening long bracket. The number of equals signs between the first and second square bracket defines the level of the bracket.
After the opening long bracket, and an optional newline immediately after the long bracket, the string includes every character in the text, without further interpretation, until the first closing long bracket of the same level. That is, if a long string begins with [=[, an ELTN parser will interpret the first ]=] it sees as the end of the string. This means the parser will bypass closing brackets of a different level, whether ]] or ]==].
The following is an example of a Long String:
long_string = [=[this is a "Long String" but not a very long string.
It continues like this, until the closing bracket, which is not
"]]" or "]==]" but this:]=]
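The level-matching rule is simple to implement. The following Python sketch reads one long string starting at a given position; the name `read_long_string` and its return shape (contents plus the index just past the closing bracket) are assumptions of this example.

```python
import re

# An opening long bracket: "[", zero or more "=", then "[".
_OPEN = re.compile(r"\[(=*)\[")


def read_long_string(text: str, pos: int = 0):
    """Return (contents, index after the closing long bracket)."""
    match = _OPEN.match(text, pos)
    if match is None:
        raise ValueError("no opening long bracket at this position")
    closer = "]" + match.group(1) + "]"     # same number of "=" signs
    start = match.end()
    # An optional newline right after the opening bracket is skipped.
    if text.startswith("\r\n", start):
        start += 2
    elif text.startswith("\n", start):
        start += 1
    end = text.find(closer, start)
    if end == -1:
        raise ValueError("unterminated long string")
    return text[start:end], end + len(closer)
```

Because a long comment is `--` followed by a long string, the same function can be reused to skip long comments by calling it two characters past the `--`.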
Syntax and Semantics
Definition
This is a valid ELTN document:
{
markup = {
tableOfContents = { startLevel = 2, endLevel = 5 };
highlight = {
style = "monokailight";
tabWidth = 4;
};
goldmark = { renderer = { unsafe = true }};
},
taxonomies = { tag = "tags" }
}
So is this:
markup = {
tableOfContents = { startLevel = 2, endLevel = 5 };
highlight = {
style = "monokailight";
tabWidth = 4;
};
goldmark = { renderer = { unsafe = true }};
}
taxonomies = { tag = "tags" }
Unlike JSON, but like TOML, ELTN has a whole-document, top-level name-space. Think of it as a JavaScript Object, Python dictionary, or Ruby Hash containing a mapping from identifiers to nested Tables of information. An identifier may be defined at most once; multiple definitions of the same identifier are invalid.
Definition Lists have a few restrictions relative to Tables:
- The only keys permitted are identifiers, not strings or numbers.
- The only separators allowed are semicolons. On the other hand, separators are not required at all.
The corresponding Lua construct is the “global table” where all global variables reside.
Table
As in Lua, tables are the fundamental building block of ELTN. Everything that is not a scalar (String, Number, Boolean, or Nil) is essentially a Table. The key or index of a table is a String, Number, or occasionally Boolean used to refer to a contained value, which may be any of those, a Nil, or another Table.
Many languages have a distinction between an Array, List, or Sequence, which indexes its contents by Number (almost always integer values starting at 0 or 1) and a Dictionary, Hash, or Map, which indexes its contents using a String or sometimes another datatype. ELTN mixes the two. This is a valid Table in ELTN:
{ "one", "two", "three", [4] = "four", count = 4, ["creepy laugh"] = "ah ah ah"}
The elements not preceded by key assignments are implicitly assigned keys based on the order of their occurrence in the Table. The above is equivalent to the following:
{
[1] = "one",
[2] = "two",
[3] = "three",
[4] = "four",
count = 4,
["creepy laugh"] = "ah ah ah",
}
Implementers should use their native Hash, Map, Object, etc. structures to represent tables, converting numbers or booleans to strings if necessary. If a table looks sufficiently array-like, i.e. only integer keys starting at 1 and forming a continuous range, they might use a List or Array. Parsers are not obliged to assess the “sequence-like” or “mapping-like” tendencies of a parsed Table, however.
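The implicit numbering of positional entries and the optional array-like conversion can be sketched as follows in Python. The names `build_table` and `to_native`, and the convention of passing entries as `(key, value)` pairs with `None` marking a positional entry, are assumptions made for this illustration.

```python
def build_table(entries):
    """Build a dict from (key, value) pairs; key None means positional.

    Caveat: Python treats 1, 1.0, and True as the same dict key, so a
    stricter implementation would need to tell those keys apart.
    """
    table = {}
    next_index = 1
    for key, value in entries:
        if key is None:
            # Positional entries are numbered in order of occurrence.
            key = next_index
            next_index += 1
        if key in table:
            raise ValueError(f"duplicate key: {key!r}")
        table[key] = value
    return table


def to_native(table):
    """Return a list if the keys are exactly 1..n, else the dict itself."""
    n = len(table)
    keys_are_ints = all(type(k) is int for k in table)
    if keys_are_ints and set(table) == set(range(1, n + 1)):
        return [table[i] for i in range(1, n + 1)]
    return table
```

For the example table above, `build_table` produces keys 1 through 4 plus `"count"` and `"creepy laugh"`, and `to_native` leaves the result as a dictionary because not every key is an integer.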
Any key that isn’t an identifier must be enclosed in square brackets: ‘[’…’]’. Exactly one String, Number, or Boolean may appear as a key, and the key must be unique within that table. An identifier is equivalent to its String, so this is invalid:
{
some_name = 3,
["some_name"] = 3 -- INVALID: same name, different syntax
}
Unlike Lua tables, an ELTN table allows only Strings, Numbers, and Booleans as keys. This is for the sanity of both parser writers and implementers in specific languages. Also, unlike Lua tables, ELTN tables have no identity, only a value determined by their contents. One can give tables an identity by referring to them with Definitions, but that lies in the application domain, beyond the scope of an ELTN specification.
String
To a certain extent Strings in ELTN resemble strings in nearly every programming language: they consist of a length and a sequence of characters. Quoted Strings and Long Strings resolve to the same String type; parsers should make no distinction in their API.
By preference ELTN treats strings as bytes which may correspond to ASCII, Latin-1, or UTF-8 character encodings, but it is up to the language and application to decide on the character encoding. A Java parser may, for example, use UTF-16 characters, while a parser ported to an old IBM mainframe may translate all characters to EBCDIC.
Number
Numbers, expressed as a Number Literal, notionally include all real numbers.
Due to the limitations of floating-point representation in most computers, they have only a finite number of binary digits of precision. If the language allows, parsers should represent a Number Literal that expresses an integer as a standard or arbitrary-precision integer, and one that expresses a decimal value as an arbitrary-precision decimal number. Otherwise double-precision floating point is sufficient.
Boolean
Booleans have two values, true and false. They have counterparts, sometimes with slightly different names, in JavaScript, Python, Ruby, Scheme, Lisp, and even C (since C99) and C++.
Nil
nil corresponds to NULL in C, null in JavaScript or Java, None in Python, and nil in Ruby. It’s the value provided when there’s no value to give.
JavaScript distinguishes between an “undefined” value, i.e. a missing Definition or Table key, and a null value, which is explicitly defined as something other than an Object or primitive value.
Other Data Types
Other data representation formats, notably TOML and YAML, support additional data types such as dates, times, date-times, object references, and more. Lua has a userdata type, but since it has no consistent representation other than a memory address, ELTN supports only those types that can be represented consistently as text.
To represent other data types, ELTN offers Strings and Tables. This specification offers non-normative suggestions about how applications can interpret these as other common data types.
Dates and Times
Applications can represent dates and times using ISO-8601 date and time strings. For example, May 23, 2025, at 7:30:19 AM in the North American Central Time Zone, translates to "2025-05-23T07:30:19-05:00". (The -05:00 part reflects the offset from Coordinated Universal Time for Central Daylight Time.)
ISO-8601 also defines a notation for durations, such as "P3M" for three months or "PT30M" for 30 minutes, but something informal like “3M” for three months or “30m” for 30 minutes should suffice for many purposes.
Many languages offer libraries that translate ISO-8601 to their own native date-time representations and back.
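As one example of such a library, Python's standard `datetime` module can read a timestamp string taken from an ELTN document. This is a sketch only; the helper name `parse_timestamp` is hypothetical, and the format string assumes a full date, time, and numeric offset.

```python
from datetime import datetime, timedelta

# Date, "T", time, and a numeric UTC offset. In Python 3.7 and later,
# %z accepts the offset with or without a colon.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_timestamp(text: str) -> datetime:
    """Convert an ISO-8601 string from an ELTN String into a datetime."""
    return datetime.strptime(text, ISO_FORMAT)


stamp = parse_timestamp("2025-05-23T07:30:19-05:00")
print(stamp.utcoffset() == timedelta(hours=-5))
```

Going the other way, `stamp.isoformat()` produces a string suitable for writing back into an ELTN String.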
References
ELTN represents data as an acyclic directed graph of table keys toward their values. The specification encourages writers of ELTN emitters and serializers to check for possible cycles in the graph of objects being submitted to be serialized as ELTN. How, then, can a serializer or emitter handle two references to the same Table (Object, Array, List, Dictionary, etc.) in the graph of objects?
One possible method relies on Definitions. The top level may contain a number of entries of the form __ref__<ID>, where <ID> is a randomly or sequentially generated string of numbers and letters. Where the table for that definition should appear, the application has instead the String "__ref__<ID>" for that ID. If the output is supposed to have only one table, it could be marked “__top”.
References to external data could use URLs or URIs (Strings) using REST, GraphQL, or other modern Web principles and standards.
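The cycle check recommended above might look like the following in Python. This is a sketch under assumptions: the name `check_acyclic` is hypothetical, and native `dict` and `list` objects stand in for ELTN Tables. It tracks only the tables on the current path, so a table that is merely shared (referenced twice) passes, while a table that contains itself does not.

```python
def check_acyclic(value, _path=None):
    """Raise ValueError if value contains a reference cycle."""
    path = set() if _path is None else _path
    if isinstance(value, (dict, list)):
        marker = id(value)
        if marker in path:
            raise ValueError("cycle detected; cannot serialize as ELTN")
        path.add(marker)                 # entering this table
        children = value.values() if isinstance(value, dict) else value
        for child in children:
            check_acyclic(child, path)
        path.remove(marker)              # leaving this table


shared = {"tag": "tags"}
check_acyclic({"first": shared, "second": shared})   # shared, not cyclic
```

A serializer that also wants to emit shared tables only once could extend the same traversal to count how often each `id` is visited and assign `__ref__<ID>` names to those seen more than once.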
Structs and Data Objects
For less self-referential data structures, Tables are ideal. For example, let’s say a group of professional programmers keeps a distributed lending library of the books on their shelves. Each member uploads an ELTN document that lists the books they’re willing to lend out, which might look like this:
{
member = {
number = 13,
name = "Frank Mitchell",
contact = {
email = "frank.mitchell@example.com",
-- no other contact information
}
},
books = {
{
author = "Donald E. Knuth",
title = "Literate Programming",
publisher = "CSLI",
year = 1992
},
{
author = "Jon Bentley",
title = "More Programming Pearls",
year = 1990,
publisher = "Addison-Wesley",
},
--[[
... many more ...
]]
}
}
Developers and domain experts might have to develop a data schema, but anything expressed in a standard struct or data object should be expressible in ELTN.
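On the consuming side, an application can map a parsed Table onto its own data objects. The Python sketch below assumes the parser has already produced a `dict` for each book entry; the `Book` class and `book_from_table` helper are hypothetical names, and treating `publisher` as optional is a schema choice made only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    author: str
    title: str
    year: int
    publisher: Optional[str] = None


def book_from_table(table: dict) -> Book:
    """Build a Book from a parsed ELTN table with matching keys."""
    return Book(**table)


entry = {"author": "Donald E. Knuth", "title": "Literate Programming",
         "publisher": "CSLI", "year": 1992}
print(book_from_table(entry).title)
```

An unexpected key raises `TypeError` here, which gives the application a crude form of schema validation.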
ELTN Files
As described under Character Set, an ELTN file is a text file encoded as ASCII or a compatible character encoding.
File Extension
The suggested file extension for ELTN files is .eltn.
MIME Type
The suggested MIME type is “application/eltn”.
EBNF Grammar
The grammar’s notation is as follows:
lower_case
- A reference to a grammar rule, defined with the symbol =.
SOME WORDS
- A description of a rule in plain English.
"x"
- A literal character or sequence of characters.
"\x"
- An escape sequence denoting a non-printable or confusing character. A literal " is written as "\""; a literal \ is written as "\\". See Escape Sequences, above.
x , y
- A sequence containing both x and y.
x | y
- Either x or y.
[ x ]
- Zero or one of x.
{ x }
- Zero or more of x.
( x )
- A group of items that are treated as a unit.
For example "y" , { "a" | "b" } , "z" means a sequence of zero or more “a”s or “b”s, starting with a “y” and ending with a “z”: “yz”, “yaz”, “ybz”, “yaaz”, “ybaz”, etc.
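Readers more familiar with regular expressions may find a comparison helpful. The language in this example (a “y”, then any number of “a”s or “b”s, then a “z”) corresponds to the pattern `y[ab]*z`, shown here as a small Python sketch; the helper name `matches_example` is purely illustrative.

```python
import re

# "y", then zero or more of "a" or "b", then "z".
EXAMPLE = re.compile(r"y[ab]*z")


def matches_example(text: str) -> bool:
    """Return True if text belongs to the example language."""
    return EXAMPLE.fullmatch(text) is not None


print([word for word in ("yz", "yaz", "ybaz", "yabc", "z")
       if matches_example(word)])
```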
Parser Rules
document = ws , ( deflist | table ) , ws ;
deflist = { definition | ";" } ;
definition = ws, defname , ws , "=", ws , value ;
defname = name ;
value = "nil" | scalar | table ;
scalar = boolean | number | string ;
table = "{" , [ ws , entrylist ] , ws , "}" ;
entrylist = ws , entry , { ws , entrysep , ws , entry }, ws , [ entrysep ] ;
entry = ( ws , key , ws , "=" , ws , value, ws ) | ws, value ;
key = ( ws , "[" , ws , scalar , ws , "]" ) | ws, name ;
entrysep = "," | ";" ;
Lexical Tokens
Whitespace and Comments
ws = { whitespace | comment } ;
whitespace = " " | "\t" | newline ;
comment = ( "--" , { NOT A NEWLINE } , newline )
| ( "--" , long_string ) ;
newline = "\n" | "\r\n" ;
Strings
string = quoted_string | long_string ;
quoted_string = dquo , { qstr_char_dquo | escape_sequence } , dquo
| squo , { qstr_char_squo | escape_sequence } , squo ;
dquo = """" ; (* should be "\"" but for syntax highlighter *)
squo = "'" ;
qstr_char_dquo = NOT A dquo OR A NEWLINE;
qstr_char_squo = NOT A squo OR A NEWLINE;
escape_sequence = "\\" , dquo
| "\\" , squo
| "\\" , newline
| "\\" , "z", { whitespace }
| "\\" , ("a"|"b"|"f"|"n"|"r"|"t"|"v")
| "\\" , "x" , hexdigit , hexdigit
| "\\" , octdigit [ octdigit [ octdigit ] ]
| "\\" , "u{" , hexdigit , { hexdigit } , "}" ;
long_string = long_open_N , long_sequence_N , long_close_N ;
long_open_N = ( "[" , { "=" } , "[" ) FOR EXACTLY N "=" ;
long_sequence_N = NOT A long_close_N ;
long_close_N = ( "]" , { "=" } , "]" ) FOR EXACTLY N "=" ;
Names, Booleans, and Reserved Words
boolean = "true" | "false" ;
reserved_word = "and" | "break" | "do" | "else" | "elseif" | "end" | "for"
| "function" | "goto" | "if" | "in" | "local" | "not"
| "or" | "repeat" | "return" | "then" | "until" | "while" ;
name = ( namestart , { namepart } ) AND NOT "nil" OR A boolean OR A reserved_word ;
namestart = letter | "_" ;
namepart = letter | digit | "_" ;
Numbers
number = dec_number | hex_number ;
dec_number = dec_integer , [ "." , { digit } ] , [ dec_exponent ]
| sign , "." , digit , { digit } , [ dec_exponent ] ;
dec_integer = sign , digit , { digit } ;
dec_exponent = ("e" | "E") , [ "+" | "-" ] , digit , { digit } ;
hex_number = hex_integer , [ "." , { hexdigit } ] , [ hex_exponent ]
| sign , hex , "." , hexdigit , { hexdigit } , [ hex_exponent ] ;
hex = "0", ("X"|"x") ;
hex_integer = sign , hex , hexdigit, { hexdigit } ;
hex_exponent = ("p" | "P") , [ "+" | "-" ] , digit , { digit } ;
sign = [ "-" ] ;
Basic Definitions
letter = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I"
| "J" | "K" | "L" | "M" |"N" | "O" | "P" | "Q" | "R"
| "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
| "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i"
| "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r"
| "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
hexdigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
| "A" | "B" | "C" | "D" | "E" | "F"
| "a" | "b" | "c" | "d" | "e" | "f" ;
octdigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7";
Appendix: Self-Identification and Encoding Information
ELTN parsers should leave interpreting any bytes in strings and comments to the calling program. Some authors, however, may want to label their documents with a version of the ELTN specification and possibly a character encoding, or may encode their documents in unusual formats like UCS-2 (UTF-16), UCS-4 (UTF-32), or EBCDIC. A parser may therefore have to transcode its input stream into something it can recognize.
In an effort to establish some standard for encoding ELTN documents, the author submits this additional syntax. A document is not required to provide encoding information, and parsers may simply treat this proposed convention like any other ELTN comment.
The very first bytes in an ELTN document stream may be these:
identification = [ byteordermark ] ,
"--" ,
space ,
"ELTN" , space , "=" , space , dquo , "1.0" , dquo ,
space ,
[ "charset" , space , "=" , space , dquo , charset , dquo ] ,
space ,
newline ;
byteordermark = "\u{FEFF}" ;
space = { " " | "\t" } ;
charset = cschar , { cschar } ;
cschar = letter | digit | "-" | "_" | "." | ":" | "/" ;
Unicode uses the Byte Order Mark to signal the byte encoding of a document. The form in which the mark's bytes occur tells a parser whether a document is UTF-8, UTF-16, or UTF-32 and in what order the rest of the document’s bytes will appear. The following was adapted from an appendix to the XML Specification.
First Four Bytes | Encoding | Comments |
---|---|---|
00 00 FE FF | UTF-32BE | four bytes, highest first (big endian) |
FF FE 00 00 | UTF-32LE | four bytes, lowest first (little endian) |
00 00 FF FE | ??? | four bytes, unusual ordering |
FE FF 00 00 | ??? | four bytes, unusual ordering |
FE FF ?? ?? | UTF-16BE | two bytes, higher first (big endian) |
FF FE ?? ?? | UTF-16LE | two bytes, lower first (little endian) |
EF BB BF ?? | UTF-8 | the Byte Order Mark in UTF-8 encoding. |
00 00 00 2D | UTF-32BE | no BOM, but a - in UTF-32. |
2D 00 00 00 | UTF-32LE | no BOM, but a - in UTF-32. |
00 2D 00 2D | UTF-16BE | no BOM, but a -- in UTF-16. |
2D 00 2D 00 | UTF-16LE | no BOM, but a -- in UTF-16. |
2D 2D 20 45 | ASCII? | no BOM, but a -- E in ASCII or UTF-8. |
60 60 40 C5 | EBCDIC? | -- E in EBCDIC; parser may need to decode. |
(Bytes noted as ?? may have any value, except that two consecutive ?? bytes cannot both be 00.)
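The table lends itself to a simple prefix lookup. The Python sketch below is one way to apply it, assuming the caller has the first four bytes of the stream in hand; the names `SIGNATURES` and `sniff_encoding` are hypothetical, and the returned labels are plain strings chosen for this example rather than registered charset names.

```python
# Checked in order, so the four-byte UTF-32 marks are tried before the
# two-byte UTF-16 marks that share their first bytes.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\x00\x00\xff\xfe", None),         # unusual ordering: give up
    (b"\xfe\xff\x00\x00", None),         # unusual ordering: give up
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\x00\x00\x00\x2d", "UTF-32BE"),   # no BOM, "-" in UTF-32
    (b"\x2d\x00\x00\x00", "UTF-32LE"),
    (b"\x00\x2d\x00\x2d", "UTF-16BE"),   # no BOM, "--" in UTF-16
    (b"\x2d\x00\x2d\x00", "UTF-16LE"),
    (b"\x2d\x2d\x20\x45", "ASCII"),      # "-- E" in ASCII or UTF-8
    (b"\x60\x60\x40\xc5", "EBCDIC"),     # "-- E" in EBCDIC
]


def sniff_encoding(head: bytes):
    """Guess an encoding from the first bytes, or return None."""
    for signature, name in SIGNATURES:
        if head.startswith(signature):
            return name
    return None
```

A result of `None` means the table gives no answer, in which case a parser would typically fall back to treating the input as ASCII-compatible bytes.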
Following that is a standard ELTN / Lua comment that identifies the document as ELTN version 1.0. What follows (perhaps) is a single directive “charset = X”, where X is a known encoding name used by iconv, Java's character encoding system, and other common systems.
-- ELTN = "1.0" charset = "UTF-8"
This information may benefit a parser’s caller, so parsers may want to relay comment contents to that level.
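A parser that chooses to honor this convention could recognize the identification line with a regular expression. The Python sketch below assumes the input has already been decoded to text; the name `parse_identification` and the `(version, charset)` return shape are assumptions of this example.

```python
import re

# Optional byte order mark, "--", the ELTN version, and an optional
# charset directive, all on the first line of the document.
_IDENTIFICATION = re.compile(
    r'\ufeff?--[ \t]*ELTN[ \t]*=[ \t]*"1\.0"[ \t]*'
    r'(?:charset[ \t]*=[ \t]*"([A-Za-z0-9_.:/-]+)")?[ \t]*\r?\n'
)


def parse_identification(text: str):
    """Return (version, charset or None), or None if no header is present."""
    match = _IDENTIFICATION.match(text)
    if match is None:
        return None
    return ("1.0", match.group(1))


print(parse_identification('-- ELTN = "1.0" charset = "UTF-8"\nname = 1\n'))
```

Documents without the header still parse normally, since the line is an ordinary short comment.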