Unwanted Software Thoughts Syndrome | Frank Mitchell's Blog

Working on ELTN opened a floodgate of backburnered projects I was going to work on in my copious free time. A few have begun to haunt me again. In order of likelihood, they are:

Lua Version Manager a.k.a. Lua-Env (name not final)

Even though Lua was intended as an embedded language, its standalone interpreter makes testing modules much easier. However, testing against multiple versions of Lua isn’t exactly straightforward. luarocks, a module manager similar to RubyGems, hard-codes the path to Lua not only in its config files but in the luarocks script itself. Worse, Lua’s module loader assumes Lua is installed in /usr/local on Unix systems. Users can override the search path in the standalone interpreter by setting the LUA_PATH and LUA_CPATH environment variables, and within the interpreter one can change package.path and package.cpath. It might be simpler, though, to change those defaults to the directory in which Lua is installed (defined once, in luaconf.h) and then compile the interpreter.
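A minimal sketch of that luaconf.h tweak, written in Java to match the interface later in this post. It assumes the Unix branch of the stock luaconf.h contains a single `#define LUA_ROOT "/usr/local/"` line, from which LUA_LDIR and LUA_CDIR (and hence the default package.path and package.cpath) are derived. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for a Lua version manager: retargets luaconf.h at a new install prefix. */
final class LuaConfPatcher {
    // Matches the Unix default, e.g. `#define LUA_ROOT "/usr/local/"`, keeping the
    // `#define LUA_ROOT` part and its whitespace in group 1.
    private static final Pattern LUA_ROOT =
            Pattern.compile("(#define\\s+LUA_ROOT\\s+)\"[^\"]*\"");

    /** Returns luaconf.h text with LUA_ROOT pointing at prefix (a trailing '/' is added if missing). */
    static String patchLuaRoot(String luaconf, String prefix) {
        String root = prefix.endsWith("/") ? prefix : prefix + "/";
        Matcher m = LUA_ROOT.matcher(luaconf);
        if (!m.find()) {
            throw new IllegalArgumentException("no LUA_ROOT definition found");
        }
        // quoteReplacement keeps '$' or '\' in the prefix from being read as group references.
        return m.replaceFirst("$1" + Matcher.quoteReplacement("\"" + root + "\""));
    }
}
```

An install step would read luaconf.h, pass it through patchLuaRoot with a version-specific prefix (for example ~/.luaenv/versions/5.3.6, a layout borrowed from rbenv and likewise an assumption), write it back, and then build Lua as usual.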

Following in the footsteps of rvm and especially rbenv, this Lua Environment Version Manager thing would let developers install multiple versions of Lua in their home directories, set a default, type lua at the command line, and have it Just Work. Likewise, installed Lua rocks would Just Work.

Right now I use a minimal, hand-configured version of “luaenv”: a “lua” shell script in my home directory invokes the Lua interpreter for a configured Lua version. A fully functional version would download Lua releases, tweak the default load path, compile Lua, and install it in a version-specific directory. Subcommands would set the default version, list all installed versions, uninstall a version, and so on. luaenv need not support every facility of rbenv. A user default version overridden by an environment variable would suffice. (Directory-specific configuration would be nice to have, though.) luaenv would also need only a few “shims”, mainly for lua, luac, luarocks, and luarocks-admin, although conceivably a Lua rock could include an executable script.
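A sketch of the lookup a luaenv shim might perform, again in Java. The LUAENV_VERSION variable, the root/version default file, and the root/versions/VERSION/bin layout are assumptions modeled on rbenv (which uses RBENV_VERSION and ~/.rbenv/version); directory-specific configuration is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical luaenv core, loosely modeled on rbenv's version lookup. */
final class LuaEnvResolver {
    /** Commands that get a shim; a rock's own executable script would need adding here. */
    static final Set<String> SHIMS = Set.of("lua", "luac", "luarocks", "luarocks-admin");

    /**
     * The LUAENV_VERSION environment variable wins; otherwise use the first non-blank
     * line of root/version (the user default). Empty means no version is configured.
     */
    static Optional<String> resolveVersion(Map<String, String> env, Path root) {
        String fromEnv = env.getOrDefault("LUAENV_VERSION", "").trim();
        if (!fromEnv.isEmpty()) {
            return Optional.of(fromEnv);
        }
        try {
            return Files.readAllLines(root.resolve("version")).stream()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .findFirst();
        } catch (IOException e) {
            return Optional.empty();  // missing or unreadable default file
        }
    }

    /** Path a shim would exec, e.g. root/versions/5.3.6/bin/luarocks. */
    static Path shimTarget(Path root, String version, String command) {
        if (!SHIMS.contains(command)) {
            throw new IllegalArgumentException("no shim for " + command);
        }
        return root.resolve("versions").resolve(version).resolve("bin").resolve(command);
    }
}
```

A lua shim would then call resolveVersion(System.getenv(), root), report an error if the result is empty, and otherwise exec shimTarget(root, version, "lua") with its original arguments.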

Since I’d like to test my ELTN parser on every version of Lua since 5.1 (5.0?) I’ll probably finish this.

JSON Pull Parser

Typical parsers process all their input in a single call. The result is a data structure (e.g. JSON and YAML parsers), output derived from a parse tree (e.g. compilers and Markdown processors), and/or a series of calls to a caller-supplied listener (e.g. SAX parsers for XML).

A pull parser, on the other hand, reads just enough input to derive a token or production, then returns to the caller. The caller then processes the event, calls the pull parser again to get the next one, and so on until it processes all the input. A pull parser’s major advantage is memory usage: instead of building a parse tree or other data structure, the parser keeps just enough state to check the syntax of the next token. Java has a semi-official XML Pull Parser.

I think I wrote one of these in Java, although apparently I didn’t keep it. IIRC, the interface was something like this:

```java
import java.io.InputStream;
import java.io.Reader;

interface JSONPullParser {

    enum EventType {
        ERROR,               // parsing error
        DOCUMENT_START,      // start of input
        OBJECT_START,        // start of JSON Object ('{')
        OBJECT_END,          // end of JSON Object ('}')
        OBJECT_KEY,          // key in a JSON Object ('"...":')
        ARRAY_START,         // start of a JSON Array ('[')
        ARRAY_END,           // end of a JSON Array (']')
        VALUE_BOOLEAN_TRUE,  // JSON Boolean 'true'
        VALUE_BOOLEAN_FALSE, // JSON Boolean 'false'
        VALUE_NULL,          // JSON Null
        VALUE_NUMBER,        // JSON Number
        VALUE_STRING,        // JSON String
        DOCUMENT_END         // end of input
    }

    /** Reset and begin parsing from `reader` */
    void setInput(Reader reader);

    /** Reset and begin parsing UTF-8 from `stream` */
    void setInput(InputStream stream);

    /** Advances to next event */
    void nextEvent();

    /** Last event parsed, or DOCUMENT_START if just reset. */
    EventType currentEvent();

    /** Error message on an ERROR event. */
    String currentErrorMessage();

    /** Last parsed Object Key */
    String currentKey();

    /** Numeric value of a JSON Number */
    double currentNumberValue();

    /** Text of a JSON String, Number, Boolean, or Null. */
    String currentStringValue();
}
```

The currentKey probably should have remained set for the following value, but I don’t think it did.

Anyway, I may try to reproduce this in Java, Lua, Ruby, and/or C.
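To make the idea concrete, here is a minimal pull parser sketch in Java. It is not the lost original, and it simplifies the interface above: it reads from a String rather than a Reader (a real one would keep only a small buffer), throws IllegalStateException instead of returning an ERROR event, folds keys and strings into one currentText() accessor, omits DOCUMENT_START, and skips some validation (for example, raw control characters inside strings). Its whole state is a position, a stack of open containers, and three flags.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical minimal JSON pull parser; event names follow the interface above. */
final class MiniJsonPullParser {
    enum EventType {
        OBJECT_START, OBJECT_END, OBJECT_KEY, ARRAY_START, ARRAY_END, VALUE_STRING,
        VALUE_NUMBER, VALUE_BOOLEAN_TRUE, VALUE_BOOLEAN_FALSE, VALUE_NULL, DOCUMENT_END
    }

    private final String in;
    private int pos = 0;
    private final Deque<Character> open = new ArrayDeque<>(); // '{' or '[' per nesting level
    private boolean afterValue = false;  // a complete value was just read at this level
    private boolean justOpened = false;  // container just opened, no members read yet
    private boolean keyRead = false;     // inside an object: key and ':' consumed
    private String text;                 // last key, string, or number text

    MiniJsonPullParser(String input) { in = input; }
    String currentText() { return text; }
    double currentNumberValue() { return Double.parseDouble(text); }

    /** Consumes just enough input for one event; throws IllegalStateException on bad syntax. */
    EventType nextEvent() {
        skipWs();
        if (open.isEmpty()) {
            if (!afterValue) return readValue();
            if (pos < in.length()) throw error("trailing data");
            return EventType.DOCUMENT_END;
        }
        boolean inObject = open.peek() == '{';
        char close = inObject ? '}' : ']';
        if ((afterValue || justOpened) && pos < in.length() && in.charAt(pos) == close) {
            pos++; open.pop(); afterValue = true; justOpened = false; // closed container = one value
            return inObject ? EventType.OBJECT_END : EventType.ARRAY_END;
        }
        if (afterValue) { expect(','); skipWs(); afterValue = false; }
        justOpened = false;
        if (inObject && !keyRead) {
            text = readString(); skipWs(); expect(':'); keyRead = true;
            return EventType.OBJECT_KEY;
        }
        keyRead = false;
        return readValue();
    }

    private EventType readValue() {
        char c = charAt(pos);
        if (c == '{' || c == '[') {
            pos++; open.push(c); justOpened = true; afterValue = false;
            return c == '{' ? EventType.OBJECT_START : EventType.ARRAY_START;
        }
        afterValue = true;
        if (c == '"') { text = readString(); return EventType.VALUE_STRING; }
        if (in.startsWith("true", pos)) { pos += 4; return EventType.VALUE_BOOLEAN_TRUE; }
        if (in.startsWith("false", pos)) { pos += 5; return EventType.VALUE_BOOLEAN_FALSE; }
        if (in.startsWith("null", pos)) { pos += 4; return EventType.VALUE_NULL; }
        int start = pos;
        while (pos < in.length() && "+-0123456789.eE".indexOf(in.charAt(pos)) >= 0) pos++;
        text = in.substring(start, pos);
        if (!text.matches("-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?")) throw error("bad value");
        return EventType.VALUE_NUMBER;
    }

    private String readString() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        for (char c = charAt(pos++); c != '"'; c = charAt(pos++)) {
            if (c != '\\') { sb.append(c); continue; }
            char e = charAt(pos++);
            int k = "\"\\/bfnrt".indexOf(e);  // simple escapes map by position
            if (k >= 0) { sb.append("\"\\/\b\f\n\r\t".charAt(k)); continue; }
            if (e != 'u' || pos + 4 > in.length()) throw error("bad escape");
            sb.append((char) Integer.parseInt(in.substring(pos, pos + 4), 16));
            pos += 4;
        }
        return sb.toString();
    }

    private void skipWs() {
        while (pos < in.length() && " \t\r\n".indexOf(in.charAt(pos)) >= 0) pos++;
    }

    private char charAt(int i) {
        if (i >= in.length()) throw error("unexpected end of input");
        return in.charAt(i);
    }

    private void expect(char c) {
        if (charAt(pos) != c) throw error("expected '" + c + "'");
        pos++;
    }

    private IllegalStateException error(String msg) {
        return new IllegalStateException(msg + " at offset " + pos);
    }
}
```

The caller drives it with a loop such as `while ((ev = p.nextEvent()) != EventType.DOCUMENT_END) { ... }`, handling one event at a time.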

Statiki

Several years ago, I began using Instiki to record RPG-related ideas, bits of constructed languages, and notional computer projects like these. The project went through multiple revisions, including a change of database and a port to Rails.

Today, I can’t get the thing to run on anything past Ruby 2.3. Since 2.3 was recently End-of-Lifed, this isn’t good. Also, over the years the maintainer put in a lot of functionality I don’t need, like copious MathML support, an SVG editor, and LaTeX conversion.

“Statiki” mimics a wiki, but with statically generated HTML pages I can browse directly. It would include only stuff I actually use:

1. Create a new page.
2. Edit a new or existing page as modified Markdown to be rendered into HTML.
   - Refer to other pages by title rather than URL (like a wiki).
   - Use Markdown extensions like tables and footnotes.
   - In long pages with headers, put a hyperlinked table of contents at the top.
   - (Nice to have) Combine multiple small pages into a single hyperlinked page.
   - Render HTML into a standard template with a useful <TITLE> and CSS stylesheets.
   - Index the pages by name and by category (defined in page metadata).
3. Revert to previous versions of the same page if necessary.

Note that I don’t need multi-user support, logins, spam protection, etc. The contents are either personal notes or published content with one author: me.

The third part I could handle with Subversion, without an SQL(ite) database. The first part could be as simple as creating a Markdown page, much as I do with hugo. The second part, therefore, reduces to the following steps:

1. Traverse one or more directories, building a map of page titles to page content.
2. Render each page, replacing a WikiWord or name in [[…]] with a relative URL to the destination page.
3. Write index pages for all pages in a ‘statiki’ and for each category found.
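The link substitution in step 2 is the only part hugo doesn’t already cover. A minimal sketch follows, in Java rather than the Ruby or Lua the real tool would use. It assumes a flat output directory where each page becomes SLUG.html; the class and method names, the slug scheme, and the WikiWord pattern (two or more capitalized runs, like MagicSystem) are hypothetical choices. Step 1 would build the urlsByTitle map; unknown references are left untouched so broken links stay visible.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Statiki step 2: turn wiki-style references into Markdown links. */
final class WikiLinker {
    // Group 1: title inside [[...]]; group 2: a CamelCase WikiWord such as MagicSystem.
    private static final Pattern REF =
            Pattern.compile("\\[\\[([^\\]]+)\\]\\]|\\b((?:[A-Z][a-z0-9]+){2,})\\b");

    /** Assumed URL scheme: "Magic System" -> "magic-system.html", all pages in one directory. */
    static String slug(String title) {
        String s = title.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-");
        return s.replaceAll("^-+|-+$", "") + ".html";
    }

    /** Replaces references to known titles; unknown ones are left as written. */
    static String link(String markdown, Map<String, String> urlsByTitle) {
        return REF.matcher(markdown).replaceAll(ref -> {
            String title = ref.group(1) != null ? ref.group(1).trim() : ref.group(2);
            String url = urlsByTitle.get(title);
            String out = url == null ? ref.group() : "[" + title + "](" + url + ")";
            return Matcher.quoteReplacement(out);  // titles may contain '$' or '\'
        });
    }
}
```

The output is still Markdown, so kramdown or lua-discount can render it afterwards exactly as it would any other page.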

hugo does most of this except for the wiki-like URL substitutions and tables of contents. I’ll probably copy its metadata header format for titles and categories, and maybe extend it for renamed pages. The actual program, however, will be written either in Ruby or Lua, using kramdown or lua-discount, respectively.

Since I have a personal need for this, and all the tools to write it, I’ll probably get to this sooner rather than later.

TPP

TPP stands for either Text Pre-Parser or Template Pre-Parser. The original design derived from the C pre-processor and a simplified Eiffel version. Having encountered StringTemplate, however, I thought of doing a Lua interpretation of that instead. (Maybe with a more readable syntax.)

Right now I don’t have a compelling case for writing this. StringTemplate already exists, and there are better tools for hacking text or HTML into PDF, EPUB, or other formats.

Communicating Sequential Processes in Lua a.k.a. Lua-CSP

Programming in Lua, 4th Edition ended with an example of multiple Lua interpreters, each in its own OS thread, sending text messages to each other. I could refine that into a module for “Channels”, “Messages”, and “Processes” (POSIX or C11 threads, since Lua’s own “threads” are cooperative coroutines within a single OS thread). Combined with a non-blocking I/O module, one could implement any kind of server with worker threads or chained stages: HTTP, message queue, pub/sub, etc. All you’d need are the underlying modules and some Lua scripts. (Maybe a configuration file to tie it all together.)¹
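The channel-and-process model is easy to sketch with Java threads standing in for the separate Lua states that the real module would run on POSIX or C11 threads. Here a Channel is an unbuffered rendezvous built on java.util.concurrent.SynchronousQueue, so send blocks until another process receives, as in CSP; an empty Optional marks end of stream. The Channel, stage, and run names are hypothetical and are not the API of the Programming in Lua example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.SynchronousQueue;
import java.util.function.Function;

/** Hypothetical CSP-style pipeline: processes share nothing and talk only over channels. */
final class CspPipeline {
    /** Unbuffered channel: send() blocks until some process calls receive(). */
    static final class Channel<T> {
        private final SynchronousQueue<Optional<T>> queue = new SynchronousQueue<>();
        void send(T message) throws InterruptedException { queue.put(Optional.of(message)); }
        void close() throws InterruptedException { queue.put(Optional.empty()); }
        /** Returns empty once the sender has closed the channel. */
        Optional<T> receive() throws InterruptedException { return queue.take(); }
    }

    /** Starts a process that applies f to each message from in and forwards it to out. */
    static <A, B> Thread stage(Channel<A> in, Channel<B> out, Function<A, B> f) {
        Thread t = new Thread(() -> {
            try {
                for (Optional<A> m = in.receive(); m.isPresent(); m = in.receive()) {
                    out.send(f.apply(m.get()));
                }
                out.close();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Producer -> trim -> upper-case -> caller, each stage in its own thread. */
    static List<String> run(List<String> inputs) throws InterruptedException {
        Channel<String> raw = new Channel<>(), trimmed = new Channel<>(), done = new Channel<>();
        stage(raw, trimmed, String::trim);
        stage(trimmed, done, s -> s.toUpperCase(Locale.ROOT));
        Thread producer = new Thread(() -> {
            try {
                for (String s : inputs) raw.send(s);
                raw.close();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        List<String> results = new ArrayList<>();
        for (Optional<String> m = done.receive(); m.isPresent(); m = done.receive()) {
            results.add(m.get());
        }
        return results;
    }
}
```

Worker pools would follow the same shape: several stage processes reading from one channel, with no shared mutable state between them.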

I really don’t have a compelling need for this either, apart from gaining experience with multithreading in C, CSP, and Lua.

Teufel

Well over a decade ago, I read Bertrand Meyer’s Object-Oriented Software Construction and thought it the bee’s knees. Since then, I’ve had experience with functional languages, dynamically typed languages, and distributed computing. Meyer’s approach, embodied in the Eiffel language, is one way to construct systems. As a whole, though, it’s a slow, brittle, and sometimes over-complicated way. Eiffel’s design also reflects not only Meyer’s personal preferences (unsurprisingly) but an earlier era when memory and processor cycles were far scarcer. Scripting languages like Lua, Ruby, and (ugh) JavaScript provide a complementary construction technique: less efficient but more flexible glue between Eiffel-style “reliable components”. The nature of the Web and distributed computing also argues for a more function-oriented approach. A lot of Web programming is transforming GET and POST requests into HTML (or, lately, JSON) responses, with a few side effects.

Also, all the free Eiffel compilers are broken, and Meyer’s company offers only crippled “evaluation” versions.

So in a fit of hubris I decided to write my own language, Teufel, with the following features:

- Static, explicit typing … mostly.
- Pre- and post-conditions like Eiffel’s “Design by Contract”, with some extensions.
- A strict line between “data” types with value semantics and “object” types with reference semantics, mutable state, and identity.
- Standalone functions as first-class values.
- Interface inheritance without implementation inheritance. To reuse implementations, wrap them in functions and call those from class routines.
- An “interface” sub-language combining the aforementioned conditions with ideas from IDL and functional programming. Notably, SML-style “datatypes” would be part of the interface.
- Conventions to make writing implementation classes easier. For example, if an interface specifies foo as an immutable list of type X, and an instance variable foo is a mutable collection of type X, then the variable satisfies the interface requirement, because collections automatically convert into lists of their contents.
- Partial compilation, which the free Eiffel compilers I’ve seen couldn’t do. A Teufel module or package consists of a JSON file detailing the package’s public types, interfaces, and functions, plus C header files, compiled libraries, Java JAR files, scripts, or whatever else a target language requires to build executable code.
- Instead of standard Make, Teufel primarily uses an ACT file² which details:
  - whether the final product is a standalone program or a package
  - if a standalone program, what its “main” function is, or, if a server, which remote objects it exports and over which protocol(s).
  - the target language(s) of generated code (default: C).
  - options for mapping Teufel types to target language constructs.
  - other compile-time options, e.g. library paths, external libraries, and compiler flags for C.
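Java has no contracts and no value/object split, but a few of these features can be approximated to show the intent. In this sketch (all names hypothetical, and only an analogy for what a Teufel compiler might generate or check), an interface declares routines with their pre- and postconditions, the implementing class inherits no implementation and reuses code through a standalone function, and a mutable ArrayList field satisfies an immutable-list query through an explicit copy, the conversion Teufel would insert automatically.

```java
import java.util.ArrayList;
import java.util.List;

/** Interface only: declares queries, commands, and their contracts, but no state. */
interface LifoStack<X> {
    /** Query: contents, bottom first, as an immutable list. */
    List<X> items();

    /** Command. Requires item != null; ensures item is on top and size grew by one. */
    void push(X item);
}

/** Hypothetical stand-ins for Eiffel-style require/ensure clauses. */
final class Contract {
    static void require(boolean condition, String label) {
        if (!condition) throw new IllegalArgumentException("precondition failed: " + label);
    }

    static void ensure(boolean condition, String label) {
        if (!condition) throw new IllegalStateException("postcondition failed: " + label);
    }
}

/** Reusable implementation code lives in plain functions, not in a base class. */
final class ListFunctions {
    static <X> List<X> immutableCopy(List<X> mutable) { return List.copyOf(mutable); }
}

final class ArrayLifoStack<X> implements LifoStack<X> {
    private final List<X> items = new ArrayList<>();  // mutable state hidden behind the interface

    /** The mutable field satisfies the immutable-list query via an explicit conversion. */
    @Override public List<X> items() { return ListFunctions.immutableCopy(items); }

    @Override public void push(X item) {
        Contract.require(item != null, "item != null");
        int oldSize = items.size();
        items.add(item);
        Contract.ensure(items.size() == oldSize + 1 && items.get(oldSize) == item, "item on top");
    }
}
```

In Teufel the contract checks and the conversion would come from the interface declaration itself rather than hand-written calls.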

That’s the grand plan, anyway. However, it’s probably better I do this in baby steps.

M-Strings

A hand-written library for memory-managed strings in C. Yes, C++ has std::string, but a) I dislike C++ and b) I’d like some hands-on experience with automatic memory management. One of the major pain points in working with straight C is memory management in general and strings in particular. I might as well start with something useful.

Teufel Interface Definition Language (TIDL)

An IDL that is essentially the datatype and interface parts of the full Teufel language, with an accompanying code generator. It will generate stub and skeleton code in C much like CORBA. An extension will generate skeleton code for modules in Ruby, Lua, Python, and other scripting languages, not unlike SWIG.

M-Lib/T-Lib

A library of other useful data structures, probably developed as a side effect of M-Strings and using similar memory-management techniques.

Teufel Compiler, Mark 1

Combining TIDL with a language for implementing functions and objects yields a Turing-complete language for standalone programs. The first version will probably be an executable JAR file, both for portability and to take advantage of ANTLR. Finding a workable compromise between high-level scripting operations and low-level bit-crunching may require numerous iterations, and being able to change the parser easily will be a great help. Later I’ll worry about writing a Teufel compiler in Teufel.

Teufel Compiler, Mark 2

Once I’ve got a workable language, the next issue is the mapping to C. Eiffel’s official mechanism for incorporating C libraries is kind of a mess, and I’m not sure the SmartEiffel/Liberty Eiffel implementation is much better. I prefer the Lua model: a reentrant C API into the Teufel “runtime”. Instead of Lua’s stack discipline, though, I’d generate “object oriented” C functions. For example, a call like this:

```
r = obj.some_routine(a, b, c)
```

turns into this:

```c
int errcode = Interface_Name_some_routine(obj, a, b, c, &r);
```

obj is an opaque reference value, which may or may not encode a pointer. Similar functions would wrap data types, primarily so programmers can’t accidentally or intentionally change their values.
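Since the Mark 1 compiler is planned as a Java program, the lowering rule above can be sketched as a small Java function that emits the C call text. The naming scheme (interface name, underscore, routine name), the int status code, and the trailing result pointer follow the example; the CallLowering and lowerCall names, and passing null for a routine with no result, are assumptions.

```java
import java.util.List;

/** Hypothetical code-generator helper for the Teufel-to-C call mapping. */
final class CallLowering {
    /**
     * Lowers `result = receiver.routine(args...)` to a C call that returns an error
     * code, passes the receiver first, and passes the result last, by address.
     */
    static String lowerCall(String interfaceName, String routine, String receiver,
                            List<String> args, String result) {
        StringBuilder c = new StringBuilder("int errcode = ")
                .append(interfaceName).append('_').append(routine)
                .append('(').append(receiver);
        for (String arg : args) {
            c.append(", ").append(arg);
        }
        if (result != null) {
            c.append(", &").append(result);  // out-parameter for the routine's result
        }
        return c.append(");").toString();
    }
}
```

For example, lowerCall("Interface_Name", "some_routine", "obj", List.of("a", "b", "c"), "r") produces the C line shown above.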

Teufel Compiler, Mark 3+

Then it’s time to deliver on other features, such as autogenerated server code, multithreaded server configuration files, output in Java or scripting languages (perhaps for proxies), output as native modules in Java or scripting languages, and control of all options through a single ACT file. As syntax and semantics stabilize, I’ll retire the ANTLR-based compiler in favor of one written entirely in Teufel.

Schedule

Expect the first version of Teufel … well, don’t expect any version. There’s the software you write, and then there’s the software you talk about.

¹ Fellow survivors of Bang Networks in San Francisco may find the idea disturbingly familiar.
² After Eiffel’s ACE (Assembling Classes in Eiffel) file.