Teufel Again

Posted: 2024-11-17
Word Count: 2308
Tags: c-programming java python vaporware

A video excerpt of Robert “Uncle Bob” Martin prompted some thoughts about the programming language I will probably never write, Teufel.

Encapsulation

In the video, Martin speaks of how the C programming language had strict encapsulation: a .h header file that provided the interface and a .c code file that provided the implementation. He then laments that nearly all object-oriented languages jumble interface and implementation together in one (or, worse, two) files, with access modifiers to make certain members “public”, “private”, “protected”, or other. (In Java the “other” is “package-private”.)

In the old days Martin was a force for greater professionalism in software development, but these days he comes off as a bit of a curmudgeon.

Nevertheless, I think he has a point. I’ve found myself writing Java libraries defined by publicly accessible interfaces and private implementations. As soon as I master the Java module syntax I’ll make the latter inaccessible from external code. Maybe it’s simply old C programming habits. Still, if you want truly modular code, the technique of hiding the gory details under a clean abstraction seems like the safest path.

Interface Inheritance

Java isn’t the only language with interfaces. Python has a sort of “duck typing” interface, and as I understand it Go was built with interfaces that a later programmer can retrofit onto existing code.

Python interfaces are frustrating, however, since there’s no necessary or explicit connection between an implementation class and its “protocol”. Furthermore Python leaves all protocol conformance to external tools. Then again, type checking relies on external tools, so something like

foo: int = "this is not an int"

will load and run just fine, even if mypy and other tools squawk.1
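A minimal sketch of that behavior in stock CPython. The helper double() is a hypothetical name added only for illustration; the point is that annotations are recorded but never enforced by the interpreter.

```python
# Annotations are hints for external tools; CPython does not enforce them.
foo: int = "this is not an int"


def double(n: int) -> int:
    # Hypothetical helper: the annotation says int, but any value that
    # supports `* 2` gets through at runtime.
    return n * 2


kind = type(foo).__name__   # the type actually stored, not the annotated one
doubled = double("ab")      # a str slips past the int annotation
print(kind, doubled)
```

A checker such as mypy would flag both the assignment and the call, but only if someone actually runs it.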

Honestly I wish there were an interpreter mode that would insert assertions around typed code, so that any time you assigned a string to a variable that previously held an int the runtime would throw an exception. Maybe that’s impossible in the Python bytecode. The best I’ve found is the @runtime_checkable decorator for Protocols, which allows one to test for protocol conformance using isinstance(), although that check only confirms that the required members exist, not that their signatures match.
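Here is a small sketch of what @runtime_checkable buys you, and what it doesn't. The names Greeter, English, Sloppy, and Mute are invented for the example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Greeter(Protocol):
    def greet(self, name: str) -> str: ...


class English:
    # No inheritance from Greeter: conformance is purely structural.
    def greet(self, name: str) -> str:
        return f"Hello, {name}"


class Sloppy:
    # Wrong signature, but isinstance() only looks for a member named greet.
    def greet(self) -> int:
        return 42


class Mute:
    pass


results = [isinstance(obj, Greeter) for obj in (English(), Sloppy(), Mute())]
print(results)
```

Sloppy passes the isinstance() test despite the mismatched signature, which is about as far as the runtime goes without outside help.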

Data Classes

Another convenience of Python is “dataclasses”, which use a decorator to declare a class’s (public) members. There’s even an option to declare them immutable.
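A quick sketch of the immutable option, using a hypothetical Point class. The frozen=True flag makes attribute assignment raise, and dataclasses.replace() builds a modified copy instead.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10                 # frozen instances reject assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

q = replace(p, x=10)         # a new Point; p itself is untouched
print(mutated, p, q)
```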

A common pattern in multithreaded applications called Communicating Sequential Processes (CSP) divides a program into single-threaded islands of mutable data that send “messages” containing (notionally) immutable data to each other. The entire Erlang language relies on this paradigm, and parts of the Go and Clojure languages implement CSP. Functional or what I call “function-oriented”2 languages do this kind of thing extremely well, because of their emphasis on functions without side effects and transparent data structures.

CPU manufacturers can only optimize their silicon so far; their solutions involve putting more cores on each chip. GPUs are faster, but they rely on massively parallel operations. All this points to a future where programs will become increasingly parallel, and CSP seems like an ideal strategy to cut down on synchronization locks and automate the writing of high-performance programs. In traditional multithreading, synchronization happens on each block of mutable data as an arbitrary number of threads contend to change it. In CSP, synchronization happens only on either end of the pipe or queue between two processes.
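A rough CSP-flavored sketch using only the standard library, with threads standing in for processes and queue.Queue standing in for the pipe. The Job message type, the STOP sentinel, and the worker() function are all assumptions made for illustration.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    n: int                      # immutable message payload


STOP = None                     # sentinel telling the worker to shut down


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    total = 0                   # mutable state, private to this thread
    while True:
        msg = inbox.get()       # the only synchronization point on input
        if msg is STOP:
            break
        total += msg.n * msg.n
    outbox.put(total)           # the only synchronization point on output


inbox = queue.Queue()
outbox = queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

for n in range(1, 5):
    inbox.put(Job(n))
inbox.put(STOP)

thread.join()
result = outbox.get()
print(result)                   # sum of squares of 1..4
```

No lock appears anywhere in the user code; the queues do all the coordinating, and the running total never leaves its thread.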

As Python slowly retires its Global Interpreter Lock (GIL) maybe it will handle CSP-style applications efficiently. Maybe not.

Enter Teufel?

As I mentioned before, this language I may never write would gore some sacred cows of object orientation for the sake of enhancing encapsulation, ensuring type safety, and embracing a multi-threaded world.

The Teufel Type System

The Teufel type system would include these not quite disjoint types.

Any

Any is the supertype of all runtime entities below. If a type is declared Any, programs would need to narrow the type before they could use the value.
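As a loose Python analogy (not Teufel syntax, which doesn't exist yet), a parameter typed as object is usable only after an isinstance() check narrows it. The describe() function is a made-up example.

```python
def describe(value: object) -> str:
    # `object` stands in for Teufel's Any: nothing type-specific is
    # allowed until a check narrows the value.
    if isinstance(value, str):
        return f"text of length {len(value)}"
    if isinstance(value, int):
        return f"integer {value}"
    # The "false" case is handled locally instead of raising.
    return "something else"


print(describe("abc"), describe(7), describe(1.5))
```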

None

None is the type with no values. It’s equivalent to void in C or None in Python. If the syntax requires a type but a Routine returns nothing, it’s implicitly or explicitly None.

The implementation may use a placeholder value equivalent to an empty Tuple or empty List in some cases. It will not have a single explicit null, nil, None, or undefined value; that will be left to Datatypes.

Datatype

Datatypes represent immutable data structures with public members. Syntactically a Datatype can only consist of references to Objects or other datatypes. Some may be “polymorphic” in the sense that a C union is polymorphic; all alternate forms are defined in the same spot.
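The closest Python approximation I can sketch is a union of frozen dataclasses, with every alternate form listed in one place. Circle, Rect, Shape, and area() are illustrative names, not part of any Teufel design.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rect:
    width: float
    height: float


# All alternate forms of the "polymorphic" datatype, defined in one spot.
Shape = Union[Circle, Rect]


def area(shape: Shape) -> float:
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    return shape.width * shape.height


print(area(Rect(2.0, 3.0)), area(Circle(1.0)))
```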

Important subtypes of Datatype include built-in “atomic types”, e.g. Numbers, Strings4, Booleans, Bit Sets5, and Timestamps6. I might further subdivide those basic types into implementation-based subtypes, e.g. Fixed_Integer vs. Big_Integer7 vs. Float, or even restrict some variables to certain ranges of certain types as in Pascal. At runtime, though, operations on any atomic types produce a value general enough to accommodate the result.

Other built-in Datatypes might include:

Except for Tuple, all of these (hopefully) can be defined within the language itself.

Because they’re immutable, threads can copy or pass datatypes, both their values and their metadata, freely among themselves.

Object

Object types have an identity and contain mutable data. An Object’s full type is defined by exactly one Class and one or more Protocols. Built-in Object types include:

Because they have identity and mutable state, all object instances stay local to only one thread to reduce synchronization. Migrating between threads requires them to serialize their state and reinstantiate themselves on a new thread, which most applications would not need or want.

Routine

Routine types have a signature of input types and output types (both plural). Routines may be further subdivided into:

  1. Pure Functions, which manipulate only immutable types and must have a return value.
  2. Predicates, which return only a boolean value.
  3. Functions, which return at least one value.
  4. Procedures, which return no values.

Each Routine may also have:

Why preconditions? Multiple routines might have the same signature, yet take very different data values. Design by Contract, while somewhat redundant with software unit testing, attaches assertions to the code itself rather than to external code. The code to test assertions can be disabled at compile-time or runtime.
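To make that concrete, here is a hedged Python sketch of preconditions attached to the code itself. The require() decorator and the CHECK_CONTRACTS switch are inventions for this example, not an existing library; the two routines share a signature but accept different data values.

```python
import functools

CHECK_CONTRACTS = True          # hypothetical switch to disable checking


def require(predicate, message):
    """Attach a precondition to a routine (illustrative helper)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if CHECK_CONTRACTS and not predicate(*args, **kwargs):
                raise ValueError("precondition failed: " + message)
            return func(*args, **kwargs)
        return wrapper
    return decorate


@require(lambda xs: len(xs) > 0, "xs must be non-empty")
def mean(xs):
    return sum(xs) / len(xs)


# Same signature as mean(), but a stricter demand on the values.
@require(lambda xs: len(xs) > 0 and all(x > 0 for x in xs),
         "xs must be non-empty and positive")
def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)


print(mean([1, 2, 3]), harmonic_mean([1, 1]))
```

Calling harmonic_mean([1, 0]) raises ValueError before the division by zero is ever attempted.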

All threads share the code and metadata of Routines, or at least as much as they need to.

Protocol

A Protocol consists of a name, zero or more inherited Protocols, zero or more Constants, and a table of Messages. Each Message, in turn, consists of:

Each Protocol also includes zero or more invariants, which are preconditions and postconditions on an implementation’s entire observable state.

Semantically, any Routine used to implement a Message must conform to the Message’s preconditions and postconditions. An implementation may widen the preconditions (accept more inputs) and narrow the postconditions (promise more about its results), but not the other way around.
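A small sketch of what widening and narrowing mean in practice, with contracts written as plain Python predicates. Every name here (protocol_pre, impl_post, integer_sqrt, and so on) is hypothetical, and the check at the end is only a spot-check over sample inputs, not a proof.

```python
import math


# Contract declared by the (hypothetical) protocol message.
def protocol_pre(n: int) -> bool:
    return n >= 1                        # callers promise n >= 1


def protocol_post(n: int, result: int) -> bool:
    return result >= 0                   # implementers promise result >= 0


# Contract of one implementation: wider precondition, narrower postcondition.
def impl_pre(n: int) -> bool:
    return n >= 0                        # also accepts 0


def impl_post(n: int, result: int) -> bool:
    return result * result <= n < (result + 1) ** 2


def integer_sqrt(n: int) -> int:
    assert impl_pre(n)
    result = math.isqrt(n)
    assert impl_post(n, result)
    return result


samples = range(0, 50)
# Every input the protocol allows is also allowed by the implementation.
pre_widened = all(impl_pre(n) for n in samples if protocol_pre(n))
# Every result the implementation produces also satisfies the protocol.
post_narrowed = all(protocol_post(n, integer_sqrt(n))
                    for n in samples if impl_pre(n))
print(pre_widened, post_narrowed)
```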

All threads share the metadata of Protocols, although keeping multiple distributed processes in sync can be a challenge.

Class

A Class, as stated previously, consists of one or more Protocols, zero or more generic type parameters (bound or filled), a set of Routines to create Class instances, and (effectively) a table that maps a Message to a “Feature”8, which is either a Routine or an instance variable.

Note that classes do not inherit from each other. At all. This forces programmers to choose composition over inheritance … a bit draconian, but it makes implementation much easier.9
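For comparison, here is how composition looks in Python when inheritance is off the table. Counter and LoggingCounter are made-up classes; the second owns an instance of the first and forwards the message to it rather than subclassing.

```python
class Counter:
    def __init__(self) -> None:
        self.count = 0

    def increment(self) -> int:
        self.count += 1
        return self.count


class LoggingCounter:
    # Not a subclass: it owns a Counter and forwards messages to it.
    def __init__(self) -> None:
        self._inner = Counter()
        self.log = []

    def increment(self) -> int:
        value = self._inner.increment()
        self.log.append(f"count is now {value}")
        return value


c = LoggingCounter()
c.increment()
second = c.increment()
print(second, c.log, isinstance(c, Counter))
```

Both classes answer the same increment message, so code written against that message can't tell them apart, yet neither depends on the other's memory layout.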

All Classes have at least one Protocol. If they do not explicitly inherit from one, the syntax will allow the source code to designate certain features “public”, creating a Protocol with the same name as the Class. Otherwise, all class features are “private”, accessed directly or indirectly through Protocol messages.10

All threads keep the exact implementation of Classes private, although in practice threads will share Class metadata and code.

Back to Reality?

Maybe there are existing languages like OCaml with enough features that I don’t have to write my own. Knowing me, though, I probably won’t be satisfied until I write my own weird Eiffel / Lua / SML / Objective-C / etc. hybrid.

Postscript: Naming Conventions

Today I just saw this video in the same series as the Uncle Bob one. I couldn’t agree more. If I ever write Teufel, the coding convention will be something like this11:

protocol Some_Example                   -- Type name (Title Case)

    SOME_CONSTANT: Integer = 100        -- CONSTANT (all caps)

    some_property: Integer              -- property message (snake case)

    set_some_property(value: Integer)   -- property set message (snake case)
        alias
            "some_property="            -- Python/Ruby property setter?
        require
            maximum: value <= SOME_CONSTANT
            minimum: value >= 0
        ensure
            new_value: some_property = value

end

I never liked Java’s getX()/setX() convention. I guess it flagged methods as a property pair, but properly speaking a “property” is a message/function with no arguments (apart from the object itself) and (at least?) one return value. That’s what Python, Ruby, Eiffel, and other languages use. How much harder would it be to search for a second routine to see if the property is directly mutable? (As opposed to mutable through other methods.)
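For reference, a rough Python rendering of the Some_Example protocol above, using the built-in property mechanism. The class name SomeExample and the backing field _some_property are assumptions, and the assert statements merely stand in for the require and ensure clauses.

```python
class SomeExample:
    SOME_CONSTANT = 100

    def __init__(self) -> None:
        self._some_property = 0

    @property
    def some_property(self) -> int:
        # A property: no arguments apart from the object, one return value.
        return self._some_property

    @some_property.setter
    def some_property(self, value: int) -> None:
        # require
        assert value <= self.SOME_CONSTANT, "maximum"
        assert value >= 0, "minimum"
        self._some_property = value
        # ensure
        assert self._some_property == value, "new_value"


example = SomeExample()
example.some_property = 42       # invokes the setter
print(example.some_property)
```

Running Python with -O strips the assert statements, which loosely mirrors the idea of disabling contract checks.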

Anyway, I’ve only watched a few videos in the series, and I don’t remember seeing a disappointing one. (Maybe because they’re too short?) Even if you disagree, you have to think about why you disagree.


  1. I haven’t gotten far into TypeScript, but I wonder how well it works given that the DOM and other APIs are themselves untyped. ↩︎

  2. As a parallel to “object-oriented”. At a certain level of abstraction and sophistication, functions and objects begin to resemble each other. Is it an object that accepts “messages” and manipulates instance variables, or is it a closure on a function whose first argument is a “message” that manipulates captured variables? ↩︎

  3. I’m not enamored with the Java “type cast” expression: it throws an exception rather than handling the common “false” case locally. ↩︎

  4. Immutable Unicode strings, with the basic operations of concatenation, indexing, and Python-like slicing. Notionally, as in Python, each multi-character String is made up of one-character Strings. ↩︎

  5. A potentially infinitely sized bit mask. ↩︎

  6. A distinct UNIX-like timestamp type, because a) every language acquires a “date”, “time”, or “datetime” type sooner or later, but b) dates and calendars are hard. ↩︎

  7. Infinite precision number, built in from the start. ↩︎

  8. A term borrowed from Eiffel, Teufel’s sort of namesake. ↩︎

  9. One reason C++ puts member variables in its header files is that anything that inherits from a C++ class has to reserve room for the superclass variables. (Objective-C had the same problem.) With no implementation inheritance, that problem goes away. Teufel datatypes would compile into C as structs, protocols and classes into runtime data structures with associated code, and object instances into mutable structs with a pointer to the class dispatch table. ↩︎

  10. Java classes, for example, also have protected and package private members to define access to class members from subclasses and from members of the same package. I’m disallowing subclasses, so the first is not needed. As for the second, I’d prefer to restrict visibility through an external “module” system analogous to Java 9 modules over something like Java packages or Eiffel’s potentially chaotic visibility through naming specific classes in the exporting code itself. ↩︎

  11. Actual syntax may start off a bit noisier, e.g. to differentiate constants, protocol Messages / class implementations, and instance variables, all of which have significant implementation differences. ↩︎