Exported Objects (Work In Progress)

Frank Mitchell

Posted: 2023-03-21
Last Modified: 2023-07-16
Word Count: 1989
Tags: c-programming programming vaporware

Exported Objects[1] (ExO) allows programmers to define “objects” in C and use them in any scripting language or environment with an ExO binding. Initially that will include Lua, Python, Ruby, and a standalone runtime, similar to Objective-C’s, usable from plain C.

Concepts

In this document, we use the following definitions:

Class
Informally, all objects with the same structure and functions. Formally:
  1. An E_Class structure that records the constructor, destructor, and other functions that instances respond to.
  2. The set of Objects with the same E_Class.
Domain
A region of memory containing interconnected objects (q.v.). Exported Objects divides a system into multiple domains, each residing in a separate process, in thread-local (or global) memory, or in the memory used by an interpreter or compiled runtime.
export
To provide an object (q.v.) to another Domain (q.v.), either explicitly via E_Domain_export(name, obj) or implicitly by returning it as a result from a function.
object
Any structure with private state and functions to manipulate that state.
Object (capital-O)
An object that is registered as an instance of a Class.

Domains

All Exported Objects live in a “domain”. Environments that define domain boundaries include:

Objects communicate across domain boundaries through the ExO runtime, using a lowest common denominator of (mostly) immutable value types: arrays, booleans, nulls, numbers, strings, structs, and symbols. They can also exchange “userdata” (opaque values exposed from other domains) as well as Classes, Functions, and Objects made available from other domains.

Memory and Reference Counting

All objects will use reference counting exclusively. The following functions comprise the new reference counting API[2]:

void* E_Any_retain(void* *obj);
void* E_Any_release(void* *obj);
void* E_Any_set(void* *obj, void* newobj);

Objects need only register a “free” procedure with their Class that looks something like this:

void My_Object_free(My_Object* obj) {
    /* E_Any_set takes the address of the field holding the reference */
    E_Any_set((void**)&obj->some_ref, NULL);
    E_Any_set((void**)&obj->another_ref, NULL);
}

This will release any objects held by My_Object. Note that circular references pose a particular problem for exclusively reference-counted schemes: objects in a cycle keep each other’s counts above zero and are never freed. One technique to avoid cycles is for child nodes in a tree not to hold a counted reference to their parent.
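To make the retain/release/set pattern concrete, here is a minimal sketch that keeps counts in a side table rather than in the objects themselves. Every name in it (rc_register, rc_retain, rc_release, rc_set, Node, Node_free) is hypothetical and stands in for whatever the ExO runtime would actually provide. The fixed-size array is only a placeholder for a real table.

```c
#include <assert.h>
#include <stddef.h>

#define RC_SLOTS 64

typedef void (*rc_free_fn)(void *obj);

/* Side table: address -> count, so objects carry no count field. */
static struct {
    void      *obj;
    int        count;
    rc_free_fn free_fn;
} rc_table[RC_SLOTS];

static int rc_find(void *obj) {
    for (int i = 0; i < RC_SLOTS; i++)
        if (rc_table[i].obj == obj) return i;
    return -1;
}

/* Register obj with an initial count of 1; returns 0 if the table is full. */
int rc_register(void *obj, rc_free_fn free_fn) {
    assert(obj != NULL);
    int i = rc_find(NULL);               /* an empty slot has obj == NULL */
    if (i < 0) return 0;
    rc_table[i].obj = obj;
    rc_table[i].count = 1;
    rc_table[i].free_fn = free_fn;
    return 1;
}

/* Current count, or 0 for NULL and unregistered pointers. */
int rc_count(void *obj) {
    int i = obj ? rc_find(obj) : -1;
    return i < 0 ? 0 : rc_table[i].count;
}

void *rc_retain(void *obj) {
    int i = obj ? rc_find(obj) : -1;
    if (i >= 0) rc_table[i].count++;
    return obj;
}

void rc_release(void *obj) {
    int i = obj ? rc_find(obj) : -1;
    if (i < 0) return;
    if (--rc_table[i].count == 0) {
        rc_free_fn fn = rc_table[i].free_fn;
        rc_table[i].obj = NULL;          /* vacate the slot first */
        if (fn) fn(obj);                 /* then let the object drop its references */
    }
}

/* Store newobj in *slot, retaining the new value before releasing the old. */
void rc_set(void **slot, void *newobj) {
    void *old = *slot;
    rc_retain(newobj);
    *slot = newobj;
    rc_release(old);
}

/* Example object: a node holding one counted reference. */
typedef struct Node { void *child; } Node;
int nodes_freed = 0;

/* "free" procedure: only drops the references this node holds. */
void Node_free(void *obj) {
    Node *n = obj;
    rc_set(&n->child, NULL);
    nodes_freed++;
}
```

In this sketch, releasing the last reference to a parent runs Node_free, which in turn releases the child. If the child also held a counted reference back to the parent, neither count would ever reach zero.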

If an object has some resource like a file handle that it should clean up when it’s deleted, use this call:

E_Any_set_cleaner(self, self->rc, &cleaner);

Note that self will already have been deallocated when cleaner is called, so the cleaner function must take the resource to be cleaned up as its argument rather than reading it from self.
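The ordering matters enough to deserve a small illustration. The sketch below assumes a runtime that remembers the resource and the cleaner at registration time, frees the owner, and only then calls the cleaner. The names demo_set_cleaner, demo_destroy, Connection, and close_handle are invented for this example, and a heap-allocated int stands in for a real file handle.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*cleaner_fn)(void *resource);

typedef struct Connection {
    int *handle;      /* stands in for a file handle or similar resource */
} Connection;

/* One registration is enough for a sketch; a runtime would keep a table. */
static struct {
    void      *owner;
    void      *resource;
    cleaner_fn fn;
} registered;

void demo_set_cleaner(void *owner, void *resource, cleaner_fn fn) {
    assert(owner != NULL && fn != NULL);
    registered.owner = owner;
    registered.resource = resource;
    registered.fn = fn;
}

/* Deallocate owner first, then hand the saved resource to the cleaner. */
void demo_destroy(void *owner) {
    assert(owner != NULL && owner == registered.owner);
    void      *resource = registered.resource;
    cleaner_fn fn = registered.fn;
    registered.owner = NULL;
    registered.resource = NULL;
    registered.fn = NULL;
    free(owner);                 /* owner is gone from here on */
    fn(resource);                /* the cleaner sees only the resource */
}

int handles_closed = 0;

/* The cleaner never touches the Connection, only the handle it was given. */
void close_handle(void *resource) {
    free(resource);
    handles_closed++;
}
```

Because the resource pointer is copied at registration time, the cleaner never has to dereference the freed owner.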

Message Dispatch

Notionally, Objects and Functions in different domains don’t call each other’s functions directly. They send messages containing a destination domain (typically a number or URL), a message name or symbol, and any necessary arguments, including the OID of the receiver Object. If the message is understood, the receiver executes its function and passes the results back as a Tuple.

void E_Message_new(C_Symbol msg);
void E_Message_new_with_name(const char* msgname);
void E_Message_set_receiver(E_ID* obj);
void E_Message_set_argument(int index, E_Any* arg);

If the message is an Object feature, the receiver is the domain-specific E_ID of the receiver of the message. If the message is a Class constructor, the receiver is the Symbol for the class name. If the message is a Function call, the receiver is ignored. The remaining arguments are arguments to the feature, constructor, or function.
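The three receiver rules can be sketched with a toy dispatcher. Nothing here is the ExO API: Demo_Message, Msg_Kind, and demo_send are hypothetical, arguments are plain long values, and the "domain" is a single static table holding counters. The point is only to show how the receiver is interpreted differently for features, constructors, and functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { MSG_FEATURE, MSG_CONSTRUCTOR, MSG_FUNCTION } Msg_Kind;

typedef struct Demo_Message {
    Msg_Kind    kind;
    const char *name;         /* message name; a real system would use a Symbol */
    unsigned    receiver_id;  /* object ID, used when kind == MSG_FEATURE */
    const char *class_name;   /* class name, used when kind == MSG_CONSTRUCTOR */
    int         argc;
    long        argv[4];
} Demo_Message;

#define MAX_COUNTERS 8
static long     counters[MAX_COUNTERS];
static unsigned next_id = 1;            /* ID 0 is never a valid receiver */

/* Returns 1 and writes *reply if the message is understood, else 0. */
int demo_send(const Demo_Message *m, long *reply) {
    assert(m != NULL && m->name != NULL && reply != NULL);
    switch (m->kind) {
    case MSG_CONSTRUCTOR:               /* receiver is the class name */
        if (m->class_name == NULL) return 0;
        if (strcmp(m->class_name, "Counter") != 0) return 0;
        if (strcmp(m->name, "new") != 0 || next_id >= MAX_COUNTERS) return 0;
        counters[next_id] = m->argc > 0 ? m->argv[0] : 0;
        *reply = (long)next_id++;       /* reply carries the new object's ID */
        return 1;
    case MSG_FEATURE:                   /* receiver is an object ID */
        if (m->receiver_id == 0 || m->receiver_id >= next_id) return 0;
        if (strcmp(m->name, "add") != 0 || m->argc != 1) return 0;
        counters[m->receiver_id] += m->argv[0];
        *reply = counters[m->receiver_id];
        return 1;
    case MSG_FUNCTION:                  /* receiver is ignored */
        if (strcmp(m->name, "max") != 0 || m->argc != 2) return 0;
        *reply = m->argv[0] > m->argv[1] ? m->argv[0] : m->argv[1];
        return 1;
    }
    return 0;
}
```

A real reply would be a Tuple of results; this sketch collapses it to a single long to stay short.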

void E_Message_send(E_Domain* d, E_Message* m, E_Reply* *rptr);

rptr points to an E_Reply* variable. If rptr is NULL, E_Message_send will return immediately and ignore any reply. Otherwise, the reply will contain one of the following:

Meta-Object Protocol

The runtime of each Domain and the Exported Objects system contains a lot of information about each object. It uses this information to create, manage, and delete these objects, as described below.

Creating Objects

Using object-based C, one would only need to write:

E_Char_Buffer_new(&buf);

Unfortunately ExO needs to do a few additional steps:

  1. Determine the E_Class of an object.
  2. From the E_Class, determine its “free” procedure.
  3. Set an instance’s initial reference count to 1.

Fortunately they’re all accomplished with the following call:

E_Class_create(&buf, "Char_Buffer", "new");

E_Class_create takes variable arguments, which it will interpret according to the signature established for the named constructor ("new" in this case).
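The three bookkeeping steps listed above might look roughly like this. It is a sketch under several assumptions: classes live in a static array, instances are tracked in a small side table, and constructor arguments are left out entirely. Demo_Class, demo_create, demo_release, demo_refcount, and demo_live_count are made-up names, not part of ExO.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*demo_free_fn)(void *obj);

typedef struct Demo_Class {
    const char  *name;
    size_t       size;
    demo_free_fn free_fn;
} Demo_Class;

static void Char_Buffer_free(void *obj) { (void)obj; /* holds no references */ }

static const Demo_Class classes[] = {
    { "Char_Buffer", 64, Char_Buffer_free },
};

#define MAX_INSTANCES 16
static struct {
    void             *obj;
    const Demo_Class *cls;
    int               refcount;
} instances[MAX_INSTANCES];

static const Demo_Class *demo_class_find(const char *name) {
    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++)
        if (strcmp(classes[i].name, name) == 0) return &classes[i];
    return NULL;
}

static int demo_slot_of(const void *obj) {
    for (int i = 0; i < MAX_INSTANCES; i++)
        if (instances[i].obj == obj) return i;
    return -1;
}

/* Writes a new instance to *out and returns 1, or writes NULL and returns 0. */
int demo_create(void **out, const char *class_name) {
    assert(out != NULL && class_name != NULL);
    *out = NULL;
    const Demo_Class *cls = demo_class_find(class_name);  /* step 1: find the class */
    int slot = demo_slot_of(NULL);                        /* NULL marks a free slot */
    if (cls == NULL || slot < 0) return 0;
    void *obj = calloc(1, cls->size);
    if (obj == NULL) return 0;
    instances[slot].obj = obj;
    instances[slot].cls = cls;        /* step 2: free procedure reachable via cls */
    instances[slot].refcount = 1;     /* step 3: initial reference count */
    *out = obj;
    return 1;
}

int demo_refcount(const void *obj) {
    int slot = obj ? demo_slot_of(obj) : -1;
    return slot < 0 ? 0 : instances[slot].refcount;
}

int demo_live_count(void) {
    int n = 0;
    for (int i = 0; i < MAX_INSTANCES; i++)
        if (instances[i].obj != NULL) n++;
    return n;
}

void demo_release(void *obj) {
    int slot = obj ? demo_slot_of(obj) : -1;
    if (slot < 0 || --instances[slot].refcount > 0) return;
    instances[slot].cls->free_fn(obj);   /* class-specific "free" procedure */
    instances[slot].obj = NULL;
    free(obj);
}
```

A fuller version would also run the named constructor on the new memory before returning it.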

Defining Interfaces

Absent some handy tool, defining an interface will require something like this:

void Char_Sequence_class_init() {
    E_Type_Code T_Char_Sequence = E_Type_code("interface", "Char_Sequence");
    E_Type_Code T_int = E_Type_code("int");
    E_Type_Code T_char = E_Type_code("char");
    E_Type_Code T_bool = E_Type_code("boolean");
    E_Type_Code T_char_array = E_Type_code("array", "char");

    E_Interface_Definition Char_Sequence_Defn = {
        "Char_Sequence",                    /* =: name */
        {},                                 /* =: extends */
        {                                   /* =: features */
            {"char_at",                         /* =: feature name */
                { T_int  },                     /* =: feature arguments */
                { T_char }                      /* =: feature results */
            },
            {"length",      {},             {T_int}},
            {"slice",       {T_int, T_int}, {T_Char_Sequence, T_bool}},
            {"slice_from",  {T_int},        {T_Char_Sequence, T_bool}},
            {"slice_to",    {T_int},        {T_Char_Sequence, T_bool}},
            {"to_chars",    {},             {T_char_array}}
        }
    };

    E_Interface_define(&Char_Sequence_Defn);
}

E_Interface_define() will load the struct contents into an internal data structure, so implementers can build the definition on the stack.
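Copying the definition is what makes stack allocation safe. As a rough illustration, the sketch below deep-copies a caller-supplied feature list into a registry, so the caller's copy can disappear afterwards. The types and functions (Demo_Feature, Demo_Interface, demo_interface_define, demo_feature_find) are hypothetical and much simpler than the E_Interface_Definition shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Demo_Feature {
    char name[32];
    int  n_args;
    int  n_results;
} Demo_Feature;

typedef struct Demo_Interface {
    char          name[32];
    size_t        n_features;
    Demo_Feature *features;     /* heap copy owned by the registry */
} Demo_Interface;

#define MAX_INTERFACES 8
static Demo_Interface registry[MAX_INTERFACES];
static size_t         registry_len = 0;

static void copy_name(char dst[32], const char *src) {
    strncpy(dst, src, 31);
    dst[31] = '\0';
}

/* Copy the caller's definition into the registry; returns 0 on failure. */
int demo_interface_define(const char *name, const Demo_Feature *features, size_t n) {
    assert(name != NULL && (features != NULL || n == 0));
    if (registry_len == MAX_INTERFACES) return 0;
    Demo_Feature *copy = NULL;
    if (n > 0) {
        copy = malloc(n * sizeof *copy);
        if (copy == NULL) return 0;
        memcpy(copy, features, n * sizeof *copy);
    }
    Demo_Interface *slot = &registry[registry_len++];
    copy_name(slot->name, name);
    slot->n_features = n;
    slot->features = copy;
    return 1;
}

/* Look up a feature by interface and feature name; NULL if absent. */
const Demo_Feature *demo_feature_find(const char *iface, const char *feature) {
    for (size_t i = 0; i < registry_len; i++) {
        if (strcmp(registry[i].name, iface) != 0) continue;
        for (size_t j = 0; j < registry[i].n_features; j++)
            if (strcmp(registry[i].features[j].name, feature) == 0)
                return &registry[i].features[j];
    }
    return NULL;
}
```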

Implementing Classes

Absent some handy tool, implementing a class will require something like this:

E_Class* c = E_Class_define("Char_Buffer");

E_Class_implements(c, "Char_Sequence");

E_Class_define_size(c, sizeof(struct _Char_Buffer));

E_Class_define_init(c, &_skel_Char_Buffer_new);
E_Class_define_free(c, &_skel_Char_Buffer_del);

E_Class_define_constructor(c, "new", 
                    &_skel_Char_Buffer_new,
                    0);
E_Class_define_constructor(c, "new_copy", 
                    &_skel_Char_Buffer_new_copy,
                    1,
                    E_Type_code("interface", "Char_Sequence"));
E_Class_define_constructor(c, "new_with_chars",
                    &_skel_Char_Buffer_new_with_chars,
                    1,
                    E_Type_code("array", "char"));

E_Class_define_feature(c, "set_char_at", &_skel_Char_Buffer_set_char_at,
                    2,
                    E_Type_code_in("int"),
                    E_Type_code_in("char"));

E_Class_define_feature(c, "to_string", &_skel_Char_Buffer_to_string,
                    1,
                    E_Type_code_out("interface", "String"));

E_Class_implement_feature(c, "char_at", &_skel_Char_Buffer_char_at);

/* 
 * or if code generation or libffi is supported ... 
 *
 *     E_Type_Map map_Char_Buffer_char_at = {
 *          // C arguments
 *          {
 *              // ExO index, ExO type, C type name, C size
 *              {
 *                  0,
 *                  "Char_Buffer",
 *                  "E_Char_Buffer*",
 *                  sizeof(void *)
 *              },
 *              { 1, "int", "int", sizeof(int) }
 *          },
 *          // C return type
 *          { 2, "char", "wchar_t", sizeof(wchar_t) }
 *     };
 *
 *     E_Class_define_routine_ffi(c, "char_at", Char_Buffer_char_at, 
 *                                      map_Char_Buffer_char_at);
 */

/* etc. etc. */

Implementing Functions

Functions, including interface features and class constructors, may take zero or more arguments and return zero or more results.

r1, r2, ... = f(arg1, arg2, ...)

As C does not allow this, function implementations need a “skeleton” wrapper with the following signature:

typedef void (*E_Function)(void *ud, E_Frame* c);

FFI or code generation would remove the need to write these functions explicitly.
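As an illustration of what such a skeleton might do, the sketch below wraps a plain C function that produces two results. The contents of E_Frame are not specified in this document, so Demo_Frame is an assumption: a fixed-size array of long arguments and another of long results. Demo_Function mirrors the shape of the E_Function typedef above.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_MAX 8

/* Assumed frame layout: arguments in, results out. */
typedef struct Demo_Frame {
    int  argc;
    long args[FRAME_MAX];
    int  resc;
    long results[FRAME_MAX];
} Demo_Frame;

typedef void (*Demo_Function)(void *ud, Demo_Frame *f);

/* The plain C function being exported: two results via out-parameters. */
static void divmod(long a, long b, long *q, long *r) {
    *q = a / b;
    *r = a % b;
}

/* Skeleton: unpack the frame, call the C function, push the results. */
void skel_divmod(void *ud, Demo_Frame *f) {
    (void)ud;                       /* no userdata needed for this function */
    assert(f != NULL);
    f->resc = 0;
    if (f->argc != 2 || f->args[1] == 0)
        return;                     /* zero results signals failure in this sketch */
    long q, r;
    divmod(f->args[0], f->args[1], &q, &r);
    f->results[f->resc++] = q;
    f->results[f->resc++] = r;
}
```

Writing one of these by hand for every feature is tedious, which is the case for generating them.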

Cross-Domain Types

Data Types

These are the most important data types in ExO.

| ExO Type | C type      | Description                                       |
|----------|-------------|---------------------------------------------------|
| Any      | E_Any*      | the parent type of all possible types             |
| Binary   | E_Binary*   | an arbitrary length collection of bits            |
| Boolean  | bool        | true or false                                     |
| Function | see below   | a capsule of code with arguments and results      |
| List[T]  | E_List*     | an ordered, immutable sequence of T               |
| Null     | NULL        | a value with no behavior                          |
| Number   | E_Number*   | a continuous, unit-less, one-dimensional quantity |
| Object   | see below   | an entity with behavior and state                 |
| Record   | E_Record*   | an immutable set of fields referenced by Symbol   |
| String   | E_String*   | an immutable sequence of Unicode code points.[3]  |
| Symbol   | C_Symbol    | a unique value sometimes linked to a string       |
| Tuple    | E_Tuple*    | an immutable set of fields referenced by index    |
| Userdata | E_Userdata* | a non-Object reference from this domain           |

ExO types have analogues in other languages:

ExO Type JSON/JavaScript Lua Python Ruby
Any (“value”) Object
Binary (number) integer BigInt
Boolean Boolean boolean boolean Boolean
Function Function function function Proc
List Array table list Array
Null Null nil None nil
Number Number number number Number
Object Object table instance Object
Record Object table Struct
String String string string String
Symbol Symbol (string) dictionary Hash
Userdata userdata (???)

Numeric Data Subtypes

Types come from <stdint.h>, <math.h>, and <wchar.h>. Not all types will be available on all platforms.

| Type   | C type(s)         | Description                            |
|--------|-------------------|----------------------------------------|
| bigint | E_Big_Integer     | an arbitrary length integer            |
| bit    | uint8_t : 1       | 0 or 1                                 |
| char   | wchar_t, utf32_t  | a Unicode code point                   |
| int    | int               | signed integer of at least 16 bits[4]  |
| int16  | int16_t           | 16-bit signed integer                  |
| int32  | int32_t           | 32-bit signed integer                  |
| int64  | int64_t           | 64-bit signed integer                  |
| octet  | uint8_t, octet_t  | an unsigned 8-bit value                |
| real   | E_Real            | a non-integral number[5]               |
| uint16 | uint16_t          | 16-bit unsigned integer                |
| uint32 | uint32_t          | 32-bit unsigned integer                |
| uint64 | uint64_t          | 64-bit unsigned integer                |

| R | Type           | C type(s)    | Description                                                                   |
|---|----------------|--------------|-------------------------------------------------------------------------------|
| X | Class          | E_Class      | a description of an Object’s features                                         |
|   | Feature        | E_Feature    | the prototype of messages an object responds to and the expected return type |
| X | Function       |              |                                                                               |
| X | - Exported     | E_Function   | an exported function in the current domain                                    |
| ? | - Local        | various      | an un-exported function in the current domain                                 |
| X | - Remote       | E_Proxy      | a function in another domain                                                  |
| X | Identifier     | E_ID         | a specifier for an object or function within a domain                         |
|   | Interface      | E_Interface  | a named collection of features                                                |
|   | Message        | E_Message    | a Symbol and list of arguments                                                |
|   | Message Reply  | E_Reply      | a list of results from a Message                                              |
| X | Object         |              |                                                                               |
| X | - Exported     | various      | an exported object in the current domain                                      |
| ? | - Local        | various      | an un-exported object in the current domain                                   |
| X | - Remote       | E_Proxy      | an object in another domain                                                   |
|   | Type           | E_Type       | the name of a Data Type, Class, or Interface                                  |
|   | Type Parameter | E_Type_Param | a type variable bound to a concrete type                                      |
|   | Type Variable  | E_Type_Var   | a variable in lieu of a concrete type[6]                                      |

An X in the R column indicates that the type can be the receiver of a message.

String

The initial proof of concept will be E_String, an encapsulated String usable with object-based C functions and Exported Object APIs alike.

The E-String API will resemble the M-String API with the following changes:

  1. Any function that returns an M-String will instead return void and take a pointer to an E-String pointer as a first argument.
    The pointer written to that argument will be automatically retained.

  2. There will be no equivalents to an M_Pool, M_Root_Set, or M_Scope.

  3. M_Char_Buffer and M_String_Array will be replaced by E_Char_Buffer and E_String_Array, respectively.

So, for example, these are the constructors:

void E_String_new_ascii(E_String* *sptr, size_t sz, const char* buf);

void E_String_new_utf8(E_String* *sptr, size_t sz, const utf8_t* buf);

void E_String_new_utf16(E_String* *sptr, size_t sz, const utf16_t* buf);

void E_String_new_utf32(E_String* *sptr, size_t sz, const wchar_t* buf);

void E_String_new_encoded(E_String* *sptr,
                              C_Symbol encoding,
                              size_t sz,
                              const octet_t* buf);

/* Create from a null-terminated string */
void E_String_new_from_cstring(E_String* *sptr, const char* cstr);

And these are the slice and join functions:

void E_String_join(E_String* *rptr, E_String* head, E_String* tail);
void E_String_join_n(E_String* *rptr, size_t n, ...);
void E_String_slice(E_String* *rptr, E_String* s, int first, int last);
void E_String_slice_from(E_String* *rptr, E_String* s, int first);
void E_String_slice_to(E_String* *rptr, E_String* s, int last);
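The out-parameter convention is easier to see in use. The sketch below imitates it with a throwaway string type. Demo_String and its functions are hypothetical, the reference count is stored inline purely to keep the example short, and the functions assume the target pointer holds no earlier reference that would need releasing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Demo_String {
    int    refcount;
    size_t len;
    char   bytes[];      /* len bytes plus a terminating NUL */
} Demo_String;

/* Allocate a string of the given length with a count of 1, or NULL. */
static Demo_String *demo_string_alloc(size_t len) {
    Demo_String *s = malloc(sizeof *s + len + 1);
    if (s != NULL) {
        s->refcount = 1;
        s->len = len;
        s->bytes[len] = '\0';
    }
    return s;
}

void demo_string_release(Demo_String *s) {
    if (s != NULL && --s->refcount == 0) free(s);
}

/* Returns void; the new, already-retained string is written to *sptr. */
void demo_string_new_from_cstring(Demo_String **sptr, const char *cstr) {
    assert(sptr != NULL && cstr != NULL);
    size_t len = strlen(cstr);
    Demo_String *s = demo_string_alloc(len);
    if (s != NULL) memcpy(s->bytes, cstr, len);
    *sptr = s;           /* NULL on allocation failure */
}

/* Same convention: the result pointer comes first, the inputs follow. */
void demo_string_join(Demo_String **rptr,
                      const Demo_String *head, const Demo_String *tail) {
    assert(rptr != NULL && head != NULL && tail != NULL);
    Demo_String *s = demo_string_alloc(head->len + tail->len);
    if (s != NULL) {
        memcpy(s->bytes, head->bytes, head->len);
        memcpy(s->bytes + head->len, tail->bytes, tail->len);
    }
    *rptr = s;
}
```

The caller owns one reference to each string written through an out-parameter and is expected to release it when done.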

Subtypes

Behind the E_String facade are five implementations of strings, two of which may be “immediate” as in the M-String proposal if the C compiler will allow it.

atom:
A string of length 1, which cannot be split any further. Ideally there will be at most one of these for every Unicode code point.
empty:
A string of length 0. The runtime will create only one of these.
encoded:
A multi-character string encoded as bytes, with the encoding noted.
utf8:
A multi-character string encoded as UTF-8 bytes.
wide:
A multi-character string composed of Unicode code points.
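One plausible way to hide these five representations behind a single type is a tagged union. The layout below is a guess for illustration only: Demo_Str, Str_Kind, and demo_str_length are invented names, and the "immediate" optimization is not attempted. The length function shows how each subtype answers the same question differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { STR_EMPTY, STR_ATOM, STR_ENCODED, STR_UTF8, STR_WIDE } Str_Kind;

typedef struct Demo_Str {
    Str_Kind kind;
    union {
        uint32_t atom;                    /* exactly one code point */
        struct {
            const char    *encoding;      /* name of the encoding */
            size_t         n_bytes;
            size_t         n_chars;       /* cached, since decoding is encoding-specific */
            const uint8_t *bytes;
        } encoded;
        struct {
            size_t         n_bytes;
            const uint8_t *bytes;
        } utf8;
        struct {
            size_t          n_chars;
            const uint32_t *chars;
        } wide;
    } u;
} Demo_Str;

/* Length in code points, whatever the representation. */
size_t demo_str_length(const Demo_Str *s) {
    assert(s != NULL);
    switch (s->kind) {
    case STR_EMPTY:   return 0;
    case STR_ATOM:    return 1;
    case STR_ENCODED: return s->u.encoded.n_chars;
    case STR_WIDE:    return s->u.wide.n_chars;
    case STR_UTF8: {
        /* In well-formed UTF-8, every byte that is not a continuation
         * byte (10xxxxxx) starts a new code point. */
        size_t n = 0;
        for (size_t i = 0; i < s->u.utf8.n_bytes; i++)
            if ((s->u.utf8.bytes[i] & 0xC0) != 0x80) n++;
        return n;
    }
    }
    return 0;
}
```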

API

Any Object

TODO

Object Export

TODO

Object ID

TODO

Object Memory Management

TODO

Class and Interface Declaration

TODO

Feature

TODO

Type Reference

TODO

Class and Function Implementation

TODO

Constructors

TODO

Features and Interfaces

TODO

Memory

TODO

Skeleton

TODO

Message

TODO

Message Reply

TODO

Universal Types

Any

TODO

Binary

TODO

Boolean

TODO

Function

TODO

List

TODO

Null

TODO

Number

TODO

Proxy

TODO

Record

TODO

String

TODO

Tuple

TODO

Userdata

TODO


  1. This was meant to be the simpler alternative to M-String, TypeLib, TIDL, and Teufel. It still ended up pretty complicated, mainly because it’s trying to impose an object model on vanilla C. Garbage collection almost pales in comparison. ↩︎

  2. Internally a C_Table will track the reference counts of all objects. No object needs to add a reference count field. In fact this technique may decrease swapping and memory cache misses. ↩︎

  3. Possibly stored in another format like UTF-8. ↩︎

  4. The platform’s optimal size for a C int, usually 32 bits. ↩︎

  5. An approximation of a real number. In C this is an E_Real, usually defined as a double or, if using <math.h>, a double_t. Embedded platforms may redefine this to a fixed point or other type of decimal value. ↩︎

  6. Currently the only generic type is the builtin List. ↩︎