Exported Objects[1] (ExO) allows programmers to define “objects” in C and use them in any scripting language or environment with an ExO binding. Initial bindings will include Lua, Python, Ruby, and a standalone runtime similar to the Objective-C runtime, usable from plain C.
Concepts
In this document, we use the following definitions:
- Class: Informally, all objects with the same structure and functions. Formally, either:
  - an E_Class structure that records the constructor, destructor, and other functions that instances respond to; or
  - the set of Objects with the same E_Class.
- Domain: A region of memory containing interconnected objects (q.v.). Exported Objects divides a system into multiple domains, each lying in a separate process, in thread-local (or global) memory, or in the memory used by an interpreter or compiled runtime.
- export: To provide an object (q.v.) to another Domain (q.v.), either explicitly via E_Domain_export(name, obj) or implicitly by returning it as a result from a function.
- object: Any structure with private state and functions to manipulate that state.
- Object (capital O): An object that is registered as an instance of a Class.
Domains
All Exported Objects live in a “domain”. Environments that define domain boundaries include:
- Processes
- Threads
- Language Runtimes (e.g. Lua, Python, Ruby)
- Memory Zones
Objects communicate across boundaries through the ExO runtime, using a lowest-common-denominator set of (mostly) immutable value types (arrays, booleans, nulls, numbers, strings, structs, and symbols), as well as “userdata” (opaque values exposed from other domains) and the Classes, Functions, and Objects that other domains make available.
Memory and Reference Counting
All objects will use reference counting exclusively. The following functions comprise the new reference counting API[2]:
void* E_Any_retain(void* *obj);
void* E_Any_release(void* *obj);
void* E_Any_set(void* *obj, void* newobj);
Each Class need only register a “free” procedure that looks something like this:
void My_Object_free(My_Object* obj) {
    E_Any_set(&obj->some_ref, NULL);
    E_Any_set(&obj->another_ref, NULL);
}
This releases any objects held by a My_Object.
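The footnote on this API says a C_Table, not a field inside each object, tracks reference counts. The following is a minimal sketch of that idea under stated assumptions: a fixed 64-slot array stands in for C_Table, `E_Any_track` is a hypothetical registration call (in the document, E_Class_create would do this), free procedures are assumed to have type `void (*)(void*)`, and releasing the last reference runs the free procedure and then frees the memory. The three E_Any_* functions follow the signatures above, but their bodies are only one plausible reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free procedure: releases whatever an object holds. */
typedef void (*E_Free_Proc)(void* obj);

/* Toy side table standing in for C_Table: address -> count + free proc.
   The objects themselves carry no reference-count field. */
typedef struct {
    void*       obj;
    long        count;
    E_Free_Proc free_proc;
} Ref_Entry;

#define MAX_TRACKED 64
static Ref_Entry ref_table[MAX_TRACKED];

static Ref_Entry* find_entry(void* obj) {
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (ref_table[i].obj == obj) return &ref_table[i];
    return NULL;
}

/* Hypothetical helper: start tracking a new object with a count of 1. */
void* E_Any_track(void* obj, E_Free_Proc free_proc) {
    Ref_Entry* slot = find_entry(NULL);           /* first unused slot */
    assert(obj != NULL && slot != NULL);
    *slot = (Ref_Entry){ obj, 1, free_proc };
    return obj;
}

void* E_Any_retain(void* *obj) {
    if (*obj != NULL) {
        Ref_Entry* e = find_entry(*obj);
        assert(e != NULL);
        e->count++;
    }
    return *obj;
}

/* Drops one reference and clears the caller's variable. */
void* E_Any_release(void* *obj) {
    if (*obj != NULL) {
        Ref_Entry* e = find_entry(*obj);
        assert(e != NULL);
        if (--e->count == 0) {
            void* dead = e->obj;
            E_Free_Proc proc = e->free_proc;
            e->obj = NULL;                  /* free the slot first */
            if (proc != NULL) proc(dead);   /* release what it holds */
            free(dead);
        }
        *obj = NULL;
    }
    return NULL;
}

/* Retains the new value before releasing the old one, so setting a
   variable to the object it already holds is safe. */
void* E_Any_set(void* *obj, void* newobj) {
    void* keep = newobj;
    E_Any_retain(&keep);
    E_Any_release(obj);
    *obj = newobj;
    return newobj;
}

/* Example types: My_Object holds references to other objects. */
typedef struct { int value; } Leaf;
typedef struct { void* some_ref; void* another_ref; } My_Object;

static int leaves_freed = 0;
static void Leaf_free(void* obj) { (void)obj; leaves_freed++; }

static void My_Object_free(void* obj) {
    My_Object* self = obj;
    E_Any_set(&self->some_ref, NULL);
    E_Any_set(&self->another_ref, NULL);
}
```

Retaining before releasing in `E_Any_set` matters: if the variable already holds `newobj` with a count of 1, releasing first would free it before it could be stored again. A real table would presumably be a hash keyed by address rather than a linear scan.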
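Note that circular references pose a particular problem for exclusively reference-counted schemes: two objects that retain each other never reach a count of zero, even after everything else has released them. One technique, for example, is for child nodes in a tree not to add a reference to their parent.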
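A minimal sketch of the weak-parent technique. For brevity it keeps an inline `refs` count in each node rather than using the side table, and the Node type and its functions are hypothetical. Each parent retains its children, but a child's `parent` pointer is not counted, so releasing the root frees the whole tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplification: an inline count instead of the runtime's side table. */
typedef struct Node {
    int          refs;
    struct Node* parent;       /* weak: NOT counted */
    struct Node* children[4];  /* strong: each one counted */
    int          n_children;
} Node;

static int nodes_freed = 0;

Node* Node_new(void) {
    Node* n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->refs = 1;               /* the caller's reference */
    return n;
}

void Node_release(Node* n) {
    if (n == NULL || --n->refs > 0) return;
    for (int i = 0; i < n->n_children; i++)
        Node_release(n->children[i]);   /* drop the strong child refs */
    /* n->parent is weak, so it is deliberately not released */
    free(n);
    nodes_freed++;
}

/* The parent takes its own reference to the child; the child only
   points back. */
void Node_add_child(Node* parent, Node* child) {
    assert(parent->n_children < 4);
    child->refs++;
    child->parent = parent;
    parent->children[parent->n_children++] = child;
}
```

If `Node_add_child` also incremented the parent's count, each parent and child would keep the other alive and neither count could reach zero. The cost of the weak pointer is that a child retained elsewhere can outlive its parent, so code must not follow `parent` once the parent may have been freed.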
If an object has some resource like a file handle that it should clean up when it’s deleted, use this call:
E_Any_set_cleaner(self, self->rc, &cleaner);
Note that self will already have been deallocated when cleaner is called, so the cleaner must take the resource to be cleaned up (self->rc above) as its argument rather than self.
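A sketch of why the cleaner receives the resource rather than self. It assumes E_Any_set_cleaner takes (object, resource, cleaner function), as the call above suggests, and that the runtime frees the object before running its cleaner. `Cleanup`, `toy_destroy`, `Resource`, and `Log_Writer` are hypothetical stand-ins, and the "file handle" is simulated.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*E_Cleaner)(void* resource);

/* Hypothetical bookkeeping: one pending cleaner per object. */
typedef struct { void* owner; void* resource; E_Cleaner cleaner; } Cleanup;
#define MAX_CLEANUPS 16
static Cleanup cleanups[MAX_CLEANUPS];

/* Assumed signature, inferred from E_Any_set_cleaner(self, self->rc, &cleaner). */
void E_Any_set_cleaner(void* self, void* resource, E_Cleaner cleaner) {
    for (int i = 0; i < MAX_CLEANUPS; i++) {
        if (cleanups[i].owner == NULL) {
            cleanups[i] = (Cleanup){ self, resource, cleaner };
            return;
        }
    }
    assert(!"cleanup table full");
}

/* What the runtime might do when the last reference goes away. */
void toy_destroy(void* self) {
    Cleanup pending = { 0 };
    for (int i = 0; i < MAX_CLEANUPS; i++) {
        if (cleanups[i].owner == self) {
            pending = cleanups[i];
            cleanups[i].owner = NULL;
            break;
        }
    }
    free(self);                        /* self is deallocated first... */
    if (pending.cleaner != NULL)       /* ...so the cleaner sees only the resource */
        pending.cleaner(pending.resource);
}

/* Example: an object owning a separately allocated handle. */
typedef struct { int fd; } Resource;
typedef struct { Resource* rc; } Log_Writer;

static int resources_closed = 0;

static void close_resource(void* p) {
    Resource* r = p;
    r->fd = -1;                        /* a real cleaner would close(fd) here */
    free(r);
    resources_closed++;
}

Log_Writer* Log_Writer_new(void) {
    Log_Writer* self = malloc(sizeof *self);
    assert(self != NULL);
    self->rc = malloc(sizeof *self->rc);
    assert(self->rc != NULL);
    self->rc->fd = 3;                  /* pretend-open handle */
    E_Any_set_cleaner(self, self->rc, &close_resource);
    return self;
}
```

`close_resource` never touches the Log_Writer, which is what makes it safe to run after the object's memory is gone.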
Message Dispatch
Notionally, Objects and Functions in different domains don’t call each other’s functions directly. Instead they send messages containing a domain destination (typically a number or URL), a message name or symbol, and any necessary arguments, including the object ID (E_ID) of the receiving Object. If the receiver understands the message, it executes the corresponding function and passes the results back as a Tuple.
void E_Message_new(C_Symbol msg);
void E_Message_new_with_name(const char* msgname);
void E_Message_set_receiver(E_ID* obj);
void E_Message_set_argument(int index, E_Any* arg);
If the message is an Object feature, the receiver is the domain-specific E_ID of the target Object. If the message is a Class constructor, the receiver is the Symbol for the class name. If the message is a Function call, the receiver is ignored. The remaining arguments are arguments to the feature, constructor, or function.
void E_Message_send(E_Domain* d, E_Message* m, E_Reply* *rptr);
rptr points to an E_Reply* variable. If rptr is NULL, E_Message_send returns immediately and ignores any reply. Otherwise, the reply will contain one of the following:
- RESULT: an E_Tuple or E_Record containing the domain’s response to the message. It may be empty, indicating that the message completed normally but its signature contains no reply parameters.
- ERROR: a Record (error: Integer, message: String, data: Any) explaining why the message was not understood or acted upon.
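A receiver-side sketch of the dispatch just described. Each toy Domain maps message names to handlers; a known name produces whatever reply its handler builds, and an unknown name produces an ERROR. It simplifies the API above in deliberate ways: dispatch is synchronous, the caller passes a `Reply*` (NULL to discard the reply) instead of an `E_Reply**`, the error codes are arbitrary, and Message, Reply, Domain, and `domain_send` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for E_Message and E_Reply (hypothetical layouts). */
typedef struct {
    const char* name;      /* message name */
    int         receiver;  /* E_ID of the target Object (unused here) */
    int         args[2];
    int         argc;
} Message;

typedef enum { REPLY_RESULT, REPLY_ERROR } Reply_Kind;

typedef struct {
    Reply_Kind  kind;
    int         results[2];  /* RESULT: a small tuple, possibly empty */
    int         n_results;
    int         error;       /* ERROR: (error, message) */
    const char* message;
} Reply;

typedef void (*Handler)(const Message* m, Reply* r);
typedef struct { const char* name; Handler fn; } Entry;
typedef struct { const Entry* table; size_t n; } Domain;

static int calls = 0;

/* A feature with two results: quotient and remainder. */
static void do_divmod(const Message* m, Reply* r) {
    calls++;
    if (m->argc != 2 || m->args[1] == 0) {
        r->kind = REPLY_ERROR;              /* understood, but not acted upon */
        r->error = 2;
        r->message = "bad arguments";
        return;
    }
    r->kind = REPLY_RESULT;
    r->n_results = 2;
    r->results[0] = m->args[0] / m->args[1];
    r->results[1] = m->args[0] % m->args[1];
}

/* Completes normally with an empty RESULT. */
static void do_ping(const Message* m, Reply* r) {
    (void)m;
    calls++;
    r->kind = REPLY_RESULT;
    r->n_results = 0;
}

static const Entry math_table[] = { { "divmod", do_divmod }, { "ping", do_ping } };
static const Domain math_domain = { math_table, sizeof math_table / sizeof math_table[0] };

/* Synchronous toy: a NULL out pointer means the reply is discarded. */
void domain_send(const Domain* d, const Message* m, Reply* out) {
    Reply scratch;
    Reply* r = (out != NULL) ? out : &scratch;
    memset(r, 0, sizeof *r);
    for (size_t i = 0; i < d->n; i++) {
        if (strcmp(d->table[i].name, m->name) == 0) {
            d->table[i].fn(m, r);
            return;
        }
    }
    r->kind = REPLY_ERROR;                  /* no handler: not understood */
    r->error = 1;
    r->message = "message not understood";
}
```

In an asynchronous runtime, a NULL reply pointer would let the sender return before the receiver runs at all; this synchronous sketch only shows that the reply is thrown away.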
Meta-Object Protocol
The runtime of each Domain and the Exported Objects system together hold a great deal of information about each object, which they use to create, manage, and delete these objects, as described below.
Creating Objects
Using object-based C, one would only need to write:
E_Char_Buffer_new(&buf);
Unfortunately, ExO needs to perform a few additional steps:
- Determine the E_Class of the object.
- From the E_Class, determine its “free” procedure.
- Set the instance’s initial reference count to 1.
Fortunately, they’re all accomplished with the following call:
E_Class_create(&buf, "Char_Buffer", "new");
E_Class_create takes variable arguments, which it interprets according to the signature established for the named constructor (here, "new").
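A toy version of the three steps, to show where each one happens. It assumes a small class registry keyed by name, constructors that receive a `va_list` and read their own arguments, an `int` status return, and a fixed array standing in for the count table. Toy_Class, `toy_refcount`, and the Char_Buffer constructors here are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signatures: constructors interpret their own arguments. */
typedef void (*Ctor)(void* self, va_list args);
typedef void (*Free_Proc)(void* self);

typedef struct { const char* name; Ctor fn; } Ctor_Entry;
typedef struct {
    const char* name;
    size_t      size;
    Free_Proc   free_proc;
    Ctor_Entry  ctors[4];
    int         n_ctors;
} Toy_Class;

typedef struct { void* obj; int count; Free_Proc free_proc; } Toy_Ref;

static Toy_Class classes[8];
static int       n_classes = 0;
static Toy_Ref   refs[32];        /* stand-in for the runtime's count table */

static Toy_Class* find_class(const char* name) {
    for (int i = 0; i < n_classes; i++)
        if (strcmp(classes[i].name, name) == 0) return &classes[i];
    return NULL;
}

int toy_refcount(const void* obj) {
    for (int i = 0; i < 32; i++)
        if (refs[i].obj == obj) return refs[i].count;
    return 0;
}

/* Returns 0 on success, -1 if the class or constructor is unknown. */
int E_Class_create(void** out, const char* class_name, const char* ctor_name, ...) {
    Toy_Class* c = find_class(class_name);      /* 1. determine the E_Class */
    if (c == NULL) return -1;
    Ctor ctor = NULL;
    for (int i = 0; i < c->n_ctors; i++)
        if (strcmp(c->ctors[i].name, ctor_name) == 0) ctor = c->ctors[i].fn;
    if (ctor == NULL) return -1;

    void* self = calloc(1, c->size);
    assert(self != NULL);
    int slot = 0;
    while (slot < 32 && refs[slot].obj != NULL) slot++;
    assert(slot < 32);
    refs[slot] = (Toy_Ref){ self, 1, c->free_proc };  /* 2. free proc, 3. count 1 */

    va_list args;
    va_start(args, ctor_name);
    ctor(self, args);             /* the constructor reads its varargs */
    va_end(args);
    *out = self;
    return 0;
}

/* Example class with two constructors. */
typedef struct { char* data; int capacity; } Char_Buffer;

static void Char_Buffer_new(void* self, va_list args) {
    (void)args;                   /* takes no arguments */
    Char_Buffer* b = self;
    b->capacity = 16;
    b->data = calloc(16, 1);
}

static void Char_Buffer_new_with_capacity(void* self, va_list args) {
    Char_Buffer* b = self;
    b->capacity = va_arg(args, int);
    b->data = calloc((size_t)b->capacity, 1);
}

static void Char_Buffer_free(void* self) { free(((Char_Buffer*)self)->data); }

void Char_Buffer_class_init(void) {
    classes[n_classes++] = (Toy_Class){
        "Char_Buffer", sizeof(Char_Buffer), Char_Buffer_free,
        { { "new", Char_Buffer_new },
          { "new_with_capacity", Char_Buffer_new_with_capacity } },
        2
    };
}
```

Handing the `va_list` to the constructor is the simplest option in plain C; a fuller runtime would presumably check the arguments against the signature registered with E_Class_define_constructor first.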
Defining Interfaces
Absent a dedicated tool, defining an interface requires something like this:
void Char_Sequence_class_init(void) {
    E_Type_Code T_Char_Sequence = E_Type_code("interface", "Char_Sequence");
    E_Type_Code T_int = E_Type_code("int");
    E_Type_Code T_char = E_Type_code("char");
    E_Type_Code T_bool = E_Type_code("boolean");
    E_Type_Code T_char_array = E_Type_code("array", "char");
    E_Interface_Definition Char_Sequence_Defn = {
        "Char_Sequence",      /* =: name */
        {},                   /* =: extends */
        {                     /* =: features */
            {"char_at",       /* =: feature name */
             { T_int },       /* =: feature arguments */
             { T_char }       /* =: feature results */
            },
            {"length", {}, {T_int}},
            {"slice", {T_int, T_int}, {T_Char_Sequence, T_bool}},
            {"slice_from", {T_int}, {T_Char_Sequence, T_bool}},
            {"slice_to", {T_int}, {T_Char_Sequence, T_bool}},
            {"to_chars", {}, {T_char_array}}
        }
    };
    E_Interface_define(&Char_Sequence_Defn);
}
E_Interface_define() loads the struct contents into an internal data structure, so implementers can create definitions on the stack.
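A sketch of why definitions can live on the stack: if E_Interface_define copies the struct into a registry, the caller's copy can be discarded once the call returns. To stay valid C before C23, it uses fixed-capacity arrays with explicit counts instead of the variable-length brace lists above. The capacity limits, the string-literal type codes, and `find_feature` are assumptions, and only three features are defined to keep it short.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAME     32
#define MAX_PARAMS   4
#define MAX_FEATURES 8
#define MAX_IFACES   8

/* Assumption: type codes are string literals (static storage),
   so copying the pointers is safe. */
typedef const char* Type_Code;

typedef struct {
    char      name[MAX_NAME];
    Type_Code args[MAX_PARAMS];
    int       n_args;
    Type_Code results[MAX_PARAMS];
    int       n_results;
} Feature_Defn;

typedef struct {
    char         name[MAX_NAME];
    Feature_Defn features[MAX_FEATURES];
    int          n_features;
} Interface_Defn;

static Interface_Defn registry[MAX_IFACES];
static int n_registered = 0;

/* Struct assignment copies the embedded arrays too, so the caller's
   (possibly stack-allocated) definition can be discarded afterward. */
void E_Interface_define(const Interface_Defn* d) {
    assert(n_registered < MAX_IFACES);
    registry[n_registered++] = *d;
}

const Feature_Defn* find_feature(const char* iface, const char* feature) {
    for (int i = 0; i < n_registered; i++) {
        if (strcmp(registry[i].name, iface) != 0) continue;
        for (int j = 0; j < registry[i].n_features; j++)
            if (strcmp(registry[i].features[j].name, feature) == 0)
                return &registry[i].features[j];
    }
    return NULL;
}

void Char_Sequence_class_init(void) {
    Interface_Defn defn = { .name = "Char_Sequence" };   /* on the stack */
    defn.features[defn.n_features++] =
        (Feature_Defn){ "char_at", { "int" }, 1, { "char" }, 1 };
    defn.features[defn.n_features++] =
        (Feature_Defn){ "length", { 0 }, 0, { "int" }, 1 };
    defn.features[defn.n_features++] =
        (Feature_Defn){ "slice", { "int", "int" }, 2, { "Char_Sequence", "boolean" }, 2 };
    E_Interface_define(&defn);
    memset(&defn, 0, sizeof defn);   /* the registry keeps its own copy */
}
```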
Implementing Classes
Absent a dedicated tool, implementing a class requires something like this:
E_Class* c = E_Class_define("Char_Buffer");
E_Class_implements(c, "Char_Sequence");
E_Class_define_size(c, sizeof(struct _Char_Buffer));
E_Class_define_init(c, &_skel_Char_Buffer_new);
E_Class_define_free(c, &_skel_Char_Buffer_del);
E_Class_define_constructor(c, "new",
                           &_skel_Char_Buffer_new,
                           0);
E_Class_define_constructor(c, "new_copy",
                           &_skel_Char_Buffer_new_copy,
                           1,
                           E_Type_code("interface", "Char_Sequence"));
E_Class_define_constructor(c, "new_with_chars",
                           &_skel_Char_Buffer_new_with_chars,
                           1,
                           E_Type_code("array", "char"));
E_Class_define_feature(c, "set_char_at", &_skel_Char_Buffer_set_char_at,
                       2,
                       E_Type_code_in("int"),
                       E_Type_code_in("char"));
E_Class_define_feature(c, "to_string", &_skel_Char_Buffer_to_string,
                       1,
                       E_Type_code_out("interface", "String"));
E_Class_implement_feature(c, "char_at", &_skel_Char_Buffer_char_at);
/*
 * or if code generation or libffi is supported ...
 *
 * E_Type_Map map_Char_Buffer_char_at = {
 *     // C arguments
 *     {
 *         // ExO index, ExO type, C type name, C size
 *         {
 *             0,
 *             "Char_Buffer",
 *             "E_Char_Buffer*",
 *             sizeof(void *)
 *         },
 *         { 1, "int", "int", sizeof(int) }
 *     },
 *     // C return type
 *     { 2, "char", "wchar_t", sizeof(wchar_t) }
 * };
 *
 * E_Class_define_routine_ffi(c, "char_at", Char_Buffer_char_at,
 *                            map_Char_Buffer_char_at);
 */
/* etc. etc. */
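E_Class_define_constructor and E_Class_define_feature take a count followed by that many type codes. The following sketch shows how such a function might read them with <stdarg.h>. Here a type code is a small struct carrying a direction, loosely modeling E_Type_code_in and E_Type_code_out, and every name (Type_Code, Class_Defn, `define_feature`, the skeleton stubs) is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

typedef enum { DIR_IN, DIR_OUT } Direction;
typedef struct { const char* name; Direction dir; } Type_Code;

static Type_Code type_in(const char* name)  { return (Type_Code){ name, DIR_IN }; }
static Type_Code type_out(const char* name) { return (Type_Code){ name, DIR_OUT }; }

typedef struct Frame Frame;                    /* opaque in this sketch */
typedef void (*Skeleton)(void* ud, Frame* c);

typedef struct {
    const char* name;
    Skeleton    fn;
    Type_Code   params[6];
    int         n_params, n_in, n_out;
} Feature;

typedef struct {
    const char* name;
    Feature     features[8];
    int         n_features;
} Class_Defn;

/* Reads `count` Type_Code values that follow it in the argument list. */
void define_feature(Class_Defn* c, const char* name, Skeleton fn, int count, ...) {
    assert(c->n_features < 8 && count >= 0 && count <= 6);
    Feature* f = &c->features[c->n_features++];
    memset(f, 0, sizeof *f);
    f->name = name;
    f->fn = fn;
    f->n_params = count;
    va_list ap;
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        f->params[i] = va_arg(ap, Type_Code);  /* structs travel by value */
        if (f->params[i].dir == DIR_IN) f->n_in++;
        else f->n_out++;
    }
    va_end(ap);
}

static void skel_set_char_at(void* ud, Frame* c) { (void)ud; (void)c; }
static void skel_to_string(void* ud, Frame* c)   { (void)ud; (void)c; }
```

C cannot check `count` against the arguments actually passed, so a mismatch is undefined behavior at run time. That is one argument for the code generation or libffi route mentioned in the comment above.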
Implementing Functions
Functions, including interface features and class constructors, may take zero or more arguments and return zero or more results:
r1, r2, ... = f(arg1, arg2, ...)
Because a C function cannot return more than one value, function implementations need a “skeleton” wrapper with the following signature:
typedef void (*E_Function)(void *ud, E_Frame* c);
FFI or code generation would remove the need to write these wrappers by hand.
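A sketch of a skeleton for the `slice` feature, assuming a toy E_Frame that carries tagged argument and result values (the document does not define E_Frame's layout). The plain C function returns one value and writes the second result through a pointer, the out-parameter convention mentioned in the CORBA footnote. The skeleton unpacks the frame, calls the function, and packs both results. The Value type and the half-open index rule are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical frame: arguments in, results out, as tagged values. */
typedef enum { V_INT, V_BOOL, V_STR } Value_Tag;
typedef struct {
    Value_Tag tag;
    union { int i; bool b; const char* s; } as;
} Value;

typedef struct {
    Value args[4];    int argc;
    Value results[4]; int resultc;
} E_Frame;

typedef void (*E_Function)(void* ud, E_Frame* c);

/* Plain C: returns a new copy of s[start, stop) and reports via
   *indexes_ok whether the indexes were valid. */
static char* slice(const char* s, int start, int stop, bool* indexes_ok) {
    int len = (int)strlen(s);
    *indexes_ok = 0 <= start && start <= stop && stop <= len;
    if (!*indexes_ok) start = stop = 0;        /* fall back to "" */
    char* out = malloc((size_t)(stop - start) + 1);
    assert(out != NULL);
    memcpy(out, s + start, (size_t)(stop - start));
    out[stop - start] = '\0';
    return out;
}

/* Skeleton: unpack the arguments, call the C function, pack both results. */
static void skel_slice(void* ud, E_Frame* c) {
    (void)ud;                                  /* no per-binding data needed */
    assert(c->argc == 3 && c->args[0].tag == V_STR
           && c->args[1].tag == V_INT && c->args[2].tag == V_INT);
    bool ok;
    char* ss = slice(c->args[0].as.s, c->args[1].as.i, c->args[2].as.i, &ok);
    c->results[0] = (Value){ V_STR, { .s = ss } };
    c->results[1] = (Value){ V_BOOL, { .b = ok } };
    c->resultc = 2;
}
```

Because every skeleton has the same E_Function type, the runtime can store and call them uniformly; only the skeleton knows how to map frame slots to the C function's parameters.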
Cross-Domain Types
Data Types
These are the most important data types in ExO.
ExO Type | C type | Description |
---|---|---|
Any | E_Any* | the parent type of all possible types |
Binary | E_Binary* | an arbitrary length collection of bits |
Boolean | bool | true or false |
Function | see below | a capsule of code with arguments and results |
List[T] | E_List* | an ordered, immutable sequence of T |
Null | NULL | a value with no behavior |
Number | E_Number* | a continuous, unit-less, one-dimensional quantity |
Object | see below | an entity with behavior and state |
Record | E_Record* | an immutable set of fields referenced by Symbol |
String | E_String* | an immutable sequence of Unicode code points[3] |
Symbol | C_Symbol | a unique value sometimes linked to a string |
Tuple | E_Tuple* | an immutable set of fields referenced by index |
Userdata | E_Userdata* | a non-Object reference from this domain |
ExO types have analogues in other languages:
ExO Type | JSON/JavaScript | Lua | Python | Ruby |
---|---|---|---|---|
Any | (“value”) | Object | | |
Binary | (number) | integer | BigInt | |
Boolean | Boolean | boolean | boolean | Boolean |
Function | Function | function | function | Proc |
List | Array | table | list | Array |
Null | null | nil | None | nil |
Number | Number | number | number | Number |
Object | Object | table | instance | Object |
Record | Object | table | Struct | |
String | String | string | string | String |
Symbol | Symbol | (string) | dictionary | Hash |
Userdata | userdata | (???) | | |
Numeric Data Subtypes
Types come from <stdint.h>, <math.h>, and <wchar.h>. Not all types will be available on all platforms.
Type | C type(s) | Description |
---|---|---|
bigint | E_Big_Integer | an arbitrary length integer |
bit | uint8_t : 1 | 0 or 1 |
char | wchar_t, utf32_t | a Unicode code point |
int | int | signed integer of at least 16 bits[4] |
int16 | int16_t | 16-bit signed integer |
int32 | int32_t | 32-bit signed integer |
int64 | int64_t | 64-bit signed integer |
octet | uint8_t, octet_t | an unsigned 8-bit value |
real | E_Real | a non-integral number[5] |
uint16 | uint16_t | 16-bit unsigned integer |
uint32 | uint32_t | 32-bit unsigned integer |
uint64 | uint64_t | 64-bit unsigned integer |
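Because the exact-width types are optional in C, a header mapping these subtypes to C types might guard each one. The following is a sketch of that pattern. The E_int16-style typedef names and E_HAVE_* macros are hypothetical, bit and bigint are omitted, and E_Real follows the footnote's double_t option.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stdint.h>
#include <wchar.h>

/* int: at least 16 bits, which C itself guarantees. */
_Static_assert(INT_MAX >= 32767, "int must have at least 16 bits");

/* Exact-width types are optional; <stdint.h> defines the matching
   *_MAX macro only when the type exists. */
#if defined(INT16_MAX) && defined(UINT16_MAX)
typedef int16_t  E_int16;
typedef uint16_t E_uint16;
#define E_HAVE_INT16 1
#endif

#if defined(INT32_MAX) && defined(UINT32_MAX)
typedef int32_t  E_int32;
typedef uint32_t E_uint32;
#define E_HAVE_INT32 1
#endif

#if defined(INT64_MAX) && defined(UINT64_MAX)
typedef int64_t  E_int64;
typedef uint64_t E_uint64;
#define E_HAVE_INT64 1
#endif

#ifdef UINT8_MAX
typedef uint8_t E_octet;
#endif

typedef wchar_t  E_char;   /* only 16 bits wide on some platforms */
typedef double_t E_Real;   /* at least as wide as double */
```

Code that needs, say, int64 can then test `#ifdef E_HAVE_INT64` and fall back or refuse to build, rather than failing with an obscure unknown-type error.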
Object-Related Types
| R | Type | C type(s) | Description |
|---|---|---|---|
| X | Class | E_Class | a description of an Object’s features |
| | Feature | E_Feature | the prototype of messages an object responds to and the expected return type |
| X | Function | | |
| X | - Exported | E_Function | an exported function in the current domain |
| ? | - Local | various | an un-exported function in the current domain |
| X | - Remote | E_Proxy | a function in another domain |
| X | Identifier | E_ID | a specifier for an object or function within a domain |
| | Interface | E_Interface | a named collection of features |
| | Message | E_Message | a Symbol and list of arguments |
| | Message Reply | E_Reply | a list of results from a Message |
| X | Object | | |
| X | - Exported | various | an exported object in the current domain |
| ? | - Local | various | an un-exported object in the current domain |
| X | - Remote | E_Proxy | an object in another domain |
| | Type | E_Type | the name of a Data Type, Class, or Interface |
| | Type Parameter | E_Type_Param | a type variable bound to a concrete type |
| | Type Variable | E_Type_Var | a variable in lieu of a concrete type[6] |

An X in the R column indicates that the type can be the receiver of a message.
String
The initial proof of concept will be E_String, an encapsulated String usable with object-based C functions and Exported Object APIs alike.
The E-String API will resemble the M-String API with the following changes:
- Any function that returns an M-String will instead return void and take a pointer to an E-String pointer as a first argument. The pointer written to that argument will be automatically retained.
- There will be no equivalents to an M_Pool, M_Root_Set, or M_Scope.
- M_Char_Buffer and M_String_Array will be replaced by E_Char_Buffer and E_String_Array, respectively.
So, for example, these are the constructors:
void E_String_new_ascii(E_String* *sptr, size_t sz, const char* buf);
void E_String_new_utf8(E_String* *sptr, size_t sz, const utf8_t* buf);
void E_String_new_utf16(E_String* *sptr, size_t sz, const utf16_t* buf);
void E_String_new_utf32(E_String* *sptr, size_t sz, const wchar_t* buf);
void E_String_new_encoded(E_String* *sptr,
C_Symbol encoding,
size_t sz,
const octet_t* buf);
/* Create from a null-terminated string */
void E_String_new_from_cstring(E_String* *sptr, const char* cstr);
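A sketch of the out-parameter convention for two of these constructors. It assumes a wide (code-point array) representation, an ASCII-only E_String_new_from_cstring, and an inline reference count instead of the side table; `toy_string_release` is a hypothetical stand-in for E_Any_release. Each constructor returns nothing and writes a pointer that already carries one reference, which the caller owns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

typedef struct E_String {
    int     refs;       /* simplification: inline count, not the side table */
    size_t  length;
    wchar_t chars[];    /* flexible array member holding code points */
} E_String;

/* Convention: return void, write an already-retained pointer via sptr. */
void E_String_new_utf32(E_String* *sptr, size_t sz, const wchar_t* buf) {
    E_String* s = malloc(sizeof *s + sz * sizeof(wchar_t));
    assert(s != NULL);
    s->refs = 1;        /* the reference handed to the caller */
    s->length = sz;
    if (sz > 0) memcpy(s->chars, buf, sz * sizeof(wchar_t));
    *sptr = s;
}

/* Assumption: the C string is ASCII, so each byte is one code point. */
void E_String_new_from_cstring(E_String* *sptr, const char* cstr) {
    size_t n = strlen(cstr);
    wchar_t* tmp = malloc((n + 1) * sizeof(wchar_t));
    assert(tmp != NULL);
    for (size_t i = 0; i < n; i++) tmp[i] = (wchar_t)(unsigned char)cstr[i];
    E_String_new_utf32(sptr, n, tmp);   /* copies tmp */
    free(tmp);
}

/* Drops the caller's reference and clears the variable. */
void toy_string_release(E_String* *sptr) {
    if (*sptr != NULL && --(*sptr)->refs == 0) free(*sptr);
    *sptr = NULL;
}
```

Writing through a pointer gives every constructor the same ownership rule and matches the shape of E_Any_release. The document does not say whether a constructor releases whatever the variable held before, so this sketch simply overwrites it.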
And these are the slice and join functions:
void E_String_join(E_String* *rptr, E_String* head, E_String* tail);
void E_String_join_n(E_String* *rptr, size_t n, ...);
void E_String_slice(E_String* *rptr, E_String* s, int first, int last);
void E_String_slice_from(E_String* *rptr, E_String* s, int first);
void E_String_slice_to(E_String* *rptr, E_String* s, int last);
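A sketch of the join and slice functions on a toy wide-string type, using the signatures above. The document does not say whether `last` is inclusive or what happens when an index is out of range, so this assumes half-open [first, last) indexes clamped to the string. Reference counting is omitted, and `alloc_string` and `from_wcs` are hypothetical helpers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

typedef struct { size_t length; wchar_t chars[]; } E_String;  /* toy */

static E_String* alloc_string(size_t n) {
    E_String* s = malloc(sizeof *s + n * sizeof(wchar_t));
    assert(s != NULL);
    s->length = n;
    return s;
}

static E_String* from_wcs(const wchar_t* w) {
    size_t n = wcslen(w);
    E_String* s = alloc_string(n);
    memcpy(s->chars, w, n * sizeof(wchar_t));
    return s;
}

void E_String_join(E_String* *rptr, E_String* head, E_String* tail) {
    E_String* r = alloc_string(head->length + tail->length);
    memcpy(r->chars, head->chars, head->length * sizeof(wchar_t));
    memcpy(r->chars + head->length, tail->chars, tail->length * sizeof(wchar_t));
    *rptr = r;
}

void E_String_join_n(E_String* *rptr, size_t n, ...) {
    va_list ap;
    size_t total = 0;
    va_start(ap, n);                        /* first pass: total length */
    for (size_t i = 0; i < n; i++) total += va_arg(ap, E_String*)->length;
    va_end(ap);

    E_String* r = alloc_string(total);
    size_t at = 0;
    va_start(ap, n);                        /* second pass: copy */
    for (size_t i = 0; i < n; i++) {
        E_String* s = va_arg(ap, E_String*);
        memcpy(r->chars + at, s->chars, s->length * sizeof(wchar_t));
        at += s->length;
    }
    va_end(ap);
    *rptr = r;
}

/* Assumption: half-open [first, last), clamped to the string's bounds. */
void E_String_slice(E_String* *rptr, E_String* s, int first, int last) {
    size_t lo = first < 0 ? 0 : (size_t)first;
    size_t hi = last < 0 ? 0 : (size_t)last;
    if (hi > s->length) hi = s->length;
    if (lo > hi) lo = hi;
    E_String* r = alloc_string(hi - lo);
    memcpy(r->chars, s->chars + lo, (hi - lo) * sizeof(wchar_t));
    *rptr = r;
}

void E_String_slice_from(E_String* *rptr, E_String* s, int first) {
    E_String_slice(rptr, s, first, (int)s->length);
}

void E_String_slice_to(E_String* *rptr, E_String* s, int last) {
    E_String_slice(rptr, s, 0, last);
}
```

E_String_join_n walks its variadic list twice, once to size the result and once to copy it, which is why it needs the explicit count `n`.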
Subtypes
Behind the E_String facade are five implementations of strings, two of which may be “immediate” as in the M-String proposal if the C compiler will allow it.
- atom: A string of length 1, which cannot be split any further. Ideally there will be at most one of these for every Unicode code point.
- empty: A string of length 0. The runtime will create only one of these.
- encoded: A multi-character string encoded as bytes, with the encoding noted.
- utf8: A multi-character string encoded as UTF-8 bytes.
- wide: A multi-character string composed of Unicode code points.
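One reading of “immediate” is that small strings are encoded in the bits of the handle itself, so they need no allocation. The following sketch rests on several assumptions: atom and empty are taken to be the two immediate subtypes (the document does not say which two), heap strings are assumed to be at least 4-byte aligned, and pointers are assumed to round-trip through uintptr_t. That last point is implementation-defined behavior, which is presumably what “if the C compiler will allow it” refers to. E_Str and its functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { K_ATOM, K_EMPTY, K_ENCODED, K_UTF8, K_WIDE } String_Kind;

/* Heap header for the non-immediate kinds (payload fields omitted). */
typedef struct { String_Kind kind; size_t length; } Heap_String;

typedef uintptr_t E_Str;                 /* hypothetical tagged handle */

#define TAG_MASK  ((uintptr_t)3)         /* low two bits */
#define TAG_HEAP  ((uintptr_t)0)         /* aligned pointer */
#define TAG_ATOM  ((uintptr_t)1)         /* code point stored in bits 2 and up */
#define TAG_EMPTY ((uintptr_t)3)         /* the single empty string */

E_Str E_Str_atom(uint32_t code_point) {
    assert(code_point <= 0x10FFFF);
    return ((uintptr_t)code_point << 2) | TAG_ATOM;
}

E_Str E_Str_empty(void) { return TAG_EMPTY; }

E_Str E_Str_from_heap(const Heap_String* h) {
    uintptr_t bits = (uintptr_t)(const void*)h;
    assert((bits & TAG_MASK) == TAG_HEAP);   /* requires 4-byte alignment */
    return bits;
}

static const Heap_String* as_heap(E_Str s) {
    return (const Heap_String*)(const void*)s;
}

String_Kind E_Str_kind(E_Str s) {
    switch (s & TAG_MASK) {
    case TAG_ATOM:  return K_ATOM;
    case TAG_EMPTY: return K_EMPTY;
    case TAG_HEAP:  return as_heap(s)->kind;
    default:        abort();              /* tag 2 is never produced */
    }
}

size_t E_Str_length(E_Str s) {
    switch (E_Str_kind(s)) {
    case K_ATOM:  return 1;               /* no memory access needed */
    case K_EMPTY: return 0;
    default:      return as_heap(s)->length;
    }
}

uint32_t E_Str_code_point(E_Str s) {
    assert(E_Str_kind(s) == K_ATOM);
    return (uint32_t)(s >> 2);
}
```

The trade-off is a tag check on every access in exchange for never allocating atoms or the empty string.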
XIDL
Taking some inspiration from XPCOM, one could introduce an Interface Definition Language to more easily define interfaces, declare classes, and generate function skeletons. A tool could even take an IDL representation, parse the C header files for a library, and output a complete wrapper for that library.
Informal Syntax
Rather than mimic CORBA IDL or C/C++, XIDL would represent how functions really operate in ExO. For example, here’s one possible function to get a substring from a string.
function substring(s: String, start: int, stop: int)
=> (ss: String, indexes_ok: boolean)
In the ExO runtime, functions can return multiple results.[7] Here the second result reports whether the indexes denote a proper substring; it is false if, for example, start indicates a character after stop.
When there is only one return value, one can use this syntax:
function substring(s: String, start: int, stop: int) => String
If there is no return value, the function (typically called a procedure) simply omits the arrow:
function set_char_at(s: Char_Buffer, index: int, value: char)
A small but complete XIDL file might look like this:
module exo.util.string is
interface Char_Sequence is
char_at(i: int) => char
length => int
slice(start: int, stop: int)
=> (result: Char_Sequence, indexes_ok: boolean)
slice_from(start: int)
=> (result: Char_Sequence, indexes_ok: boolean)
slice_to(stop: int)
=> (result: Char_Sequence, indexes_ok: boolean)
to_chars => [char]
end
class Char_Buffer : Char_Sequence is
+ new
+ new_copy(s: Char_Sequence)
+ new_with_chars(wcs: [char])
- append(c: Char_Sequence)
- append_all(cs: [Char_Sequence])
- set_char_at(i: int, value: char)
- to_string => String
end
class String : Char_Sequence is
+ empty
+ for_char(c: char)
+ new(s: Char_Sequence)
+ new_with_ascii(b: [octet])
+ new_with_chars(wcs: [char])
+ new_with_encoding(e: Symbol, b: [octet])
+ new_with_utf8(b: [octet])
- encoding => Symbol
end
function compare(a: Char_Sequence?, b: Char_Sequence?) => int
function join(a: Char_Sequence, b: Char_Sequence) => String
function join_all(args: [Char_Sequence]) => String
end
[x] means “List of x”.
The syntax to specify Records and Tuples is to be determined.
Note that all “features” of an interface have their receiver as an implicit first argument. In a class, + marks a constructor and - marks a feature; constructors all have the implicit return value of a class instance. Also, no argument may be null unless its type is marked with a ?.
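The following sketch shows how generated C for two of the XIDL functions above might honor the null rule, using plain `const char*` in place of Char_Sequence and String. `compare` marks both parameters with ?, so it must handle NULL (sorting NULL before any string is an assumption). `join` does not, so it treats a NULL argument as a caller error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* compare(a: Char_Sequence?, b: Char_Sequence?) => int
   Both parameters are nullable; NULL sorts first (assumption). */
int compare(const char* a, const char* b) {
    if (a == NULL || b == NULL)
        return (a != NULL) - (b != NULL);   /* -1, 0, or 1 */
    int c = strcmp(a, b);
    return (c > 0) - (c < 0);               /* normalize to -1, 0, or 1 */
}

/* join(a: Char_Sequence, b: Char_Sequence) => String
   Neither parameter is marked with ?, so NULL is a caller error. */
char* join(const char* a, const char* b) {
    assert(a != NULL && b != NULL);
    size_t na = strlen(a), nb = strlen(b);
    char* r = malloc(na + nb + 1);
    assert(r != NULL);
    memcpy(r, a, na);
    memcpy(r + na, b, nb + 1);              /* includes the terminator */
    return r;
}
```

Making nullability explicit in the IDL lets a generator emit the NULL check only where the interface permits NULL, instead of every binding guessing.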
Formal Syntax
TODO
API
Any Object
TODO
Object Export
TODO
Object ID
TODO
Object Memory Management
TODO
Class and Interface Declaration
TODO
Feature
TODO
Type Reference
TODO
Class and Function Implementation
TODO
Constructors
TODO
Features and Interfaces
TODO
Memory
TODO
Skeleton
TODO
Message
TODO
Message Reply
TODO
Universal Types
Any
TODO
Binary
TODO
Boolean
TODO
Function
TODO
List
TODO
Null
TODO
Number
TODO
Proxy
TODO
Record
TODO
String
TODO
Tuple
TODO
Userdata
TODO
[1] This was meant to be the simpler alternative to M-String, TypeLib, TIDL, and Teufel. It still ended up pretty complicated, mainly because it’s trying to impose an object model on vanilla C. Garbage collection almost pales in comparison.
[2] Internally a C_Table will track the reference counts of all objects. No object needs to add a reference count field. In fact this technique may decrease swapping and memory cache misses.
[3] Possibly stored in another format like UTF-8.
[4] The platform’s optimal size for a C int, usually 32 bits.
[5] An approximation of a real number. In C this is an E_Real, usually defined as a double or, if using <math.h>, a double_t. Embedded platforms may redefine this to a fixed point or other type of decimal value.
[6] Currently the only generic type is the builtin List.
[7] CORBA and similar C-like syntaxes designate parameters as in, out, or inout, the latter two being translated to pointers to the data being read or written.