CSymbol

Frank Mitchell

Posted: 2023-03-20
Word Count: 420
Tags: c-programming programming vaporware

To represent character encoding names in M-Strings, I created a small API inspired by JavaScript Symbols. After some thought, I realized it was only loosely related to M-Strings and could be useful in other contexts.

Symbols

Symbols are unique, interned values. They’re used for significant and often-used strings like character encoding names, HTTP method names, function and variable names, etc.

A Symbol is an opaque handle, not a C string, so it cannot be passed where a char* is expected. To get its text, call C_Symbol_as_cstring(sym).
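To make "unique, interned" concrete, here is a minimal sketch of interning. It is not the CSymbol implementation: every `toy_` name is hypothetical, the table is a fixed-size array searched linearly, and it is not thread-safe (the API below describes a thread-safe global hashtable instead). It only shows the core property: asking twice for the same text yields the same handle, so symbols can be compared with `==` rather than `strcmp`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy intern table: fixed capacity, linear search, not thread-safe. */
#define TOY_MAX_SYMBOLS 64

struct toy_symbol { char *text; };
typedef struct toy_symbol *Toy_Symbol;

static struct toy_symbol toy_table[TOY_MAX_SYMBOLS];
static size_t toy_count = 0;

/* Return the single handle for `cstr`, creating it on first use.
 * Returns NULL if the table is full or allocation fails. */
static Toy_Symbol toy_symbol_for_cstring(const char *cstr) {
    for (size_t i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].text, cstr) == 0) {
            return &toy_table[i];          /* already interned */
        }
    }
    if (toy_count == TOY_MAX_SYMBOLS) {
        return NULL;
    }
    size_t n = strlen(cstr) + 1;
    char *copy = malloc(n);                /* the table owns its own copy */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, cstr, n);
    toy_table[toy_count].text = copy;
    return &toy_table[toy_count++];
}

/* Return the text behind a handle, or NULL for a NULL handle. */
static const char *toy_symbol_as_cstring(Toy_Symbol sym) {
    return sym ? sym->text : NULL;
}

/* Exercise the sketch; returns 0 if every check passes. */
static int toy_demo(void) {
    char buf[] = "UTF-8";                  /* separate storage from the literal */
    Toy_Symbol a = toy_symbol_for_cstring("UTF-8");
    Toy_Symbol b = toy_symbol_for_cstring(buf);
    Toy_Symbol c = toy_symbol_for_cstring("ISO-8859-1");
    assert(a != NULL && b != NULL && c != NULL);
    assert(a == b);                        /* same text, same handle */
    assert(a != c);                        /* different text, different handle */
    assert(strcmp(toy_symbol_as_cstring(a), "UTF-8") == 0);
    return 0;
}
```

Calling `toy_demo()` from a `main` runs the checks. The handle is a pointer to a table entry, not to the characters, which is why the text has to be fetched through an accessor.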

API

#include <stdbool.h>
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t (POSIX) */

typedef unsigned char utf8_t;

/**
 * A unique value sometimes tied to a string value.
 * The mapping to strings, if any, resides in a thread-safe global hashtable.
 */
typedef struct _C_Symbol* C_Symbol;

/**
 * Determines whether `p` is a C_Symbol.
 * The implementation checks whether `p` is in the right memory range instead
 * of dereferencing it, in case it points to an invalid memory location.
 */
bool is_C_Symbol(void* p);

/**
 * Return a new Symbol value unique in the current instance of the host program.
 * The value isn't guaranteed to be unique across all parallel or future
 * instances in memory.
 * The Symbol has no corresponding string value.
 */
C_Symbol C_Symbol_new(void);

/**
 * Return a Symbol value unique in current memory, indexed by `cstr`.
 * Since `cstr` is a C string, it cannot contain embedded nulls.
 * By convention, symbols don't contain whitespace or non-printable characters,
 * but this is not a hard and fast rule.
 */
C_Symbol C_Symbol_for_cstring(const char* cstr);

/**
 * Return a Symbol value unique in current memory,
 * indexed by a UTF-8 string of length `len` starting at `uptr`.
 * The string may contain embedded nulls.
 * By convention, symbols don't contain whitespace or non-printable characters,
 * but this is not a hard and fast rule.
 */
C_Symbol C_Symbol_for_utf8_string(size_t len, const utf8_t* uptr);

/**
 * The value of the string used to create the symbol.
 * This is a copy of the string passed into `C_Symbol_for_cstring`
 * or `C_Symbol_for_utf8_string` with embedded nulls replaced with '\xC0\x80',
 * "" if the symbol was created by `C_Symbol_new`,
 * or NULL if the symbol value is invalid (e.g. another type of object).
 */
const char* C_Symbol_as_cstring(C_Symbol sym);

/**
 * Copy at most the first `len` bytes of the string used to create `sym` into `buf`.
 * The string may contain embedded nulls.
 * The return value will be the number of UTF-8 bytes written to `buf`;
 * if 0 the symbol has no corresponding string value,
 * and if negative `sym` is not a C_Symbol.
 */
ssize_t C_Symbol_as_utf8_string(C_Symbol sym, size_t len, utf8_t* buf);
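The comment on `C_Symbol_as_cstring` says embedded nulls come back as '\xC0\x80'. That two-byte sequence is the overlong encoding of U+0000 used by "Modified UTF-8" (Java uses it, for example), and it lets a string with embedded nulls survive as an ordinary NUL-terminated C string. Below is a minimal sketch of that replacement. The helper name `toy_escape_nulls` is hypothetical and this is not taken from any CSymbol source; it assumes the caller frees the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes from `src`, writing 0xC0 0x80 in place
 * of every 0x00 byte, and NUL-terminate the result. Caller must free(). */
static char *toy_escape_nulls(size_t len, const unsigned char *src) {
    size_t nulls = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0) {
            nulls++;
        }
    }
    /* Each embedded null grows by one byte; +1 for the terminator. */
    char *out = malloc(len + nulls + 1);
    if (out == NULL) {
        return NULL;
    }
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0) {
            out[j++] = (char)0xC0;
            out[j++] = (char)0x80;
        } else {
            out[j++] = (char)src[i];
        }
    }
    out[j] = '\0';
    return out;
}

/* Exercise the sketch; returns 0 if every check passes. */
static int toy_escape_demo(void) {
    const unsigned char raw[] = { 'a', 0x00, 'b' };   /* 3 bytes, one embedded null */
    char *s = toy_escape_nulls(sizeof raw, raw);
    assert(s != NULL);
    assert(strlen(s) == 4);                           /* 'a', 0xC0, 0x80, 'b' */
    assert(memcmp(s, "a\xC0\x80" "b", 4) == 0);
    free(s);
    return 0;
}
```

Because C0 80 is not valid in strict UTF-8, a consumer that validates its input would have to map it back to a null first; `C_Symbol_as_utf8_string` sidesteps that by returning the original bytes with an explicit length.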