JSON in C

Frank Mitchell

Posted: 2023-04-12
Last Modified: 2023-07-21
Word Count: 853
Tags: c-programming lua programming python ruby

This project will create a C library to parse and emit JSON. It will then provide wrappers for various languages including Lua, Python, and Ruby.

Design

Since there are so many C JSON parsers already, this one will try to do things a little differently:

  1. It will offer a “pull parser” interface modeled after JSONPP.
  2. The implementation will try to avoid heap allocation where it can, and memory leaks in general.
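
To illustrate what “pull” means here, the following is a toy sketch and not the planned library. The caller asks for one event at a time instead of registering callbacks, and the scanner keeps all of its state in a small struct on the caller's stack, so nothing is allocated. The names `Toy_Event`, `Toy_Scanner`, `toy_next`, and `toy_max_depth` are hypothetical, and the scanner deliberately ignores JSON strings, so brackets inside a string would be miscounted.

```c
#include <stddef.h>

/* Hypothetical, much-reduced event set for illustration only. */
typedef enum { TOY_END, TOY_OPEN, TOY_CLOSE, TOY_OTHER } Toy_Event;

/* All scanner state lives here; the caller owns it and the text. */
typedef struct {
    const char* text;  /* NUL-terminated input */
    size_t      pos;   /* index of the next unread character */
} Toy_Scanner;

/* Pull one event: skip whitespace, then classify a single character. */
Toy_Event toy_next(Toy_Scanner* s) {
    for (;;) {
        char c = s->text[s->pos];
        if (c == '\0') return TOY_END;
        s->pos++;
        switch (c) {
            case ' ': case '\t': case '\n': case '\r':
                continue;               /* whitespace: keep scanning */
            case '[': case '{':
                return TOY_OPEN;
            case ']': case '}':
                return TOY_CLOSE;
            default:
                return TOY_OTHER;
        }
    }
}

/* The caller drives the loop and decides what to do with each event;
 * here it only tracks the deepest level of nesting seen. */
int toy_max_depth(const char* text) {
    Toy_Scanner s = { text, 0 };
    int depth = 0, max = 0;
    Toy_Event e;
    while ((e = toy_next(&s)) != TOY_END) {
        if (e == TOY_OPEN) {
            if (++depth > max) max = depth;
        } else if (e == TOY_CLOSE) {
            depth--;
        }
    }
    return max;
}
```

The point of the shape is that the loop belongs to the client: it can stop early, skip events, or interleave parsing with other work without the parser calling back into it.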

API

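#include <math.h>     /* double_t */
#include <stdbool.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>

/* Export/visibility macro; defined empty here unless the build supplies one. */
#ifndef JSONPP_API
#define JSONPP_API
#endif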

typedef enum {
    SYNTAX_ERROR = 0,
    START_STREAM,   /* initial state before parsing */
    START_ARRAY,    /* read `[` */
    END_ARRAY,      /* read `]` */
    START_OBJECT,   /* read `{` */
    END_OBJECT,     /* read `}` */
    KEY_NAME,       /* read key for Object */
    VALUE_NULL,     /* read `null` */
    VALUE_TRUE,     /* read `true` */
    VALUE_FALSE,    /* read `false` */
    VALUE_NUMBER,   /* read JSON Number */
    VALUE_STRING,   /* read JSON String */
    END_STREAM      /* reached final state */
} Json_Event;

typedef struct _Json_Pull_Parser Json_Pull_Parser;

/**
 * A convenience type for a single byte character.
 * JSON's structural tokens are all ASCII, and this parser accepts only
 * ASCII input; other characters must be written as Unicode escape sequences.
 */
typedef char    byte_t;

/**
 * A convenience type for a byte in a UTF-8 sequence.
 */
typedef uint8_t utf8_t;

/**
 * A callback to fetch more characters from a file, file descriptor, socket,
 * or anything else.
 * `data` is the same pointer passed into `Json_Pull_Parser_new()`.
 * Each call should return a pointer to the next bytes to parse
 * and set `*sizptr` to the number of bytes available at that pointer.
 * The parser checks that all characters are in the ASCII range.
 * Buffer management is the client routine's responsibility, perhaps in
 * conjunction with the `data` ptr.
 */
typedef const byte_t* (*Json_Reader)(void* data, size_t *sizptr);

/**
 * Create a new parser at `*pptr` that reads characters by calling `r` with
 * `d` as its `data` argument.
 */
JSONPP_API
void Json_Pull_Parser_new(Json_Pull_Parser* *pptr, Json_Reader r, void* d);

/**
 * Advance to the next significant token in the input.
 */
JSONPP_API
void Json_Pull_Parser_do_next(Json_Pull_Parser* p);

/**
 * The type of the last event parsed.
 */
JSONPP_API
Json_Event Json_Pull_Parser_event(Json_Pull_Parser* p);

/**
 * Whether the parser is currently processing a JSON Array.
 */
JSONPP_API
bool Json_Pull_Parser_in_array(Json_Pull_Parser* p);

/**
 * Whether the parser is currently processing a JSON Object.
 */
JSONPP_API
bool Json_Pull_Parser_in_object(Json_Pull_Parser* p);

/**
 * A pointer to the numeric value of the last event parsed if that event was
 * VALUE_NUMBER, or NULL otherwise.
 * Clients should copy the value before the next call to 
 * `Json_Pull_Parser_do_next()` or this value will be overwritten.
 */
JSONPP_API
double_t*  Json_Pull_Parser_number(Json_Pull_Parser* p);

/**
 * The string value of the last event parsed, or NULL if the last event
 * was not KEY_NAME, VALUE_STRING, or VALUE_NUMBER.
 * The result converts all Unicode and other escape sequences to UTF-8 bytes.
 * Clients should copy the string before the next call to 
 * `Json_Pull_Parser_do_next()` or this value will be overwritten.
 */
JSONPP_API
const utf8_t* Json_Pull_Parser_string(Json_Pull_Parser* p);

/**
 * Acquire another reference to this parser, so that a call to
 * `Json_Pull_Parser_release()` doesn't immediately destroy the parser.
 */
JSONPP_API
Json_Pull_Parser* Json_Pull_Parser_retain(Json_Pull_Parser* p);

/**
 * End parsing and release the memory held by this parser, if no other
 * client "retains" it.
 */
JSONPP_API
void Json_Pull_Parser_release(Json_Pull_Parser* *pptr);
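
To make the `Json_Reader` contract more concrete, here is a minimal sketch of a reader that hands out an in-memory string a few bytes at a time. `Chunk_Source` and `chunk_reader` are hypothetical names. Returning NULL with `*sizptr` set to 0 at end of input is also an assumption, since the API above does not say how a reader reports that it has run dry.

```c
#include <stddef.h>

/* Repeated from the API above so the sketch compiles on its own. */
typedef char byte_t;
typedef const byte_t* (*Json_Reader)(void* data, size_t *sizptr);

/* Hypothetical client-side state. The client, not the parser, owns the
 * buffer, which matches the "buffer management is the client's
 * responsibility" rule in the API comments. */
typedef struct {
    const char* text;    /* whole input; must outlive the parser */
    size_t      length;  /* total bytes in `text` */
    size_t      offset;  /* bytes already handed out */
    size_t      chunk;   /* maximum bytes returned per call */
} Chunk_Source;

/* Has the Json_Reader signature. Returns the next slice of `text`.
 * ASSUMPTION: NULL with *sizptr == 0 means "no more input". */
const byte_t* chunk_reader(void* data, size_t *sizptr) {
    Chunk_Source* src = data;
    size_t remaining = src->length - src->offset;
    if (remaining == 0) {
        *sizptr = 0;
        return NULL;
    }
    size_t n = remaining < src->chunk ? remaining : src->chunk;
    const byte_t* start = src->text + src->offset;
    src->offset += n;
    *sizptr = n;
    return start;
}
```

A client would pass `chunk_reader` as `r` and a pointer to its `Chunk_Source` as `d` when calling `Json_Pull_Parser_new()`. Because the slices point into the original string, the reader itself copies and allocates nothing.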

Wrappers

Wrappers will use native I/O to construct a tree of native equivalents to JSON Arrays and Objects.
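
A wrapper object in a garbage-collected language may share one C parser with, say, an iterator over it, which is presumably where the retain/release pair in the API earns its keep. That reading of the wrappers' needs is an assumption. The sketch below shows one way such reference counting could behave, using a hypothetical `Demo_Parser` stand-in rather than the real opaque `Json_Pull_Parser`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the opaque parser; only the count matters. */
typedef struct Demo_Parser {
    unsigned refcount;
} Demo_Parser;

/* Mirrors the shape of Json_Pull_Parser_new(): the new object is stored
 * at *pptr, or NULL is stored if allocation fails. */
void Demo_Parser_new(Demo_Parser* *pptr) {
    Demo_Parser* p = malloc(sizeof *p);
    if (p != NULL) p->refcount = 1;
    *pptr = p;
}

/* Take another reference; returns the same pointer for convenience. */
Demo_Parser* Demo_Parser_retain(Demo_Parser* p) {
    if (p != NULL) p->refcount++;
    return p;
}

/* Drop the caller's reference and clear the caller's pointer so it
 * cannot be used again by accident. The object is freed only when the
 * last reference is released. */
void Demo_Parser_release(Demo_Parser* *pptr) {
    if (pptr == NULL || *pptr == NULL) return;
    Demo_Parser* p = *pptr;
    *pptr = NULL;
    if (--p->refcount == 0) free(p);
}
```

Taking a pointer to the pointer in `release` is what lets the function null out the caller's copy, which is one plausible reason the real API uses `Json_Pull_Parser* *pptr` there.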

Lua

The parser will create a tree of tables:

local jsonpp = require "jsonpp"

local result = jsonpp.parse [[
        {"quote": "THIS IS JSON!"}
    ]]

print(result.quote)  -- => THIS IS JSON!

Or, if the programmer prefers:

local jsonpp = require "jsonpp"

-- `readerfcn` is a function that accepts an arbitrary argument
-- (e.g. 1) to represent state and returns a string and the next state
-- value on every invocation until the last, when it returns nil.

local input <const> = [[
        {"quote": "THIS IS JSON!"}
]]

local function readerfcn(state)
    if state > string.len(input) then
        return nil, nil
    end
    local newstate = state + 4
    return string.sub(input, state, newstate-1), newstate
end

local parser = jsonpp.new_parser(readerfcn, 1)

for event, value in parser:iterator() do
    -- `value` is either a string, number, boolean, nil, or error message
    -- depending on the value of `event`
    if event == jsonpp.START_OBJECT then
        -- do something
    elseif event == jsonpp.END_OBJECT then
        -- do something else
    else
        -- etc.
    end
end

Or maybe instead of the for loop …

parser:handle_events {
    start_object = function (parser)
        -- something useful
    end,
    end_object = function (parser)
        -- something useful
    end,
    -- and so on.
}

Or if really going old-school:

local jsonpp = require "jsonpp"

local parser = jsonpp.new_parser([[{"quote": "THIS IS JSON!"}]])

local event, value;

parser:do_next()
event = parser:event()  -- => jsonpp.START_STREAM

-- ... and so forth ...

Python

The parser will create a tree of dictionaries and lists.

import jsonpp

result = jsonpp.parse("""
        { "quote": "THIS IS JSON!" }
""")

print(result)
# Should be `{'quote': 'THIS IS JSON!'}`

Most or all of the Lua alternatives will be available in their own Pythonic idiom.

Ruby

The parser will create a tree of Arrays and Hashes.

require "jsonpp"

# Ruby module names must be constants; `Jsonpp` is an assumed spelling.
result = Jsonpp.parse('{"quote": "THIS IS JSON!"}')

puts result.inspect
# Should be `{"quote"=>"THIS IS JSON!"}`

Most or all of the Lua alternatives will be available in accordance with the Ruby Way.