K Binary Interchange Format Specification

This document describes the closed-value specialization of the more general polymorphic package format defined in POLYMORPHIC_BINARY_FORMAT.md.

Semantically, this file covers the singleton-pattern case:

(Pat(T), (T, t))

where the explicit pattern graph is omitted because it is uniquely determined by the witness type T.

Equivalently, this is the codec used at an exact-type-leaf boundary in the polymorphic format.

So this document does not define a separate semantic object. It defines the compact binary representation of the already-understood object:

(Pat(T), (T, t))

with the following optimization:

Overview

The binary format for k values consists of two parts:

  1. Type header: Identifies the canonical type of the value
  2. Value payload: canonical encoding of the value’s minimal typed DAG

The payload is designed to be friendly to the two primitive k navigations:

This pushes the format away from a pure inline bitstream tree and toward a node-table encoding of the minimal typed DAG. The type carries labels and tag names; the payload carries only compact structural references.

Relation To The Polymorphic Format

The general polymorphic package serializes:

pattern_section + value_section

where pattern nodes may include exact-type-leaf.

This file is the specialization obtained when:

Operationally:

Format Structure

┌─────────────────┬────────────────────────────────┐
│  Type Header    │     Value Payload              │
│  (variable)     │     (variable)                 │
└─────────────────┴────────────────────────────────┘

Type Header

The type is identified by its canonical name (content-addressed hash), which is a base58-encoded SHA-256 hash prefixed with @.

Format Options:

Option A: Length-prefixed ASCII

[1 byte: length N][N bytes: ASCII type hash including '@']

Example: \x2d@NiDZqYggx3VZ6b8quBZKTfkgJztWctkesuX4CrhTxM5c

Option B: Fixed 32-byte hash

[32 bytes: raw SHA-256 hash]

No ‘@’ prefix, no base58 encoding - pure binary.

Option C: No header (rely on context)

[Value payload only]

The decoder must know the expected type from context.

Decision: Option B - Fixed 32-byte raw hash

In the polymorphic setting, this corresponds to omitting the explicit singleton pattern and naming the exact root type directly.

Value Payload (Canonical Serialization)

The source semantic object is always a typed tree. For compact interchange, the wire format may encode the tree through its unique minimal DAG quotient.

Two subtrees are shareable only if they are equal as typed values:

Equality of local payload bytes alone is not enough. In particular, two subtrees that happen to serialize to the same bit pattern under different types must not be merged.

After quotienting by typed-value equality, each DAG node is serialized according to its canonical type state:

  1. Product nodes store child references in canonical field order
  2. Union nodes store the chosen tag ordinal, then one child reference
  3. Unit (empty product) stores no children
  4. Empty union cannot be encoded

The wire format is determined by the minimal typed DAG itself, not by encoder heuristics such as “share only large nodes” or “share whichever duplicate was found first”.

For values with no repeated typed subtrees, the DAG degenerates to the original tree.

This is the closed-type analogue of the polymorphic rule that value sharing is computed relative to the structural authority above it. Here that authority is the exact root type rather than an explicit pattern graph.

Canonical Payload Layout

The payload after the root type hash is:

| payload_version:u8 | node_count:uvarint | node_record_0 | ... | node_record_(N-1) |

Conventions:

The payload is therefore compact, sequentially parseable, and canonically ordered.

Why root = last node

Node records are emitted in canonical postorder:

  1. children before parents,
  2. product children in canonical field order,
  3. union nodes visit their single selected child,
  4. a shared node is emitted only on its first completed postorder visit.

This yields:

Canonical Node Identity

The canonical typed-DAG node key is:

(state_id, node_shape)

where:

Equivalently, an encoder may compute a bottom-up semantic hash from:

hash(state_id, tag_ordinal_or_product, hash(child_0), hash(child_1), ...)

but the semantic rule is normative, not the hash function.

Node Record Layout

Every node record starts with:

| state_id:uvarint | body... |

The decoder uses state_id together with the root type’s canonical automaton to know whether this is a product or union state, how many fields it has, and what child states are expected.

Product Record

If state_id is a product state with fields

f0, f1, ..., f(k-1)

in canonical field order, the record body is:

| child_ref_0:uvarint | child_ref_1:uvarint | ... | child_ref_(k-1):uvarint |

There are no field names in the payload. Labels come entirely from the canonical type.

Union Record

If state_id is a union state with variants

t0, t1, ..., t(m-1)

in canonical tag order, the record body is:

| tag_ordinal:uvarint | child_ref:uvarint |

tag_ordinal must be in [0, m).

There are no tag names in the payload. Tag names come entirely from the canonical type.

Unit Record

If state_id is an empty product state {}, the record body is empty.

Its full encoding is just:

| state_id:uvarint |

Child References

Child references are encoded as back-distances rather than absolute node IDs.

For a current node with ID i and a child with ID j, where j < i, encode:

child_ref = i - 1 - j

Therefore:

The decoder reconstructs:

j = i - 1 - child_ref

This is compact because canonical postorder tends to place children near their parents, especially for non-shared subtrees.

Complete Format

| root_type_hash:32 bytes | payload_version:u8 | node_count:uvarint | node records... |

The root type hash identifies the canonical type automaton used to interpret all state_id values and all product-field / union-tag ordinals.

No separate pattern section is needed because the omitted pattern is exactly the singleton pattern Pat(T) induced by that root type.

Examples

Example 1: Unit type {}

Binary:

[32 bytes: root_type_hash($unit)]
[1 byte: payload_version = 1]
[uvarint: node_count = 1]
[uvarint: state_id(unit)]

There is one node, and because it is the last node, it is also the root.

Example 2: Boolean

Canonical form assigns ordinals: false=0, true=1

Binary:

[32 bytes: root_type_hash($bool)]
[1 byte: payload_version = 1]
[uvarint: node_count = 2]

node 0:
  [uvarint: state_id(unit)]

node 1:
  [uvarint: state_id(bool)]
  [uvarint: tag_ordinal(true) = 1]
  [uvarint: child_ref = 0]   // points to node 0

There are no repeated typed subtrees here, so the minimal DAG is the same as the tree.

Example 3: Natural number

Variant ordinals: zero=0, succ=1

Binary:

[32 bytes: root_type_hash($nat)]
[1 byte: payload_version = 1]
[uvarint: node_count = 4]

node 0:
  [uvarint: state_id(unit)]

node 1:
  [uvarint: state_id(nat)]
  [uvarint: tag_ordinal(zero) = 0]
  [uvarint: child_ref = 0]   // points to node 0

node 2:
  [uvarint: state_id(nat)]
  [uvarint: tag_ordinal(succ) = 1]
  [uvarint: child_ref = 0]   // points to node 1

node 3:
  [uvarint: state_id(nat)]
  [uvarint: tag_ordinal(succ) = 1]
  [uvarint: child_ref = 0]   // points to node 2

The root is node 3. Again, this example has no repeated typed subtrees, so only ordinary parent-to-child back-distances are used.

Parsing and Validation

The decoder MUST:

  1. Read the 32-byte root type hash.
  2. Resolve the canonical type automaton for that hash.
  3. Read payload_version and reject unknown mandatory versions.
  4. Read node_count.
  5. Parse records in node-ID order 0..N-1.
  6. For each record:
  7. Treat node N-1 as the root node.

Malformed payloads include:

Operation Friendliness

This payload is chosen so that a runtime can build an offset table in one linear scan and then execute . and / cheaply.

.label

Given a pointer (node_id, state_id) to a product node:

  1. Look up label -> field_index in the canonical type state.
  2. Jump to the field_index-th child reference in the node record.
  3. Follow the child reference to child node j.
  4. The child state is already determined by the type edge.

No sibling scanning by label is needed.

/tag

Given a pointer (node_id, state_id) to a union node:

  1. Read tag_ordinal from the node record.
  2. Compare with the requested tag’s canonical ordinal.
  3. On match, follow the single child reference.
  4. On mismatch, fail immediately.

The discriminator is therefore available before touching the child payload.

Implementation Requirements

Encoder Interface

/**
 * Encode a k value to binary format
 * @param {Object} value - k value (Product/Variant from Value.mjs)
 * @param {string} typeName - Canonical type name (e.g., "@ABC123...")
 * @param {Object} typeDef - Type definition from compiler
 * @returns {Buffer} Binary representation
 */
function encode(value, typeName, typeDef);

Decoder Interface

/**
 * Decode a k value from binary format
 * @param {Buffer} buffer - Binary data
 * @returns {Object} {typeName: string, value: Value}
 */
function decode(buffer);

Extension: Streaming Format

For streaming multiple values:

┌─────────────┬──────────┬─────────────┬──────────┬───
│  32B hash   │  payload │  32B hash   │  payload │ ...
└─────────────┴──────────┴─────────────┴──────────┴───

Each value is independently parseable.

Notes