K Polymorphic Binary Format

Purpose

This document defines a binary format for polymorphic values in k.

The semantic object to be serialized is:

(P, v)

where:

The binary format is designed so that:

Design Principles

1. Pattern is the structural authority

The pattern section is the authoritative description of the structure expected by the value section.

It determines:

2. Value is interpreted relative to the pattern

The value payload is not self-describing. It is decoded relative to the pattern graph.

This is why the payload does not need to repeat field names or tag names.

3. (...) is not value-carrying

The leaf pattern (...) is allowed in the pattern language, but it is not allowed to carry serialized value content in version 1 of this format.

This restriction keeps explicit labels out of the value payload. Otherwise the missing structure would have to be reintroduced locally in the value itself.

4. Value DAG is only an optimization

Semantically, the value is a tree. On the wire, it may be encoded as a DAG by collapsing equal occurrences relative to the pattern.

High-Level Structure

The package layout is:

| magic | format_version | flags | symbol_table | pattern_section | value_section |

There is no separate type_section.

Primitive Encodings

Integers

Symbols

A symbol is a UTF-8 string used as:

Symbols are interned in a symbol table and referred to by symbol ID.

| magic:4 bytes | format_version:u8 | flags:u8 |

Current values:

All flag bits are reserved in version 1 and must be zero.

Symbol Table

| symbol_count:uvarint | symbol_0 | ... | symbol_(N-1) |

Each symbol is:

| byte_length:uvarint | utf8_bytes... |

Canonical ordering

Symbols are ordered canonically by:

  1. ascending UTF-8 byte sequence,
  2. duplicates removed.

The same symbol table is used by both the pattern and the value sections.

Pattern Section

The pattern section serializes a normalized rooted pattern graph.

Pattern node kinds

There are exactly five semantic node kinds:

These are constructor classes of pattern nodes, not leaf/internal tags. Only (...) is forced to have no outgoing edges. All other node kinds may have zero or more outgoing edges.

Their meanings are:

In particular:

Canonical node numbering

Pattern nodes are numbered by first discovery in rooted depth-first traversal:

  1. start at the root,
  2. visit outgoing edges in ascending symbol ID order,
  3. assign a new node ID on first encounter,
  4. reuse the existing node ID on revisits.

The root pattern node is always node 0.

Structure

| pattern_node_count:uvarint | pattern_node_0 | ... | pattern_node_(P-1) |

Pattern node record

Every pattern node record starts with:

| node_kind:u8 |

The codes are:

The remainder of every record is:

| edge_count:uvarint | edge_0 | ... | edge_(k-1) |

Each edge is:

| symbol_id:uvarint | target_pattern_node:uvarint |

Edges must be sorted by ascending symbol_id and have no duplicates.

Validation rules for node kinds:

This document therefore starts without type hashes or exact-type leaves. Closed typed values are represented simply as values whose root pattern happens to be singleton.

Value Section

The value section encodes the witness value tree relative to the pattern graph. On the wire, it is represented as a DAG of value occurrences decorated by pattern-node identity.

Canonical value-node identity

The canonical key of a value node is:

(pattern_node_id, node_shape)

where:

This is the criterion for canonical DAG sharing in the value payload.

Encodability restriction

Version 1 forbids serialized values from descending into an any-leaf pattern node.

So if evaluation of a value against a pattern would require materializing a subtree at (...), the value is not encodable in this format.

Canonical value-node numbering

Value nodes are numbered in canonical postorder:

  1. children before parents,
  2. for product nodes, children in ascending pattern-edge symbol order,
  3. for union nodes, the selected child before the parent,
  4. a shared node is emitted only once, at the first completed postorder visit.

The root value node is always the last node.

Structure

| value_node_count:uvarint | value_node_0 | ... | value_node_(V-1) |

Value node record

Each value node starts with:

| pattern_node_id:uvarint | body... |

The body depends on the referenced pattern node kind.

Product value node

If the pattern node is {...} or {} with k outgoing edges in ascending symbol order:

| child_ref_0:uvarint | child_ref_1:uvarint | ... | child_ref_(k-1):uvarint |

The value carries exactly the children of the explicitly listed fields. Open product patterns do not permit the value section to introduce additional unlabeled fields.

Union value node

If the pattern node is <...> or <> with m outgoing edges in ascending symbol order:

| tag_ordinal:uvarint | child_ref:uvarint |

tag_ordinal must be in [0, m). Open union patterns do not permit the value section to introduce additional unnamed tags.

(...) value node

This case is forbidden in version 1.

Child References

Child references are encoded as back-distances:

child_ref = current_value_node_id - 1 - child_value_node_id

So:

The decoder reconstructs:

child_value_node_id = current_value_node_id - 1 - child_ref

Validation

After decoding, a package is valid iff all of the following hold.

Pattern section

Value section

Example

Consider:

pattern: < X tag1, {} tag2, ... > = X
value:   {}|tag2|tag1|tag1

Symbols

0 -> "tag1"
1 -> "tag2"

Pattern graph

P0: <...>, edges(tag1 -> P0, tag2 -> P1)
P1: {},    no edges
root = P0

Value DAG

V0: {}            at P1
V1: V0 | tag2     at P0
V2: V1 | tag1     at P0
V3: V2 | tag1     at P0
root = V3

This example has no repeated decorated subtrees, so the value DAG is the same as the value tree.

Closed-Value Specialization

If the root pattern is singleton, then the whole package represents an ordinary closed typed value.

In that case:

So the closed-value format is a derived optimization, not the semantic foundation.

Operational Properties

The value payload is chosen to support efficient k navigation.

.label

For a product value node:

  1. inspect the referenced pattern node,
  2. map label to field ordinal by the pattern edge order,
  3. jump to the corresponding child reference.

/tag

For a union value node:

  1. inspect the referenced pattern node,
  2. compare the stored tag_ordinal with the requested tag ordinal,
  3. on success follow the child reference.

So field and tag names stay in the pattern graph, not in the value payload.

Future Extensions

Possible future extensions include: