This document defines a binary format for polymorphic values in
k.
The semantic object to be serialized is:
(P, v)
where:
P is a normalized pattern graph,v is a typed value compatible with P.The binary format is designed so that:
.label and
/tag,The pattern section is the authoritative description of the structure expected by the value section.
It determines:
The value payload is not self-describing. It is decoded relative to the pattern graph.
This is why the payload does not need to repeat field names or tag names.
(...) is not
value-carryingThe leaf pattern (...) is allowed in the pattern
language, but it is not allowed to carry serialized
value content in version 1 of this format.
This restriction keeps explicit labels out of the value payload. Otherwise the missing structure would have to be reintroduced locally in the value itself.
Semantically, the value is a tree. On the wire, it may be encoded as a DAG by collapsing equal occurrences relative to the pattern.
The package layout is:
| magic | format_version | flags | symbol_table | pattern_section | value_section |
There is no separate type_section.
uvarint.A symbol is a UTF-8 string used as:
Symbols are interned in a symbol table and referred to by symbol ID.
| magic:4 bytes | format_version:u8 | flags:u8 |
Current values:
magic = "KPV2"format_version = 1All flag bits are reserved in version 1 and must be zero.
| symbol_count:uvarint | symbol_0 | ... | symbol_(N-1) |
Each symbol is:
| byte_length:uvarint | utf8_bytes... |
Symbols are ordered canonically by:
The same symbol table is used by both the pattern and the value sections.
The pattern section serializes a normalized rooted pattern graph.
There are exactly five semantic node kinds:
(...){...}<...>{}<>These are constructor classes of pattern nodes, not leaf/internal
tags. Only (...) is forced to have no outgoing edges. All
other node kinds may have zero or more outgoing edges.
Their meanings are:
(...) means unconstrained type,
{...} with edges
l1 -> X1, ..., ln -> Xn means
{ X1 l1, ..., Xn ln, ... }<...> with edges
t1 -> X1, ..., tn -> Xn means
< X1 t1, ..., Xn tn, ... >{} with edges
l1 -> X1, ..., ln -> Xn means
{ X1 l1, ..., Xn ln }<> with edges
t1 -> X1, ..., tn -> Xn means
< X1 t1, ..., Xn tn >In particular:
{} with zero edges is the unit type,<> with zero edges is the empty type,{...} with zero edges matches any product type,<...> with zero edges matches any union
type.Pattern nodes are numbered by first discovery in rooted depth-first traversal:
The root pattern node is always node 0.
| pattern_node_count:uvarint | pattern_node_0 | ... | pattern_node_(P-1) |
Every pattern node record starts with:
| node_kind:u8 |
The codes are:
0 = (...)1 = {...}2 = <...>3 = {}4 = <>The remainder of every record is:
| edge_count:uvarint | edge_0 | ... | edge_(k-1) |
Each edge is:
| symbol_id:uvarint | target_pattern_node:uvarint |
Edges must be sorted by ascending symbol_id and have no
duplicates.
Validation rules for node kinds:
(...) must have edge_count = 0,edge_count ≥ 0.This document therefore starts without type hashes or exact-type leaves. Closed typed values are represented simply as values whose root pattern happens to be singleton.
The value section encodes the witness value tree relative to the pattern graph. On the wire, it is represented as a DAG of value occurrences decorated by pattern-node identity.
The canonical key of a value node is:
(pattern_node_id, node_shape)
where:
{...} or {}),
node_shape is the ordered list of child node IDs,<...> or
<>), node_shape is
(tag_ordinal, child_node_id),(...) carries no value and therefore cannot appear in
the value section.This is the criterion for canonical DAG sharing in the value payload.
Version 1 forbids serialized values from descending into an
any-leaf pattern node.
So if evaluation of a value against a pattern would require
materializing a subtree at (...), the value is not
encodable in this format.
Value nodes are numbered in canonical postorder:
The root value node is always the last node.
| value_node_count:uvarint | value_node_0 | ... | value_node_(V-1) |
Each value node starts with:
| pattern_node_id:uvarint | body... |
The body depends on the referenced pattern node kind.
If the pattern node is {...} or {} with
k outgoing edges in ascending symbol order:
| child_ref_0:uvarint | child_ref_1:uvarint | ... | child_ref_(k-1):uvarint |
The value carries exactly the children of the explicitly listed fields. Open product patterns do not permit the value section to introduce additional unlabeled fields.
If the pattern node is <...> or
<> with m outgoing edges in ascending
symbol order:
| tag_ordinal:uvarint | child_ref:uvarint |
tag_ordinal must be in [0, m). Open union
patterns do not permit the value section to introduce additional unnamed
tags.
(...) value nodeThis case is forbidden in version 1.
Child references are encoded as back-distances:
child_ref = current_value_node_id - 1 - child_value_node_id
So:
0 means the immediately preceding node,1 means two records back,The decoder reconstructs:
child_value_node_id = current_value_node_id - 1 - child_ref
After decoding, a package is valid iff all of the following hold.
(...) has no outgoing edges,pattern_node_id is in range,{...} or
{} pattern node,<...> or
<> pattern node,(...),tag_ordinal is valid for the referenced union pattern
node,0.<> with zero outgoing
edges, since the empty type has no values.Consider:
pattern: < X tag1, {} tag2, ... > = X
value: {}|tag2|tag1|tag1
0 -> "tag1"
1 -> "tag2"
P0: <...>, edges(tag1 -> P0, tag2 -> P1)
P1: {}, no edges
root = P0
V0: {} at P1
V1: V0 | tag2 at P0
V2: V1 | tag1 at P0
V3: V2 | tag1 at P0
root = V3
This example has no repeated decorated subtrees, so the value DAG is the same as the value tree.
If the root pattern is singleton, then the whole package represents an ordinary closed typed value.
In that case:
So the closed-value format is a derived optimization, not the semantic foundation.
The value payload is chosen to support efficient k
navigation.
.labelFor a product value node:
label to field ordinal by the pattern edge
order,/tagFor a union value node:
tag_ordinal with the requested tag
ordinal,So field and tag names stay in the pattern graph, not in the value payload.
Possible future extensions include:
(...),