Patterns And Typed Values

Abstract

This note proposes a semantic foundation for k patterns, types, typed values, and polymorphic values. The central point is that a value in k is not a bare tree: it exists only in the context of a type. Accordingly, a pattern denotes a set of types, and therefore also a set of typed values, not a set of raw trees. This distinction is essential for reasoning about equality, polymorphism, and binary serialization with DAG compression.

1. Introduction

The k language manipulates algebraic values built from products and unions. At first sight one may be tempted to define a value simply as a rooted labeled tree and then define types as sets of such trees. For k, this is not the right semantic level.

Two trees with the same shape but belonging to different types are different values. Type is part of value identity.

This leads to the following semantic ladder:

The rest of this note makes these notions precise.

2. Types

A type graph is a finite rooted directed graph with labeled edges. Each node is one of:

For a product node, outgoing edge labels are field names. For a union node, outgoing edge labels are tags. Edges are locally unique by label.

The graph may be cyclic, which allows recursive types.

Two type graphs that are equivalent as regular tree languages may be identified. For types, minimization by bisimulation is appropriate.

2.1. Trees accepted by a type

Let Tree(T) denote the set of rooted value trees accepted by a type graph T. Acceptance is defined in the standard way:

3. Typed Values

A typed value is a pair

(T, t)

where:

Thus a value is intrinsically typed. The tree t is only a representation of the value; it is not, by itself, a semantic value of the language.

We write:

Val(T) = { (T, t) : t ∈ Tree(T) }.

3.1. Identity of typed values

Typed values are intensional with respect to the type:

(T, t) ≠ (U, t)    whenever T ≠ U.

Even when the same raw tree t is accepted by both T and U, the two typed values are different.

This point is essential for serialization: the identity of a value cannot be recovered from the raw tree alone.

4. Patterns

A pattern graph is a finite rooted directed graph with labeled edges. Each node carries:

For product and union nodes, outgoing edges are labeled by field names or tags. Edges are locally unique by label.

Unlike the case of types, node identity in a pattern graph is semantic. If two edges point to the same pattern node, this means that the corresponding positions must be inhabited by the very same type.

Therefore patterns are not quotiented by bisimulation in general.

5. Filters As Syntax

A filter expression is a textual way of describing a pattern graph.

Variables appearing in filters are not the semantic object itself. They are a device for serializing and reusing pattern graph nodes in syntax.

For example, the two filters

{ (...) x, (...) y, ... }
{ X x, X y, ... }

describe different pattern graphs:

These two patterns are semantically different.

6. Matching A Type Against A Pattern

Let P be a pattern graph and T a type graph. We say that T matches P, written

T ⊨ P,

if there exists a root-preserving graph morphism

h : nodes(P) → nodes(T)

such that:

  1. if p is unknown, then h(p) may be any type node,

  2. if p is product, then h(p) is a product node,

  3. if p is union, then h(p) is a union node,

  4. if p is closed, then the outgoing labels of h(p) are exactly the outgoing labels of p,

  5. if p is open, then the outgoing labels of h(p) contain at least the outgoing labels of p,

  6. for every pattern edge

    p --l--> q

    there is a corresponding type edge

    h(p) --l--> h(q).

Condition 6 is the crucial one: if two edges of P point to the same node q, then the corresponding edges of T must point to the same type node h(q). This is how the graph structure of the pattern expresses “the same type”.

We define:

Types(P) = { T : T ⊨ P }.

Proposition 0

If T ⊨ P, then:

Proof

All three statements are immediate from the definition of the root-preserving morphism h and conditions 1-5 above, applied at the root node of P.

7. Singleton Patterns And Types

Every type graph T determines a closed pattern Pat(T) obtained by forgetting that it is a type and viewing it as a closed pattern graph.

This gives the following proposition.

Proposition 1

For every type T, the pattern Pat(T) is singleton:

Types(Pat(T)) = { T }

up to the chosen notion of type equivalence.

Proof

The identity map from nodes(Pat(T)) to nodes(T) witnesses T ⊨ Pat(T). Conversely, suppose U ⊨ Pat(T) via a root-preserving morphism h. Because every node of Pat(T) is either product or union, every node kind is preserved. Because every node is closed, the outgoing labels at each image node are exactly the outgoing labels of the corresponding node of T. Because every pattern edge must be respected by h, all target-sharing constraints are preserved as well. Therefore h is a graph isomorphism modulo the chosen equivalence on type graphs, and hence U is equivalent to T.

Proposition 2

Conversely, every singleton pattern determines a unique type up to type equivalence.

Proof

Let P be singleton. By definition, Types(P) contains exactly one type up to equivalence. Choose any representative T ∈ Types(P). Then T is the unique type determined by P, again up to the chosen equivalence relation.

Thus types can be regarded as a special case of patterns.

8. Polymorphic Values

A polymorphic value is a pair

(P, v)

where:

Equivalently:

PolyVal(P) = { (P, (T, t)) : T ⊨ P and t ∈ Tree(T) }.

So a pattern does not denote a set of raw trees. It denotes a set of typed values:

Val(P) = { (T, t) : T ⊨ P and t ∈ Tree(T) }.

This is a disjoint union over matching types, not an ordinary union of raw tree sets.

Proposition 3

For every type T,

Val(Pat(T)) = Val(T).

Proof

By Proposition 1, Types(Pat(T)) = { T } up to equivalence. Substituting this into the definition of Val(P) yields exactly the set of typed values whose type is T.

9. Examples

9.1. Distinct pattern graphs

Consider:

P1 = { (...) x, (...) y, ... }
P2 = { X x, X y, ... }

Then P1 and P2 are different patterns.

Hence:

Types(P2) ⊊ Types(P1).

Proof sketch

Every type matching P2 also matches P1, because identifying the target types of x and y is stronger than leaving them unrelated. The inclusion is strict because a product type with distinct types at x and y matches P1 but not P2.

9.2. Same tree, different typed values

Let t = {}|x. Suppose t is accepted both by

T1 = < {} x >
T2 = < {} x, {} y >

Then (T1, t) and (T2, t) are different typed values.

The raw tree is the same, but the type is different.

Proposition 4

If T ≠ U, then:

Val(T) ∩ Val(U) = ∅.

Proof

An element of Val(T) has the form (T, t). An element of Val(U) has the form (U, u). If (T, t) = (U, u), then necessarily T = U. Hence, when T ≠ U, the two sets are disjoint.

10. Consequences For Equality And Compression

Since values are intrinsically typed, raw tree equality is not sufficient for semantic equality.

This has an immediate consequence for DAG compression. Suppose we have the raw tree

{{}|x a, {}|x b, {}|x c}

and consider two patterns:

P1 = { (...) a, (...) b, (...) c, ... }
P2 = { X a, X b, X c, ... }

If only the raw tree is considered, the three occurrences of {}|x appear identical and a naive compressor may merge them. But semantically this is not always justified.

Therefore canonical value sharing, when used as an optimization, must be relative to the typing information supplied by the pattern and the witness type. One must not collapse raw subtrees merely because they look the same as trees.

Proposition 5

There is, in general, no canonical DAG compression procedure that depends only on the raw value tree and is correct for all polymorphic values.

Proof

Consider the raw tree

{{}|x a, {}|x b, {}|x c}.

As a raw tree, the three subtrees under a, b, and c are isomorphic. Hence any tree-only compressor cannot distinguish their occurrences by local tree structure alone. Now compare the two valid pattern contexts:

P1 = { (...) a, (...) b, (...) c, ... }
P2 = { X a, X b, X c, ... }.

Under P1, the three occurrences are not forced to have the same type. Under P2, they are forced to have the same type because all three edges land in the same pattern node. So a compressor that always merges these subtrees is unsound for P1, while a compressor that never merges them fails to capture the canonical sharing justified by P2. Therefore no canonical sharing rule can be defined from the raw tree alone; typing information from the pattern and witness type is required.

Proposition 6

For a fixed polymorphic value (P, (T, t)), any canonical DAG compression of t must be invariant under equality of typed occurrences, not merely equality of raw subtrees.

Proof sketch

By Proposition 4, typed values of different types are distinct even when represented by the same raw tree. Within a fixed witness type T, the pattern graph still matters because distinct positions of P may or may not be identified. Therefore the only semantically defensible sharing criterion is one that respects the typing/decorated occurrence information carried by (P, T), not the undecorated tree t alone.

11. Topological Remark

There is a topology-like intuition behind patterns, but only for a fragment.

If one looks only at open positive patterns, they behave like observable structural properties of types and suggest a topology or, more precisely, an Alexandrov-style structure on the class of types ordered by refinement.

However, the full pattern formalism is richer than topology because it includes:

These two features go beyond ordinary open-set semantics.

Proposition 7

The class of patterns is strictly richer than the open positive fragment.

Proof sketch

The open positive fragment can describe only monotone structural requirements such as “at least these labels” and “these positions have these subpatterns”. It cannot express exact closedness, because closedness is not monotone under refinement. It also cannot express identification of two positions as “the same type” without preserving explicit graph sharing. Since full patterns include both closedness and shared-node identity, they strictly extend the open positive fragment.

12. Summary

13. Corollaries For Binary Serialization

The previous propositions impose strong constraints on any correct binary format.

Corollary 1

The semantic payload of a serialized polymorphic value must determine:

with T ⊨ P.

It is not sufficient to serialize only a raw tree.

Justification

By Proposition 4, type is part of value identity. By Proposition 5, the raw tree alone is insufficient to determine correct canonical sharing behavior.

Corollary 2

A closed monomorphic typed value is the special case where the serialized pattern is singleton.

Justification

This is Proposition 1 together with Proposition 3.

Corollary 3

Any canonical DAG representation of the witness value must be computed from the decorated object (P, T, t), not from t alone.

Justification

This is exactly the content of Propositions 5 and 6.

Corollary 4

Pattern serialization must preserve explicit pattern-node identity.

Justification

Pattern node sharing expresses sameness of type at different positions. Replacing a pattern graph by any representation that forgets this identity changes its denotation.

Corollary 5

For singleton patterns, one may replace the explicit serialized pattern graph by a canonical type reference as a compact optimization, provided that the decoding semantics is unchanged.

Justification

By Proposition 2, a singleton pattern determines a unique type up to equivalence. Therefore the explicit graph may be recoverable from a canonical type identifier.