# stream-json

> Micro-library of Node.js stream components for creating custom JSON processing pipelines with a minimal memory footprint. Parse JSON files far exceeding available memory using a SAX-inspired streaming token API. One dependency: `stream-chain`.

## Install

`npm i stream-json`

## Quick start

```js
import chain from 'stream-chain';
import {parser} from 'stream-json';
import {streamArray} from 'stream-json/streamers/stream-array.js';
import fs from 'node:fs';

const pipeline = chain([
  fs.createReadStream('data.json'),
  parser(),
  streamArray(),
  ({value}) => console.log(value)
]);
```

## API

### Parser

`parser(options)` — streaming JSON parser producing `{name, value}` tokens.

- Returns a function for use in `chain()`. Call `parser.asStream(options)` for a Node Duplex stream or `parser.asWebStream(options)` for a Web `{readable, writable}` pair.
- Options: `packKeys`, `packStrings`, `packNumbers` (default: true), `streamKeys`, `streamStrings`, `streamNumbers` (default: true), `jsonStreaming` (default: false).
- `packValues`/`streamValues` — shortcut to set all three at once.
- `parser` is `gen(fixUtf8Stream(), jsonParser())`; the named export `jsonParser` is the bare tokenizer (no UTF-8 front) for advanced/embedding use. The same gen/raw split applies to `jsoncParser`, `jsonlParser`, and the verifiers (`jsonVerifier`, `jsoncVerifier`).

```js
import fs from 'node:fs';
import {parser} from 'stream-json';
const pipeline = fs.createReadStream('data.json').pipe(parser.asStream());

// Web Streams substrate (import aliased so both parsers can coexist in one module):
import {parser as webParser} from 'stream-json/web/parser.js';
const {readable, writable} = webParser.asWebStream();
```

Every substrate-bearing component has both `stream-json/X.js` (Node + Web shapes) and `stream-json/web/X.js` (Web-only, browser-safe) entries.

### Main module

The default export is `parserStream`, an alias for `parser.asStream`. Calling it returns a parser as a Duplex stream:

```js
import fs from 'node:fs';
import parserStream from 'stream-json';
const stream = parserStream();
fs.createReadStream('data.json').pipe(stream);
```

For the SAX-style event API on Node (`stream.on('startObject', ...)`), wrap with `emit()`:

```js
import parserStream from 'stream-json';
import emit from 'stream-json/utils/emit.js';
const stream = emit(parserStream());
stream.on('startObject', () => { /* ... */ });
```

For the SAX-style event API on Web, use the `EventTarget`-based variants from `stream-json/web/emitter.js` or `stream-json/web/utils/emit.js`; subscribe with `addEventListener(name, ev => ev.detail)`. For hot paths on either substrate, prefer `for await (const tok of readable) handlers[tok.name]?.(tok.value)` — zero per-token allocation.
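
The handler-table idiom above can be tried without the library. The following is a minimal sketch: `dispatchTokens` and `sampleTokens` are hypothetical names, and the token list is hand-written to stand in for real parser output.

```js
// Hypothetical helper: route each token to the handler named after token.name.
// Tokens without a matching handler are skipped by the optional call.
async function dispatchTokens(readable, handlers) {
  let count = 0;
  for await (const tok of readable) {
    handlers[tok.name]?.(tok.value);
    ++count;
  }
  return count; // number of tokens seen, handled or not
}

// Hand-written stand-in for a token-producing readable (illustrative only).
async function* sampleTokens() {
  yield {name: 'startObject'};
  yield {name: 'keyValue', value: 'id'};
  yield {name: 'numberValue', value: '7'};
  yield {name: 'endObject'};
}

const seen = [];
const done = dispatchTokens(sampleTokens(), {
  keyValue: key => seen.push(`key:${key}`),
  numberValue: text => seen.push(`number:${text}`)
});
```

Because `for await` accepts any async iterable, the same helper should work with a Node Readable in object mode or a Web `ReadableStream` that supports async iteration.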

### Assembler

`Assembler` — class that reconstructs JS objects from tokens. Receives a per-value callback via the `onDone` option.

```js
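import parserStream from 'stream-json';
import Assembler from 'stream-json/assembler.js';
const stream = parserStream(); // a parser instance, not the factory itself
const asm = Assembler.connectTo(stream, {onDone: a => console.log(a.current)});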
```

- `asm.tapChain` — function for use in `chain()`.
- `asm.onDone(fn)` — set/clear the callback after construction.
- Options: `reviver`, `numberAsString`, `onDone`.

### Disassembler

`disassembler(options)` — JS objects → token stream (generator). Has `asStream` (Node Duplex) and `asWebStream` (Web pair). Web-only entry: `stream-json/web/disassembler.js`.

```js
import chain from 'stream-chain';
import {disassembler} from 'stream-json/disassembler.js';
import {stringer} from 'stream-json/stringer.js';
chain([objectSource, disassembler(), stringer(), destination]);
```

### Stringer

`Stringer` — Transform stream converting tokens back to JSON text. Has `asStream` (Node Duplex) and `asWebStream` (Web pair). Web-only entry: `stream-json/web/stringer.js`.

```js
import {stringer} from 'stream-json/stringer.js';
chain([parser(), pick({filter: 'data'}), stringer(), destination]);
```

### Emitter

`Emitter` — sink that re-emits tokens as named events. Node version is a Writable (EventEmitter); Web version is an EventTarget with `.writable` `WritableStream` attached.

```js
// Node
import emitter from 'stream-json/emitter.js';
const e = emitter();
e.on('startObject', () => { /* ... */ });

// Web
import emitter from 'stream-json/web/emitter.js';
const e = emitter();
e.addEventListener('startObject', () => { /* ... */ });
e.addEventListener('keyValue', ev => console.log(ev.detail));
// pipe a token-producing readable into e.writable
```

`Assembler.connectTo(stream, options)` (and `FlexAssembler.connectTo`) is substrate-aware — accepts either a Node Readable or a Web ReadableStream. For hot paths, prefer `for await (const tok of readable) asm.consume(tok)` over `connectTo` — no async-closure overhead, errors propagate directly.

## Filters

All filters accept `{filter, pathSeparator, once, streamKeys}` options. `filter` can be a string, RegExp, or `(stack, chunk) => boolean`.

- **`pick(options)`** — passes only matching subobjects, discards the rest.
- **`replace(options)`** — replaces matching subobjects. Extra option: `replacement` (function, value, or array of tokens).
- **`ignore(options)`** — removes matching subobjects completely.
- **`filter(options)`** — keeps matching subobjects preserving surrounding structure.

Each ships in both substrates with `asStream`, `asWebStream`, `withParser`, `withParserAsStream`, and `withParserAsWebStream` attached. Web-only entries: `stream-json/web/filters/<name>.js`.

```js
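import chain from 'stream-chain';
import {parser} from 'stream-json';
import {pick} from 'stream-json/filters/pick.js';
import {ignore} from 'stream-json/filters/ignore.js';
import {streamValues} from 'stream-json/streamers/stream-values.js';

chain([
  parser(),
  pick({filter: 'data'}),
  ignore({filter: /\b_meta\b/i}),
  streamValues(),
  ({value}) => processItem(value) // processItem: your own handler (avoids shadowing Node's global `process`)
]);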
```
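
For the function form of `filter`, a small sketch may help. It assumes that `stack` holds the object keys (strings) and array indices (numbers) leading to the current value, and that string and RegExp filters are tested against the stack joined with `pathSeparator`. Neither detail is spelled out above, so treat both as assumptions. The names `resultItems` and `toPath` are hypothetical.

```js
// Assumption: `stack` lists the keys (strings) and array indices (numbers)
// leading to the current value; `chunk` is the current {name, value} token.
const isIndex = x => typeof x === 'number';

// Hypothetical filter: match each element of the top-level `results` array,
// i.e. paths shaped like ['results', <index>]. `chunk` is not needed here.
const resultItems = (stack, chunk) =>
  stack.length === 2 && stack[0] === 'results' && isIndex(stack[1]);

// The joined-path view that a string or RegExp filter would presumably see
// (separator '.' assumed as the default).
const toPath = (stack, separator = '.') => stack.join(separator);

console.log(resultItems(['results', 0], {name: 'startObject'})); // true
console.log(resultItems(['results', 0, 'id'], {name: 'numberValue', value: '1'})); // false
console.log(toPath(['results', 0, 'id'])); // results.0.id
```

Such a function would be passed as `pick({filter: resultItems})`. A predicate on the stack avoids building a path string for every token, which can matter on large inputs.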

## Streamers

Assemble complete JS objects from a token stream. All produce `{key, value}` objects, generic in the assembled value type (`streamArray<T>()`, `streamValues<T>()`, `streamObject<T>()`; `value` defaults to `unknown`).

- **`streamValues(options)`** — streams successive JSON values. Use with `jsonStreaming` or after `pick`.
- **`streamArray(options)`** — streams elements of a single top-level array.
- **`streamObject(options)`** — streams properties of a single top-level object.

All support `objectFilter` for early rejection of objects during assembly. Each ships in both substrates with `asStream`, `asWebStream`, `withParser`, `withParserAsStream`, and `withParserAsWebStream` attached. Web-only entries: `stream-json/web/streamers/<name>.js`.

```js
import {streamArray} from 'stream-json/streamers/stream-array.js';
chain([parser(), streamArray(), ({key, value}) => console.log(key, value)]);
```
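
As a sketch of what an `objectFilter` could look like: the contract assumed here (the callback receives the assembler, inspects the partially built value in `asm.current`, and returns `true` to keep, `false` to reject early, or `undefined` while undecided) follows earlier stream-json releases and is not confirmed by the text above. `onlyUsers` is a hypothetical name.

```js
// Hypothetical objectFilter: decide as soon as the `type` property is available.
// Assumed contract: true = keep, false = reject early, undefined = not decided yet.
const onlyUsers = asm => {
  const partial = asm.current; // value assembled so far (assumption)
  if (partial && typeof partial === 'object' && 'type' in partial) {
    return partial.type === 'user';
  }
  return undefined; // keep assembling until `type` shows up
};

// Plain objects stand in for the assembler, for illustration only.
console.log(onlyUsers({current: {type: 'user'}}));  // true
console.log(onlyUsers({current: {type: 'admin'}})); // false
console.log(onlyUsers({current: {id: 1}}));         // undefined
```

It would be supplied as `streamArray({objectFilter: onlyUsers})`. Early rejection pays off most when the deciding property appears near the start of each object.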

## Utilities

- **`emit(stream)`** — attach token events to a Node Readable. Web variant (`stream-json/web/utils/emit.js`) takes a `ReadableStream` and returns an auto-piped `EventTarget`. Zero-allocation alternative: `for await (const tok of readable) handlers[tok.name]?.(tok.value)`.
- **`withParser(fn, options)`** — create `gen(parser(options), fn(options))` pipeline. Most components export `.withParser()` and `.withParserAsStream()`.
- **`FlexAssembler`** — Assembler with custom containers (Map, Set, etc.) at specific paths. Rules: `{filter, create, add, finalize?}`. Separate `objectRules` and `arrayRules`.
- **`Batch`** — Transform stream batching items into arrays. Option: `batchSize` (default: 1000). Both `asStream` and `asWebStream` attach `_batchSize` to the returned pair/stream.
- **`Verifier`** — validates JSON text (`charCodeAt` validator), reports exact error position. Has `asStream` and `asWebStream`. Named export `jsonVerifier` is the bare validator (no UTF-8 front).
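
To make the batching behavior of `Batch` concrete, here is a stand-alone sketch over a plain iterable. It is not the library's implementation: `batch` is a hypothetical generator, and flushing the final partial batch is an assumption about the intended behavior.

```js
// Stand-alone sketch of batching: groups items into arrays of `batchSize`.
function* batch(items, batchSize = 1000) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  let current = [];
  for (const item of items) {
    current.push(item);
    if (current.length >= batchSize) {
      yield current;
      current = [];
    }
  }
  if (current.length) yield current; // flush the final, possibly shorter batch
}

// Five items in batches of two produce three arrays: [1, 2], [3, 4], [5].
const batches = [...batch([1, 2, 3, 4, 5], 2)];
console.log(JSON.stringify(batches)); // [[1,2],[3,4],[5]]
```

Batching is mostly useful before a per-call-expensive step, such as a bulk database insert.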

### withParser shortcut

```js
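import fs from 'node:fs';
import {streamArray} from 'stream-json/streamers/stream-array.js';
// withParser() returns a function for chain(); withParserAsStream() returns a pipeable stream.
const pipeline = streamArray.withParserAsStream();
fs.createReadStream('data.json').pipe(pipeline);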
```

## JSONL support

> **Deprecated — slated for removal in a future major.** stream-json's JSONL parser and stringer are thin re-exports of stream-chain's (`stream-chain/jsonl/parser.js`, `stream-chain/jsonl/stringerStream.js`). Use stream-chain's JSONL directly. stream-json is a JSON *token* library; JSONL yields whole objects per line and belongs in stream-chain with the other substrate components.

- **`jsonl/parser(options)`** — JSONL parser producing `{key, value}` objects. Options: `reviver`, `errorIndicator`. Has `asStream` and `asWebStream`. Web entry: `stream-json/web/jsonl/parser.js`.
- **`jsonl/stringer(options)`** — objects → JSONL text. Options: `replacer`, `space`, `separator`. Node entry is itself a `Transform`; `jsonlStringer.asWebStream` returns a Web `TransformStream<T, string>`. Web entry (`stream-json/web/jsonl/stringer.js`) returns the `TransformStream` directly.

```js
import {parser} from 'stream-json/jsonl/parser.js';
import {stringer} from 'stream-json/jsonl/stringer.js';

chain([fs.createReadStream('data.jsonl'), parser(), ({value}) => transform(value), stringer(), destination]);
```

## JSONC support

- **`jsonc/parser(options)`** — JSONC parser (JSON with Comments). Same `charCodeAt` tokenizer as the standard parser, extended with `//` and `/* */` comments, trailing commas, and `whitespace` / `comment` / `comma` tokens.
  - Extra options: `streamWhitespace` (default: true), `streamComments` (default: true), `streamCommas` (default: false — emit a valueless `comma` token at every comma, separator or trailing, for faithful round-trip editing).
  - All standard parser options are supported.
- **`jsonc/stringer(options)`** — JSONC stringer. Passes `whitespace` and `comment` tokens through verbatim. Extra option: `useCommas` (default: false — render streamed `comma` tokens as `,`, auto-inserting a separator only when no comma token arrived, so output stays valid even if commas were dropped upstream).
- **`jsonc/verifier(options)`** — JSONC validator. Same `charCodeAt` validator as `Verifier`, accepting comments and trailing commas. Reports exact error position.

```js
import {parser as jsoncParser} from 'stream-json/jsonc/parser.js';
import {stringer as jsoncStringer} from 'stream-json/jsonc/stringer.js';

chain([fs.createReadStream('settings.jsonc'), jsoncParser(), jsoncStringer(), destination]);
```

All existing filters, streamers, and utilities work with JSONC parser output — they ignore unknown tokens.

JSONC also ships in both substrates: each has `asStream` (Node Duplex) and `asWebStream` (Web pair). Web entries: `stream-json/web/jsonc/{parser,stringer,verifier}.js`.

## File I/O (Node-only) — Since 3.3.0

- **`file/parseFile(options)`** — input-edge `gen()` stage that turns a path into a token stream. Returns `gen(asyncBlockReader(options), jsonParser(options))`. Drop at the head of a pipeline; drive with the path as the gen input value. Options: `readBlockSize` (default 64 KB) + all standard `parser()` options. Node-only (uses `node:fs/promises`).
- **`file/verifyFile(path, options)`** — standalone async JSON validator. Returns `Promise<void>`. Rejects with `{message, line, pos, offset}` on invalid input. Options: `readBlockSize` + `jsonStreaming`.
- **`file/stringerToFile(path, options)`** — output-edge sink stage. Returns `gen(stringer(options), asyncBlockWriter(path, options))`. Drop at the tail; pipe MUST be driven through `pipe(...)` so the writer's flush closes the file. Options: `writeBlockSize` (default 1 MB) + all standard `stringer()` options.
- **`core/utils/pipe(...stages)`** — one-shot single-value driver: builds a fresh `gen`, calls `g(value)` then `g(none)` so flushable sinks like `stringerToFile` actually flush. Generic, web-safe.
- **`core/utils/drain(asyncGen)`** — drains any async iterable, returns the last yielded value (or `undefined`). Generic, web-safe.
- JSONC variants under `file/jsonc/{parser,verifier,stringer}.js` — same shapes, comments + trailing commas supported.

```js
import {parseFile} from 'stream-json/file/parser.js';
import {stringerToFile} from 'stream-json/file/stringer.js';
import {pipe} from 'stream-json/utils/pipe.js';
import {drain} from 'stream-json/utils/drain.js';

// file → tokens → file (round-trip)
await drain(pipe(parseFile(), stringerToFile('out.json'))('in.json'));

// validate a file
import {verifyFile} from 'stream-json/file/verifier.js';
await verifyFile('candidate.json'); // throws {message, line, pos, offset} on invalid
```
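
The contract described for `drain` (consume any async iterable, return the last yielded value or `undefined`) can be sketched without the library. `drainSketch` and `countTo` are hypothetical names, and this is an illustration of the described behavior, not the library's source.

```js
// Sketch of the described contract: consume everything, remember the last value.
async function drainSketch(iterable) {
  let last; // stays undefined when nothing is yielded
  for await (const value of iterable) last = value;
  return last;
}

// Illustrative async source yielding 1..n.
async function* countTo(n) {
  for (let i = 1; i <= n; ++i) yield i;
}

const lastOfThree = drainSketch(countTo(3)); // resolves to 3
const lastOfNone = drainSketch(countTo(0));  // resolves to undefined
```

Awaiting the drained result is what lets the caller know the whole pipeline, including any final flush, has finished.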

Perf (Intel i3‑10110U, Node 26, 100 KB JSON):
- Realistic parse-with-work (counter inside the pipeline; `bench/parse-count.js`): `pipe(parseFile(), counter)` ≈ 9.4 ms vs idiomatic `chain([createReadStream, parser()]) + on('data', counter)` ≈ 15.8 ms — **~68% faster**. gen() and chain() executors are within noise of each other.
- Round-trip (`bench/file-roundtrip.js`): `pipe(parseFile(), stringerToFile())` ≈ 30 ms vs idiomatic chain + `createWriteStream` ≈ 50 ms — **~1.6× faster**.
- Verify: within noise of the idiomatic chain.

## Common patterns

### Stream a huge JSON array

```js
chain([
  fs.createReadStream('huge-array.json'),
  parser(),
  streamArray(),
  ({value}) => processItem(value)
]);
```

### Pick and filter nested data

```js
chain([
  fs.createReadStream('data.json'),
  parser(),
  pick({filter: 'results'}),
  streamArray(),
  ({value}) => value.active ? value : null
]);
```

### Edit JSON and write back

```js
chain([
  fs.createReadStream('input.json'),
  parser(),
  ignore({filter: /\bsecret\b/}),
  stringer(),
  fs.createWriteStream('output.json')
]);
```

## Token protocol

The parser emits `{name, value}` tokens: `startObject`, `endObject`, `startArray`, `endArray`, `startKey`, `endKey`, `keyValue`, `startString`, `endString`, `stringChunk`, `stringValue`, `startNumber`, `endNumber`, `numberChunk`, `numberValue`, `nullValue`, `trueValue`, `falseValue`.

These names are the closed `TokenName` type; `Token` is a discriminated union over `name` (narrowing on `token.name` tightens `token.value`). Both are exported from `stream-json/parser.js`.
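
To show how the token names fit together, the sketch below rebuilds a value from a hand-written token list. It is a simplified stand-in for an assembler, not the library's `Assembler`. It uses only the packed tokens (`keyValue`, `stringValue`, `numberValue` and the literals), and it assumes `numberValue` carries the number as text. The token list is an illustration, not captured parser output.

```js
// Illustrative packed-token list for {"a": [1, true]} (hand-written).
const tokens = [
  {name: 'startObject'},
  {name: 'keyValue', value: 'a'},
  {name: 'startArray'},
  {name: 'numberValue', value: '1'}, // assumption: numbers arrive as text
  {name: 'trueValue', value: true},
  {name: 'endArray'},
  {name: 'endObject'}
];

// Minimal stand-in for an assembler: rebuilds one JS value from packed tokens.
function assemble(tokenList) {
  const stack = []; // currently open containers, innermost last
  let key = null;   // key waiting for its value in the current object
  let result;

  // Put a finished value (or a freshly opened container) where it belongs.
  const place = value => {
    const top = stack[stack.length - 1];
    if (!top) result = value;
    else if (Array.isArray(top)) top.push(value);
    else { top[key] = value; key = null; }
  };

  for (const {name, value} of tokenList) {
    switch (name) {
      case 'startObject':
      case 'startArray': {
        const container = name === 'startObject' ? {} : [];
        place(container);
        stack.push(container);
        break;
      }
      case 'endObject':
      case 'endArray': stack.pop(); break;
      case 'keyValue': key = value; break;
      case 'stringValue': place(value); break;
      case 'numberValue': place(Number(value)); break;
      case 'nullValue': place(null); break;
      case 'trueValue': place(true); break;
      case 'falseValue': place(false); break;
      // start*/end*/*Chunk tokens for keys, strings and numbers are skipped:
      // they carry the same data piecewise.
    }
  }
  return result;
}

console.log(JSON.stringify(assemble(tokens))); // {"a":[1,true]}
```

Skipping the chunked tokens keeps the sketch short. A consumer that must handle very long strings without buffering would read the `stringChunk` tokens instead.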

## Links

- Docs: https://github.com/uhop/stream-json/wiki
- npm: https://www.npmjs.com/package/stream-json
- Full LLM reference: https://github.com/uhop/stream-json/blob/master/llms-full.txt
