# stream-csv-as-json

> A micro-library of Node.js stream components for building custom CSV processing pipelines with a minimal memory footprint. It can parse CSV files far larger than available memory by streaming individual primitives through a SAX-inspired API. It has one runtime dependency, `stream-json`, and is a companion project to it: it uses the same token protocol and integrates with its filters, streamers, and utilities. It works with `stream-chain` for pipeline composition.

- Streaming SAX-inspired CSV parser producing `{name, value}` tokens
- Parse CSV files far exceeding available memory
- Individual field values can be streamed piece-wise
- `asObjects` converts header + data rows into object token streams
- `stringer` converts token streams back to CSV text
- Token protocol compatible with `stream-json` — use its filters, streamers, and utilities downstream
- Proper backpressure handling via `stream-chain` (flushable functions, `gen()`, `asStream()`)
- Components are factory functions returning flushable closures
- TypeScript declarations included

## Quick start

Install:

```bash
npm i stream-csv-as-json
```

Stream a huge CSV file as token objects:

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';

const pipeline = chain([
  fs.createReadStream('data.csv'),
  parser(),
  asObjects()
]);

pipeline.on('data', token => console.log(token));
pipeline.on('end', () => console.log('done'));
```

## Importing

```js
// Main API (parser + emit)
import make from 'stream-csv-as-json';
import {parser} from 'stream-csv-as-json';

// Parser
import parser from 'stream-csv-as-json/parser.js';

// asObjects
import asObjects from 'stream-csv-as-json/as-objects.js';

// stringer
import stringer from 'stream-csv-as-json/stringer.js';

// with-parser utility
import withParser from 'stream-csv-as-json/utils/with-parser.js';
```

## Token protocol

The parser emits `{name, value}` tokens. All downstream components operate on this protocol. The tokens are compatible with `stream-json`.

| Token name    | Value  | Meaning                    |
| ------------- | ------ | -------------------------- |
| `startArray`  | —      | Start of a CSV row         |
| `endArray`    | —      | End of a CSV row           |
| `startString` | —      | Start of a field value     |
| `endString`   | —      | End of a field value       |
| `stringChunk` | string | Piece of a field value     |
| `stringValue` | string | Packed complete field value |

By default, the parser emits both streamed tokens (`startString`/`stringChunk`/`endString`) and packed tokens (`stringValue`) for every field. The `streamStrings` and `packStrings` parser options select which of the two representations is emitted.
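
To make the streamed representation concrete, here is a minimal sketch that measures every field using `stringChunk` tokens only, so no field is ever assembled in memory. The `fieldLengths` helper and the hand-written `rowTokens` list are illustrative assumptions, not part of the library; real chunk boundaries depend on how the input is split.

```js
// Hypothetical helper: reports the length of every field using streamed
// tokens only, so no complete field value is ever held in memory.
function fieldLengths(tokens) {
  const lengths = [];
  let current = 0;
  for (const token of tokens) {
    switch (token.name) {
      case 'startString':
        current = 0;
        break;
      case 'stringChunk':
        current += token.value.length;
        break;
      case 'endString':
        lengths.push(current);
        break;
      // `stringValue` repeats the same data in packed form and is skipped here
    }
  }
  return lengths;
}

// Hand-written tokens for a single row: hello,world
// (the first field arrives in two chunks)
const rowTokens = [
  {name: 'startArray'},
  {name: 'startString'},
  {name: 'stringChunk', value: 'hel'},
  {name: 'stringChunk', value: 'lo'},
  {name: 'endString'},
  {name: 'stringValue', value: 'hello'},
  {name: 'startString'},
  {name: 'stringChunk', value: 'world'},
  {name: 'endString'},
  {name: 'stringValue', value: 'world'},
  {name: 'endArray'}
];

console.log(fieldLengths(rowTokens)); // [ 5, 5 ]
```

A consumer that needs whole values can ignore the streamed tokens and read `stringValue` instead.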

After `asObjects`, additional tokens appear:

| Token name    | Value  | Meaning                      |
| ------------- | ------ | ---------------------------- |
| `startObject` | —      | Start of a data row (object) |
| `endObject`   | —      | End of a data row (object)   |
| `startKey`    | —      | Start of field name          |
| `endKey`      | —      | End of field name            |
| `keyValue`    | string | Packed field name            |
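
As an illustration of how the object tokens fit together, the sketch below folds them into plain objects using the packed `keyValue` and `stringValue` tokens. `collectObjects` and the sample token list are hypothetical and hand-written for this example; note that every value stays a string, since the protocol carries no other value type.

```js
// Hypothetical helper: folds object tokens into plain objects,
// reading packed `keyValue` and `stringValue` tokens only.
function collectObjects(tokens) {
  const objects = [];
  let current = null;
  let key = null;
  for (const token of tokens) {
    switch (token.name) {
      case 'startObject':
        current = {};
        break;
      case 'keyValue':
        key = token.value;
        break;
      case 'stringValue':
        if (current && key !== null) current[key] = token.value;
        key = null;
        break;
      case 'endObject':
        objects.push(current);
        current = null;
        break;
      // streamed key and string tokens are ignored in this sketch
    }
  }
  return objects;
}

// Hand-written tokens for one data row with fields `name` and `age`
const objectTokens = [
  {name: 'startObject'},
  {name: 'startKey'},
  {name: 'stringChunk', value: 'name'},
  {name: 'endKey'},
  {name: 'keyValue', value: 'name'},
  {name: 'stringValue', value: 'Ann'},
  {name: 'keyValue', value: 'age'},
  {name: 'stringValue', value: '30'},
  {name: 'endObject'}
];

console.log(collectObjects(objectTokens)); // [ { name: 'Ann', age: '30' } ]
```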

## Main module

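The default export creates a parser Duplex stream with `emit()` applied (from `stream-json/utils/emit`), so every token is also emitted as an event named after the token, with the token value as its argument: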

```js
import make from 'stream-csv-as-json';

const stream = make();
stream.on('startArray', () => { /* row start */ });
stream.on('stringValue', val => { /* field value */ });
stream.on('endArray', () => { /* row end */ });
```

Named export: `parser` (the factory function).

## Parser API

`parser(options)` — factory returning a flushable function that consumes CSV text and produces `{name, value}` tokens. Composed with `fixUtf8Stream()` via `gen()`.

- `parser(options)` — returns a flushable function for use in `chain()`.
- `parser.asStream(options)` — returns a Duplex stream (writableObjectMode: false, readableObjectMode: true).
- `parser.parser` — self-reference for destructuring.

Options:

- `packStrings` (boolean, default: true) — emit `stringValue` tokens with the complete field value.
- `packValues` (boolean) — alias for `packStrings`.
- `streamStrings` (boolean, default: true) — emit `startString`/`stringChunk`/`endString` tokens.
- `streamValues` (boolean) — alias for `streamStrings`.
- `separator` (string, default: `','`) — field separator character.

If `packStrings` is false, `streamStrings` is forced to true (at least one representation must be emitted).
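
The interaction between the two flags can be summarized in a few lines. The `effectiveStringModes` function below is a hypothetical illustration of the documented rule, not the library's code, and it deliberately leaves out the `packValues`/`streamValues` aliases.

```js
// Hypothetical sketch of the documented rule: at least one of the two
// representations (packed or streamed) is always emitted.
function effectiveStringModes(options = {}) {
  const packStrings = options.packStrings ?? true;
  // When packing is off, streaming is forced on regardless of the option.
  const streamStrings = packStrings ? (options.streamStrings ?? true) : true;
  return {packStrings, streamStrings};
}

console.log(effectiveStringModes());
// { packStrings: true, streamStrings: true }
console.log(effectiveStringModes({streamStrings: false}));
// { packStrings: true, streamStrings: false }
console.log(effectiveStringModes({packStrings: false, streamStrings: false}));
// { packStrings: false, streamStrings: true }
```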

CSV parsing:
- Handles quoted fields per RFC 4180: double-quote escaping (`""` → `"`), embedded separators, embedded newlines.
- Row terminator acceptance is lenient — CRLF (RFC 4180), LF, and bare CR all work.
- A leading UTF-8 BOM (`U+FEFF`) at the start of the input is stripped.
- Each row is represented as an array: `startArray`, field tokens, `endArray`.
- Uses sticky RegExp (`/y` flag) for performance; each parser instance owns its own pattern set.
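
The quoting rules and the sticky-RegExp technique can be sketched on a single record. This is an illustration under simplifying assumptions, not the library's parser: the patterns are invented for the example, the separator is fixed to a comma, and the input must be one complete record, whereas the real parser also has to cope with fields split across chunks. The factory mirrors the idea that each instance owns its patterns, because a sticky RegExp keeps state in `lastIndex`.

```js
// Illustrative factory: each splitter owns its own sticky patterns.
function makeRecordSplitter() {
  const quotedField = /"((?:[^"]|"")*)"/y; // "..." where "" is an escaped quote
  const plainField = /[^",\r\n]*/y; // unquoted run up to a separator or EOL

  // Splits one CSV record (without its row terminator) into field values.
  return function splitRecord(text) {
    const fields = [];
    let index = 0;
    for (;;) {
      let value;
      quotedField.lastIndex = index;
      const quoted = quotedField.exec(text);
      if (quoted) {
        value = quoted[1].replace(/""/g, '"'); // unescape doubled quotes
        index = quotedField.lastIndex;
      } else {
        plainField.lastIndex = index;
        value = plainField.exec(text)[0]; // always matches, possibly empty
        index = plainField.lastIndex;
      }
      fields.push(value);
      if (index >= text.length) return fields;
      if (text[index] !== ',') {
        throw new Error(`unexpected character at position ${index}`);
      }
      ++index; // skip the separator
    }
  };
}

const splitRecord = makeRecordSplitter();
console.log(splitRecord('a,"b,""c""",d')); // [ 'a', 'b,"c"', 'd' ]
```

Sticky matching anchors each attempt at `lastIndex`, so the input is scanned once from left to right without slicing the string.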

Errors:
- `"Parser cannot parse input: expected a quoted value"` — input ends mid-quote (unterminated quoted field).
- `"Parser cannot parse input: unexpected character after a quoted value"` — content other than the separator, CR, LF, or another `"` appears immediately after a closing `"`.

Errors propagate as the stream's `'error'` event.

```js
import parser from 'stream-csv-as-json/parser.js';
import fs from 'node:fs';

fs.createReadStream('data.csv')
  .pipe(parser.asStream())
  .on('data', token => console.log(token.name, token.value))
  .on('end', () => console.log('done'));
```

With `stream-chain`:

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';

const pipeline = chain([
  fs.createReadStream('data.csv'),
  parser(),
  token => { console.log(token.name, token.value); return null; }
]);
```

## AsObjects API

`asObjects(options)` — factory returning a flushable function that uses the first CSV row as field names and converts subsequent rows from array tokens to object tokens.

- `asObjects(options)` — returns a flushable function for use in `chain()`.
- `asObjects.asStream(options)` — returns a Duplex stream (objectMode both sides).
- `asObjects.asObjects` — self-reference for destructuring.
- `asObjects.withParser(options)` — creates a pipeline of CSV parser + asObjects via `gen()`.
- `asObjects.withParserAsStream(options)` — same, wrapped as a Duplex stream.

Options:

- `packKeys` (boolean, default: true) — emit `keyValue` tokens with the field name.
- `packValues` (boolean) — alias for `packKeys`.
- `streamKeys` (boolean, default: true) — emit `startKey`/`stringChunk`/`endKey` tokens for field names.
- `streamValues` (boolean) — alias for `streamKeys`.
- `fieldPrefix` (string, default: `'field'`) — prefix for unnamed fields when data has more columns than headers, or when a header cell is empty. Field name becomes `fieldPrefix + index`.
- `useStringValues` / `useValues` — deprecated no-ops kept for backward compatibility; the header phase described under Behavior below auto-detects the parser's mode instead.

If `packKeys` is false, `streamKeys` is forced to true.

Behavior:
1. **Header phase**: Consumes the first row to build the field-name list. Auto-detects the parser's mode — captures from `startString` / `stringChunk` / `endString` when stream tokens are emitted, or from `stringValue` when only packed values are emitted. Works with every parser configuration without an explicit option.
2. **Data phase**: Converts subsequent rows:
   - `startArray` → `startObject`
   - Before each field value: emits key tokens (`startKey`/`stringChunk`/`endKey` and/or `keyValue`)
   - `endArray` → `endObject`
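
The two phases can be sketched as a generator over packed tokens. This is a simplified illustration, not the library's implementation: `arraysToObjects` and `toTokens` are hypothetical names, only `stringValue` and `keyValue` are handled, and streamed tokens are passed over.

```js
// Simplified sketch of the header and data phases over packed tokens only.
function* arraysToObjects(tokens, {fieldPrefix = 'field'} = {}) {
  const names = [];
  let inHeader = true;
  let column = 0;
  for (const token of tokens) {
    if (inHeader) {
      // Header phase: record names; empty cells fall back to prefix + index.
      if (token.name === 'stringValue') {
        names.push(token.value || fieldPrefix + names.length);
      } else if (token.name === 'endArray') {
        inHeader = false;
      }
      continue;
    }
    // Data phase: rows become objects, each value is preceded by its key.
    switch (token.name) {
      case 'startArray':
        column = 0;
        yield {name: 'startObject'};
        break;
      case 'stringValue':
        // Extra columns beyond the header get prefix + index as their name.
        yield {name: 'keyValue', value: names[column] ?? fieldPrefix + column};
        yield token;
        ++column;
        break;
      case 'endArray':
        yield {name: 'endObject'};
        break;
    }
  }
}

// Hypothetical helper: builds packed array tokens from rows of strings.
const toTokens = rows =>
  rows.flatMap(row => [
    {name: 'startArray'},
    ...row.map(value => ({name: 'stringValue', value})),
    {name: 'endArray'}
  ]);

const input = toTokens([['name', 'age'], ['Ann', '30', 'x', 'y']]);
const keys = [...arraysToObjects(input)]
  .filter(token => token.name === 'keyValue')
  .map(token => token.value);
console.log(keys); // [ 'name', 'age', 'field2', 'field3' ]
```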

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';

const pipeline = chain([
  fs.createReadStream('data.csv'),
  parser(),
  asObjects()
]);

pipeline.on('data', token => console.log(token));
```

With `withParser` shortcut:

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import asObjects from 'stream-csv-as-json/as-objects.js';

const pipeline = chain([
  fs.createReadStream('data.csv'),
  asObjects.withParser()
]);

pipeline.on('data', token => console.log(token));
```

Custom field prefix:

```js
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';

// Headers are name,age and data rows have 4 columns.
// With the default prefix the fields would be: name, age, field2, field3
const pipeline = chain([
  parser(),
  asObjects({fieldPrefix: 'col'})
  // With fieldPrefix 'col' they become: name, age, col2, col3
]);
```

## Stringer API

`stringer(options)` — factory returning a flushable function that converts a CSV token stream back to CSV text.

- `stringer(options)` — returns a flushable function for use in `chain()`.
- `stringer.asStream(options)` — returns a Duplex stream (writableObjectMode: true, readableObjectMode: false).
- `stringer.stringer` — self-reference for destructuring.

Options:

- `useStringValues` (boolean, default: false) — use `stringValue` tokens instead of `startString`/`stringChunk`/`endString`.
- `useValues` (boolean) — alias for `useStringValues`.
- `separator` (string, default: `','`) — field separator character.
- `rowTerminator` (string, default: `'\r\n'`) — row terminator string. CRLF per RFC 4180; override with `'\n'` for Unix-style output.

Two modes:

1. **Stream mode** (default, `useStringValues: false`): Consumes `startString`/`stringChunk`/`endString` tokens. Always quotes fields (wraps in `"`), escapes `"` as `""`.
2. **Value mode** (`useStringValues: true`): Consumes `stringValue` tokens. Quotes only when necessary (field contains separator, `"`, `\r`, or `\n`).

Rows are terminated with the configured `rowTerminator` (default `\r\n` per RFC 4180).
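
The two quoting rules can be sketched as plain functions. The names `quoteAlways`, `quoteIfNeeded`, and `formatRow` are hypothetical and exist only for this illustration; the actual stringer works on token streams rather than arrays of fields.

```js
// Stream mode rule: every field is wrapped in quotes, with " escaped as "".
const quoteAlways = value => '"' + value.replace(/"/g, '""') + '"';

// Value mode rule: quote only when the field contains the separator,
// a double quote, CR, or LF.
function quoteIfNeeded(value, separator = ',') {
  const needsQuotes = value.includes(separator) || /["\r\n]/.test(value);
  return needsQuotes ? quoteAlways(value) : value;
}

// Hypothetical helper: formats one row using the value mode rule.
const formatRow = (fields, {separator = ',', rowTerminator = '\r\n'} = {}) =>
  fields.map(field => quoteIfNeeded(field, separator)).join(separator) + rowTerminator;

console.log(quoteAlways('plain')); // "plain"
console.log(quoteIfNeeded('plain')); // plain
console.log(quoteIfNeeded('say "hi"')); // "say ""hi"""
console.log(JSON.stringify(formatRow(['a', 'b,c']))); // "a,\"b,c\"\r\n"
```

Stream mode cannot inspect a field before it starts writing it, which is one plausible reason it always quotes; value mode sees the whole value first and can decide.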

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import stringer from 'stream-csv-as-json/stringer.js';

// Round-trip: parse CSV and write it back
chain([
  fs.createReadStream('input.csv'),
  parser(),
  stringer(),
  fs.createWriteStream('output.csv')
]);

// Value mode: fields are quoted only when necessary
chain([
  fs.createReadStream('input.csv'),
  parser({packValues: true, streamValues: false}),
  stringer({useStringValues: true}),
  fs.createWriteStream('output-value-mode.csv')
]);
```

## Common patterns

### Stream a huge CSV as token objects

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';

const pipeline = chain([
  fs.createReadStream('huge.csv'),
  parser(),
  asObjects()
]);
let rows = 0;
pipeline.on('data', token => {
  if (token.name === 'endObject') ++rows; // one data row completed
});
pipeline.on('end', () => console.log(`${rows} rows`));
```

### Compressed CSV processing

```js
import fs from 'node:fs';
import zlib from 'node:zlib';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';

chain([
  fs.createReadStream('data.csv.gz'),
  zlib.createGunzip(),
  parser(),
  asObjects()
]);
```

### CSV round-trip (parse and re-emit)

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import stringer from 'stream-csv-as-json/stringer.js';

chain([
  fs.createReadStream('input.csv'),
  parser(),
  stringer(),
  fs.createWriteStream('output.csv')
]);
```

### Custom separator (TSV)

```js
import fs from 'node:fs';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';

chain([
  fs.createReadStream('data.tsv'),
  parser({separator: '\t'}),
  asObjects()
]);
```

### Event-based processing

```js
import fs from 'node:fs';
import make from 'stream-csv-as-json';

const stream = make();
let rowCount = 0;
stream.on('startArray', () => ++rowCount);
stream.on('end', () => console.log(`${rowCount} rows`));
fs.createReadStream('data.csv').pipe(stream);
```

## Links

- Docs: https://github.com/uhop/stream-csv-as-json/wiki
- npm: https://www.npmjs.com/package/stream-csv-as-json
- Repository: https://github.com/uhop/stream-csv-as-json
