nearley.js 2.10.4

Tokenizers

By default, nearley splits the input into a stream of characters. This is called scannerless parsing.

A tokenizer splits the input into a stream of larger units called tokens. This happens in a separate stage before parsing. For example, a tokenizer might convert 512 + 10 into ["512", "+", "10"]: notice how it removed the whitespace and combined the digits of each number into a single token.
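To make the 512 + 10 example concrete, here is a minimal sketch of a hand-written tokenizer in plain JavaScript. The tokenize function is hypothetical and is not part of nearley or Moo; it only assumes that tokens are runs of digits or single arithmetic operators, and that whitespace is discarded.

```javascript
// Hypothetical helper for illustration; not part of nearley or Moo.
function tokenize(input) {
  const tokens = [];
  // Sticky regex: whitespace, a run of digits, or one operator character.
  const pattern = /\s+|[0-9]+|[+\-*\/]/y;
  let pos = 0;
  while (pos < input.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(input);
    if (match === null) {
      throw new Error(`Unexpected character at offset ${pos}: ${input[pos]}`);
    }
    // Keep numbers and operators; drop whitespace.
    if (!/^\s/.test(match[0])) tokens.push(match[0]);
    pos = pattern.lastIndex;
  }
  return tokens;
}

console.log(tokenize("512 + 10")); // [ '512', '+', '10' ]
```

A real lexer would usually also record a type and a position for each token, so that the grammar can match on token types and error messages can point at the offending input.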

Using a tokenizer has many benefits. It…

Lexing with Moo

nearley supports and recommends Moo, a super-fast tokenizer. Here is a simple example:

@{%
const moo = require("moo");

const lexer = moo.compile({
  ws:     /[ \t]+/,
  number: /[0-9]+/,
  times:  /\*|x/
});
%}

# Pass your lexer object using the @lexer option:
@lexer lexer

# Use %token to match any token of that type instead of "token":
multiplication -> %number %ws %times %ws %number {% ([first, , , , second]) => first * second %}

Have a look at the Moo documentation to learn more about the tokenizer.

Note that when using a tokenizer, string literals in the grammar match whole tokens produced by Moo rather than individual characters. This is convenient for matching keywords.

ifStatement -> "if" condition "then" block

You use the parser as usual: call parser.feed(data), then read the parsed results from parser.results.

Custom lexers

nearley recommends using a moo-based lexer. However, you can use any lexer that conforms to the following interface:
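As a rough sketch of what such a lexer could look like, the class below assumes the interface consists of five methods: next(), save(), reset(chunk, info), formatError(token) and has(name), with next() returning objects that carry at least a value property and a falsy value once the input is exhausted. The class name WordLexer and its token types (newline, ws, word) are hypothetical and chosen only for illustration.

```javascript
// Hypothetical lexer; the class name and token types are illustrative only.
class WordLexer {
  constructor() {
    this.reset("");
  }

  // Replace the internal buffer and restore line/col info taken from save().
  reset(chunk, info) {
    this.buffer = chunk;
    this.index = 0;
    this.line = info ? info.line : 1;
    this.col = info ? info.col : 1;
  }

  // Return the next token, or undefined when the buffer is exhausted.
  next() {
    if (this.index >= this.buffer.length) return undefined;
    const rest = this.buffer.slice(this.index);
    // Exactly one alternative matches any non-empty input.
    const match = /^(?:(\n)|([ \t]+)|([^ \t\n]+))/.exec(rest);
    const value = match[0];
    const type = match[1] ? "newline" : match[2] ? "ws" : "word";
    const token = { type, value, line: this.line, col: this.col };
    this.index += value.length;
    if (type === "newline") {
      this.line += 1;
      this.col = 1;
    } else {
      this.col += value.length;
    }
    return token;
  }

  // Snapshot of the position info, so it can survive between chunks.
  save() {
    return { line: this.line, col: this.col };
  }

  // Human-readable message for a parse error at the given token.
  formatError(token) {
    return `Unexpected ${token.type} token at line ${token.line} col ${token.col}`;
  }

  // Whether this lexer can emit tokens of the given type.
  has(name) {
    return ["newline", "ws", "word"].includes(name);
  }
}
```

An instance of such a class would then be handed to the grammar with the @lexer option in the same way as the Moo lexer above.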

Note: if you are looking for a lexer that supports indentation-aware grammars (as in Python), you can still use Moo. See this example.

Custom token matchers

Aside from the lexer infrastructure, nearley provides a lightweight way to parse arbitrary streams.

Custom matchers can be defined in two ways: literal tokens and testable tokens. A literal token matches a JS value exactly (with ===), while a testable token runs a predicate that tests whether or not the value matches.

Note that in this case, you would feed a Parser instance an array of values (one per token) rather than a string! Here is a simple example:

@{%
const tokenPrint = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };
%}

main -> %tokenPrint %tokenNumber ";;"

# parser.feed(["print", 12, ";;"]);
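The difference between the two matcher kinds can be sketched in plain JavaScript. The matches function below is a hypothetical helper that mirrors the rule described above (=== for literal tokens, a predicate for testable tokens); it is not nearley's own implementation.

```javascript
// Hypothetical helper mirroring the matching rule; not nearley's own code.
function matches(matcher, value) {
  // Testable token: run the predicate on the fed value.
  if (typeof matcher.test === "function") return Boolean(matcher.test(value));
  // Literal token: strict equality, with no type coercion.
  return matcher.literal === value;
}

const tokenPrint = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };

console.log(matches(tokenPrint, "print")); // true
console.log(matches(tokenNumber, 12));     // true
console.log(matches(tokenNumber, "12"));   // false: a string, not an integer
```

Because literal matching uses ===, feeding the string "12" where the number 12 is expected would not match; each element of the fed array has to have the exact type the matcher expects.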