Hello World

Let's do some super basic parsing. In this example we'll parse strings of the form:

S -> b C d.
C -> c C | c.

This should match strings such as "bccccccd": a "b", followed by any number of "c"s in a row (at least one), followed by a "d".
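The language described here (a "b", one or more "c"s, then a "d") is simple enough to also be written as the regular expression /^bc+d$/. The matchesGrammar helper below is hypothetical and not part of Js-parse; it is just a quick way to predict which inputs the parser should accept before you build it.

```javascript
// Hypothetical reference check, not part of Js-parse.
// Accepts exactly: one "b", then one or more "c"s, then one "d", and nothing else.
function matchesGrammar(input) {
	return /^bc+d$/.test(input);
}

console.log(matchesGrammar("bccccccd")); // true
console.log(matchesGrammar("bd"));       // false: at least one "c" is required
console.log(matchesGrammar("bcde"));     // false: trailing "e"
```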

Creating a parser description

The first step to creating any great parser is defining the underlying grammar. In Js-parse, the grammar is described with a plain JavaScript object:

var parserDescription = {
	"symbols":{
		"b": { "terminal":true, "match":"b" },
		"c": { "terminal":true, "match":"c" },
		"d": { "terminal":true, "match":"d" }
	},
	"productions":{
		"S":[
			[ "b", "C", "d" ]
		],
		"C":[
			[ "c", "C" ],
			[ "c" ]
		]
	},
	"startSymbols": [ "S" ]
};

There's quite a bit to dissect there, but when you compare it against the original BNF form of the grammar, the correspondence should be pretty clear.
There are three major components here:

  1. symbols: Defines the symbols the parser should expect to encounter. Each terminal is marked with "terminal":true and gives a "match" describing the text it corresponds to. Non-terminal symbols do not need to be included here, but they can be.
  2. productions: This is where the core of the grammar is described. Each key in the productions object corresponds to a non-terminal symbol, and each array in its value is one production for that symbol. The strings inside a production name other symbols, which can be terminals or non-terminals. Epsilon is represented by an empty array.
  3. startSymbols: An array of potential start symbols. The input is only accepted if all of it can be parsed as one of these symbols.
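To make the relationship between these three components concrete, here is a small hypothetical helper (checkDescription is not part of Js-parse) that checks a description of this shape for two easy mistakes: a production that refers to a symbol defined nowhere, and a start symbol that has no productions. It assumes nothing beyond the object layout shown above.

```javascript
// Hypothetical sanity check for a Js-parse-style parser description.
// It assumes only the layout used in this example: symbols, productions, startSymbols.
function checkDescription(desc) {
	var problems = [];
	var symbols = desc.symbols || {};
	var productions = desc.productions || {};
	var has = Object.prototype.hasOwnProperty;

	// A name is "known" if it is declared in symbols or has productions of its own.
	function known(name) {
		return has.call(symbols, name) || has.call(productions, name);
	}

	Object.keys(productions).forEach(function(head){
		productions[head].forEach(function(body){
			// An empty body (epsilon) refers to no symbols, so it always passes.
			body.forEach(function(name){
				if (!known(name)) {
					problems.push("Unknown symbol '" + name + "' in a production for " + head);
				}
			});
		});
	});

	(desc.startSymbols || []).forEach(function(start){
		if (!has.call(productions, start)) {
			problems.push("Start symbol '" + start + "' has no productions");
		}
	});

	return problems;
}

// The description from this example, copied so the snippet runs on its own.
var parserDescription = {
	"symbols":{
		"b": { "terminal":true, "match":"b" },
		"c": { "terminal":true, "match":"c" },
		"d": { "terminal":true, "match":"d" }
	},
	"productions":{
		"S":[ [ "b", "C", "d" ] ],
		"C":[ [ "c", "C" ], [ "c" ] ]
	},
	"startSymbols": [ "S" ]
};

console.log(checkDescription(parserDescription)); // []
```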

Creating the parser

In this step we create a parser out of the parser description object we created above.

Be sure you have loaded the parser object definitions first, using something like:
var Parser = require("../../lib/index").Parser.LRParser;

Create the parser:
// Create the parser
var parser = Parser.CreateWithLexer(parserDescription);

Next, we define some event handlers so we can see what is going on while parsing and be alerted if there are any errors.
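// Bind some event handlers
var util = require("util");

// "accept" fires once the entire input has been matched against a start symbol
parser.on("accept", function(token_stack){
	// showHidden = true, depth = 1000: print the whole tree, including hidden properties such as array lengths
	console.log("Parser Accept:", util.inspect(token_stack, true, 1000));
});

// "error" fires when the input does not conform to the grammar
parser.on("error", function(error){
	console.log("Parse Error: ", error.message);
	// Re-throw the error object itself rather than only its message string
	throw error;
});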

Do some parsing!

Pass an input string to the parser with append(), then call end() to tell it you've reached the end of the stream, and see the magic happen. Once the parser is told the end of the input has been reached, it decides whether or not to accept the entire input according to the grammar.
// Begin processing the input
var input = "bccccccd";
parser.append(input);
parser.end();
If all went according to plan, you should see output like this:
Parser Accept: [ { head: 'S',
    body: 
     [ { type: 'b', value: 'b' },
       { head: 'C',
         body: 
          [ { type: 'c', value: 'c' },
            { head: 'C',
              body: 
               [ { type: 'c', value: 'c' },
                 { head: 'C',
                   body: 
                    [ { type: 'c', value: 'c' },
                      { head: 'C',
                        body: 
                         [ { type: 'c', value: 'c' },
                           { head: 'C',
                             body: 
                              [ { type: 'c', value: 'c' },
                                { head: 'C', body: [ { type: 'c', value: 'c' }, [length]: 1 ] },
                                [length]: 2 ] },
                           [length]: 2 ] },
                      [length]: 2 ] },
                 [length]: 2 ] },
            [length]: 2 ] },
       { type: 'd', value: 'd' },
       [length]: 3 ] },
  [length]: 1 ]
It's far from the most beautiful output we could ask for, but if you look closely you can see the expected structure: every C node holds one "c" token followed, except at the innermost level, by another nested C.
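The nesting comes straight from the recursive production C -> c C. To see how an append()/end() interface, accept/error events and this { head, body } / { type, value } tree shape can fit together, here is a toy stand-in written for this one grammar only. ToyParser is hypothetical and not part of Js-parse: it uses Node's built-in events module and a hand-written recursive descent parser rather than the LR parsing the name LRParser suggests, and its error messages are made up for the sketch.

```javascript
// Toy stand-in for this example's grammar only; NOT Js-parse's implementation.
var EventEmitter = require("events").EventEmitter;

function ToyParser() {
	EventEmitter.call(this);
	this.buffer = ""; // input accumulated by append()
}
ToyParser.prototype = Object.create(EventEmitter.prototype);
ToyParser.prototype.constructor = ToyParser;

// Input may arrive in several chunks; nothing is decided until end().
ToyParser.prototype.append = function(chunk) {
	this.buffer += chunk;
};

ToyParser.prototype.end = function() {
	var input = this.buffer;
	var pos = 0;

	// Consume one expected character and return it as a token.
	function expect(ch) {
		if (input[pos] !== ch) {
			throw new Error("Expected '" + ch + "' at position " + pos);
		}
		pos++;
		return { type: ch, value: ch };
	}

	// C -> c C | c : one C node per "c", nested like the output above
	function parseC() {
		var body = [ expect("c") ];
		if (input[pos] === "c") {
			body.push(parseC());
		}
		return { head: "C", body: body };
	}

	// S -> b C d
	function parseS() {
		return { head: "S", body: [ expect("b"), parseC(), expect("d") ] };
	}

	var tree;
	try {
		tree = parseS();
		if (pos !== input.length) {
			throw new Error("Unexpected input at position " + pos);
		}
	} catch (err) {
		this.emit("error", err);
		return;
	}
	// Emitted outside the try so errors thrown by listeners are not swallowed.
	this.emit("accept", [ tree ]);
};

var toy = new ToyParser();
toy.on("accept", function(stack){ console.log(JSON.stringify(stack)); });
toy.on("error", function(err){ console.log("Parse Error:", err.message); });
toy.append("bcc");
toy.append("d");
toy.end();
```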

Now let's quickly look at a string that does not conform to our grammar, say "bcde":
Unrecognized characters at end of stream: 'e'
You can see that an exception was thrown when parser.end() was called, because extra characters that do not conform to the grammar were found at the end of the string.
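The culprit is the 'e': none of the terminals in symbols describes it. As an illustration only, the hypothetical tokenize function below (not Js-parse's lexer) treats each terminal's "match" value as a regular expression source, an assumption that is harmless for the single-letter matches in this example. It takes the first terminal that matches at the current position and reports whatever input it cannot match. Js-parse's real lexer and its exact error text may differ.

```javascript
// Hypothetical lexer sketch; NOT Js-parse's lexer.
// Assumption: each terminal's "match" is used as a regular expression source.
function tokenize(symbols, input) {
	var tokens = [];
	var pos = 0;
	var names = Object.keys(symbols).filter(function(name){
		return symbols[name].terminal;
	});

	while (pos < input.length) {
		var rest = input.slice(pos);
		var found = null;
		// First terminal whose pattern matches at the current position wins.
		for (var i = 0; i < names.length && !found; i++) {
			var m = new RegExp("^(?:" + symbols[names[i]].match + ")").exec(rest);
			if (m && m[0].length > 0) {
				found = { type: names[i], value: m[0] };
			}
		}
		if (!found) {
			throw new Error("Unrecognized characters: '" + rest + "'");
		}
		tokens.push(found);
		pos += found.value.length;
	}
	return tokens;
}

var symbols = {
	"b": { "terminal":true, "match":"b" },
	"c": { "terminal":true, "match":"c" },
	"d": { "terminal":true, "match":"d" }
};

try {
	tokenize(symbols, "bcde");
} catch (err) {
	console.log(err.message); // Unrecognized characters: 'e'
}
```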

That's all you really need to get going. Keep reading for some more advanced tips on cleaning up the output a little.

Advanced Tip: Using mergeRecursive to clean up the output.

Sometimes there's no need to create such a deeply nested structure as {"C": {"C": {"C": ... }}}. This is what mergeRecursive is for.

Alter the symbols portion of our parser description to include the line: "C": { "terminal":false, "mergeRecursive":true }.

Now run the script with "bccccccd" again and you should see slightly different output:
Parser Accept: [ { head: 'S',
    body: 
     [ { type: 'b', value: 'b' },
       { head: 'C',
         body: 
          [ { type: 'c', value: 'c' },
            { type: 'c', value: 'c' },
            { type: 'c', value: 'c' },
            { type: 'c', value: 'c' },
            { type: 'c', value: 'c' },
            { type: 'c', value: 'c' },
            [length]: 6 ] },
       { type: 'd', value: 'd' },
       [length]: 3 ] },
  [length]: 1 ]
As you can see, all of those nested C productions have been merged into a single C production whose body is an array of all the values that existed under the chain of nested C productions.

Note: mergeRecursive will only merge elements if the parent and child are of the same type.
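To make the transformation concrete, here is a hypothetical post-processing function that produces the same flattening on a finished tree. It is a sketch of the effect described above, not how Js-parse implements mergeRecursive; it assumes the { head, body } / { type, value } shape from the printed output, and, like the note says, it only merges a child into its parent when both have the same head. Children are processed first, so the chain collapses from the innermost C outwards.

```javascript
// Hypothetical sketch of the effect of mergeRecursive; NOT Js-parse code.
// Nodes are { head, body }, tokens are { type, value }.
function mergeRecursive(node, symbol) {
	if (!node.head) {
		return node; // a token: nothing to merge
	}
	var body = [];
	node.body.forEach(function(child){
		child = mergeRecursive(child, symbol);
		// Splice a child's contents into its parent only when both are `symbol`.
		if (node.head === symbol && child.head === symbol) {
			body.push.apply(body, child.body);
		} else {
			body.push(child);
		}
	});
	return { head: node.head, body: body };
}

// Hypothetical helper: rebuild the unmerged C chain for n "c"s.
function nestedC(n) {
	var body = [ { type: "c", value: "c" } ];
	if (n > 1) {
		body.push(nestedC(n - 1));
	}
	return { head: "C", body: body };
}

var tree = { head: "S", body: [ { type: "b", value: "b" }, nestedC(6), { type: "d", value: "d" } ] };
var merged = mergeRecursive(tree, "C");
console.log(merged.body[1].body.length); // 6
```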

Advanced Tip: Using mergeIntoParent to clean up the output in a different way.

mergeIntoParent is used when a production is important to the parsing process but has no semantic relevance. It causes the contents of that symbol's production to be merged directly into the parent production, instead of appearing as a child of it.

If we change the C line in our symbols object to "C": { "terminal":false, "mergeIntoParent":true } and run our script again, we see different output:
Parser Accept: [ { head: 'S',
    body: 
     [ { type: 'b', value: 'b' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'd', value: 'd' },
       [length]: 8 ] },
  [length]: 1 ]
This happens because each "c" that is encountered ends up in the body of the "S" production rather than in a new "C" node.
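Here is a hypothetical post-processing function that reproduces this effect on a finished tree. It is a sketch of the behavior described above, not Js-parse's implementation; it assumes the { head, body } / { type, value } shape from the printed output. Because children are handled before their parent, each C is spliced into the C above it until every "c" lands in the body of S. Note that a node with no parent (the root) is never removed.

```javascript
// Hypothetical sketch of the effect of mergeIntoParent; NOT Js-parse code.
function mergeIntoParent(node, symbol) {
	if (!node.head) {
		return node; // tokens are left as-is
	}
	var body = [];
	node.body.forEach(function(child){
		child = mergeIntoParent(child, symbol);
		if (child.head === symbol) {
			// Replace the child node with its contents.
			body.push.apply(body, child.body);
		} else {
			body.push(child);
		}
	});
	return { head: node.head, body: body };
}

// Hypothetical helper: rebuild the unmerged C chain for n "c"s.
function nestedC(n) {
	var body = [ { type: "c", value: "c" } ];
	if (n > 1) {
		body.push(nestedC(n - 1));
	}
	return { head: "C", body: body };
}

var tree = { head: "S", body: [ { type: "b", value: "b" }, nestedC(6), { type: "d", value: "d" } ] };
var flat = mergeIntoParent(tree, "C");
console.log(flat.body.length); // 8
```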

Advanced Tip: Using excludeFromProduction to clean up the output a little more.

In our grammar, the "b" and "d" are not very important: if the input is accepted, we can assume it began with a "b" and ended with a "d". We can therefore use excludeFromProduction on the "b" and "d" symbols to remove them from the final output.

To do so, change the "b" and "d" lines in the parser description to read:
"b": { "terminal":true, "match":"b", "excludeFromProduction":true },
"d": { "terminal":true, "match":"d", "excludeFromProduction":true },
Now when we run the script again, we see the following (assuming we kept mergeIntoParent from the previous step):
Parser Accept: [ { head: 'S',
    body: 
     [ { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       { type: 'c', value: 'c' },
       [length]: 6 ] },
  [length]: 1 ]
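The hypothetical excludeTokens function below sketches the same effect as a post-processing step; it is not Js-parse code. It drops tokens whose type is in a given list and recurses into the remaining nodes. The input tree is hand-built to match the mergeIntoParent output shown in the previous tip.

```javascript
// Hypothetical sketch of the effect of excludeFromProduction; NOT Js-parse code.
function excludeTokens(node, excluded) {
	if (!node.head) {
		return node; // tokens are returned unchanged
	}
	return {
		head: node.head,
		body: node.body
			// Keep every node, and every token whose type is not excluded.
			.filter(function(child){ return child.head || excluded.indexOf(child.type) === -1; })
			.map(function(child){ return excludeTokens(child, excluded); })
	};
}

// Hand-built tree: S with "b", six "c"s and "d" in its body.
var flat = { head: "S", body: [ { type: "b", value: "b" } ] };
for (var i = 0; i < 6; i++) {
	flat.body.push({ type: "c", value: "c" });
}
flat.body.push({ type: "d", value: "d" });

var cleaned = excludeTokens(flat, [ "b", "d" ]);
console.log(cleaned.body.length); // 6
```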