Do some super basic parsing. In this example we'll parse things of the form:
S -> b C d.
C -> c C | c.
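To make the grammar concrete, here is a small hand-written recursive-descent recognizer in plain JavaScript. It follows the productions used in the parser description below (S -> b C d, C -> c C | c). It is only an illustration of which strings the grammar accepts. The function name recognize is hypothetical, and this is not how Js-parse works internally.

```javascript
// Recursive-descent recognizer for the toy grammar:
//   S -> b C d
//   C -> c C | c
// Illustration only; this is not how Js-parse parses internally.
function recognize(input) {
  let pos = 0;

  // Consume the expected character at the current position, or fail.
  function expect(ch) {
    if (input[pos] !== ch) return false;
    pos += 1;
    return true;
  }

  // C -> c C | c: take one "c", then recurse while another "c" follows.
  function parseC() {
    if (!expect("c")) return false;
    return input[pos] === "c" ? parseC() : true;
  }

  // S -> b C d, and the whole input must be consumed.
  return expect("b") && parseC() && expect("d") && pos === input.length;
}

console.log(recognize("bccccccd")); // true
console.log(recognize("bd"));       // false: C needs at least one "c"
console.log(recognize("bcde"));     // false: trailing "e"
```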
The first step to creating any great parser is defining the underlying grammar. In Js-parse we use a simple JavaScript object:
var parserDescription = {
    "symbols":{
        "b": { "terminal":true, "match":"b" },
        "c": { "terminal":true, "match":"c" },
        "d": { "terminal":true, "match":"d" }
    },
    "productions":{
        "S":[
            [ "b", "C", "d" ]
        ],
        "C":[
            [ "c", "C" ],
            [ "c" ]
        ]
    },
    "startSymbols": [ "S" ]
};
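The productions object maps each nonterminal to an array of alternatives, and each alternative is an array of symbol names. A hypothetical helper, toGrammarText (not part of Js-parse), that renders this structure back into the arrow notation makes the correspondence explicit:

```javascript
// Hypothetical helper, not part of Js-parse: print the "productions"
// section of a parser description in the arrow notation used above.
function toGrammarText(description) {
  return Object.keys(description.productions)
    .map(function (head) {
      // Each alternative is an array of symbol names.
      const alternatives = description.productions[head].map(function (body) {
        return body.join(" ");
      });
      return head + " -> " + alternatives.join(" | ") + ".";
    })
    .join("\n");
}

// Same productions as in parserDescription.
const example = {
  productions: {
    S: [["b", "C", "d"]],
    C: [["c", "C"], ["c"]]
  }
};

console.log(toGrammarText(example));
// S -> b C d.
// C -> c C | c.
```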
In this step we create a parser out of the parser description object we created above.
Be sure to have included the parser object definitions first, using something like:
var Parser = require("../../lib/index").Parser.LRParser;
// Create the parser
var parser = Parser.CreateWithLexer(parserDescription);
// Bind some event handlers
parser.on("accept", function(token_stack){
    console.log("Parser Accept:", require('util').inspect(token_stack, true, 1000));
});
parser.on("error", function(error){
    console.log("Parse Error: ", error.message);
    // Rethrow the Error object itself (not just its message) so the stack trace is kept
    throw error;
});
Pass an input string to the parser, tell it you've reached the end of the stream and see the magic happen.
We want to send some input to the parser, but also let it know that we've reached the end of the string. Once it is told that the end of the string has been reached, it decides whether or not to accept the entire input according to the grammar.
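The end-of-stream signal matters because a prefix such as bcc could still grow into a valid sentence, so no verdict is possible until the input is complete. The sketch below shows that idea with a hypothetical ToyStreamParser built on Node's EventEmitter. It mirrors the append()/end() and accept/error shape used in this tutorial, but it recognizes the toy language with a regular expression (/^bc+d$/, which describes the same strings as this grammar) and is not Js-parse's implementation.

```javascript
const EventEmitter = require("events");

// Hypothetical ToyStreamParser (not Js-parse): buffers chunks passed to
// append() and only accepts or rejects once end() is called.
class ToyStreamParser extends EventEmitter {
  constructor() {
    super();
    this.buffer = "";
  }

  // More input may follow, so no decision is made here.
  append(chunk) {
    this.buffer += chunk;
  }

  // The stream is complete: judge the entire input at once.
  // /^bc+d$/ describes the same strings as S -> b C d, C -> c C | c.
  end() {
    if (/^bc+d$/.test(this.buffer)) {
      this.emit("accept", this.buffer);
    } else {
      this.emit("error", new Error("Input rejected: " + JSON.stringify(this.buffer)));
    }
  }
}

const toy = new ToyStreamParser();
toy.on("accept", function (text) { console.log("Accepted:", text); });
toy.on("error", function (error) { console.log("Rejected:", error.message); });
toy.append("bcc");   // a valid prefix, but not yet a complete sentence
toy.append("ccccd");
toy.end();           // Accepted: bccccccd
```

Note that an EventEmitter throws if "error" is emitted with no listener attached, which is why the usage registers both handlers.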
// Begin processing the input
var input = "bccccccd";
parser.append(input);
parser.end();
If all went according to plan you should see some output like this:
Parser Accept: [ { head: 'S', body: [ { type: 'b', value: 'b' }, { head: 'C', body: [ { type: 'c', value: 'c' }, { head: 'C', body: [ { type: 'c', value: 'c' }, { head: 'C', body: [ { type: 'c', value: 'c' }, { head: 'C', body: [ { type: 'c', value: 'c' }, { head: 'C', body: [ { type: 'c', value: 'c' }, { head: 'C', body: [ { type: 'c', value: 'c' }, [length]: 1 ] }, [length]: 2 ] }, [length]: 2 ] }, [length]: 2 ] }, [length]: 2 ] }, [length]: 2 ] }, { type: 'd', value: 'd' }, [length]: 3 ] }, [length]: 1 ]
It's far from the most beautiful output we could ask for. If you look closely, though, you can see the expected structure represented here.
If we instead pass the input string bcde, we get:
Unrecognized characters at end of stream: 'e'
You can see that an exception was thrown upon calling parser.end() because additional characters were detected at the end of the string which did not conform to the grammar.
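In this grammar, e is not the match string of any terminal symbol. A minimal tokenizer sketch shows how a lexer driven by the symbols object would reject such a character. It assumes each "match" is a literal string compared at the current position (Js-parse may treat it differently, for example as a pattern). The name tokenize is hypothetical, and its error text is only loosely modeled on the message above.

```javascript
// Terminal symbols, as in the parser description above.
const symbols = {
  b: { terminal: true, match: "b" },
  c: { terminal: true, match: "c" },
  d: { terminal: true, match: "d" }
};

// Hypothetical tokenizer (not Js-parse's lexer). Assumes each "match"
// is a non-empty literal string to compare at the current position.
function tokenize(input, symbols) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    // Find the first terminal whose match string starts here.
    const type = Object.keys(symbols).find(function (name) {
      const symbol = symbols[name];
      return symbol.terminal && input.startsWith(symbol.match, pos);
    });
    if (type === undefined) {
      throw new Error("Unrecognized characters: '" + input.slice(pos) + "'");
    }
    tokens.push({ type: type, value: symbols[type].match });
    pos += symbols[type].match.length;
  }
  return tokens;
}

console.log(tokenize("bcd", symbols).map(function (t) { return t.type; })); // [ 'b', 'c', 'd' ]
try {
  tokenize("bcde", symbols);
} catch (error) {
  console.log(error.message); // Unrecognized characters: 'e'
}
```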
Use mergeRecursive to clean up the output.
Sometimes there's no need to create such a deeply nested structure like {"C": {"C": {"C": ... }}}. For this purpose, mergeRecursive was born. To use it, add a C entry to the symbols object:
"C": { "terminal":false, "mergeRecursive":true }
Run the parser on bccccccd again and you should see a slightly different output:
Parser Accept: [ { head: 'S',
body:
[ { type: 'b', value: 'b' },
{ head: 'C',
body:
[ { type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
[length]: 6 ] },
{ type: 'd', value: 'd' },
[length]: 3 ] },
[length]: 1 ]
As you can see, all of those nested C productions have been merged into a single C production, whose body is an array of all the values that existed under the chain of nested C productions.
mergeRecursive will only merge elements if the parent and child are of the same type.
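The mergeRecursive transformation can be sketched as a bottom-up pass over the tree shape shown in the accept output (nonterminals as { head, body }, tokens as { type, value }). This is a hypothetical re-implementation for illustration, not Js-parse's code. The symbol to merge is passed as an argument rather than read from a "mergeRecursive" flag, and the parent/child same-type check appears as the condition in the loop.

```javascript
// Hypothetical sketch of the mergeRecursive transformation (not Js-parse's code).
// Nonterminal nodes are { head, body }; tokens are { type, value }.
function mergeRecursive(node, head) {
  if (!node.body) return node; // tokens are left untouched
  const body = [];
  for (const child of node.body) {
    const merged = mergeRecursive(child, head); // flatten the subtree first
    if (node.head === head && merged.head === head) {
      // Parent and child are the same nonterminal: splice the child's body in.
      body.push(...merged.body);
    } else {
      body.push(merged);
    }
  }
  return { head: node.head, body: body };
}

// A tree shaped like the unmerged output above, for the shorter input bcccd.
const tree = {
  head: "S",
  body: [
    { type: "b", value: "b" },
    { head: "C", body: [
      { type: "c", value: "c" },
      { head: "C", body: [
        { type: "c", value: "c" },
        { head: "C", body: [{ type: "c", value: "c" }] }
      ] }
    ] },
    { type: "d", value: "d" }
  ]
};

console.log(JSON.stringify(mergeRecursive(tree, "C")));
// {"head":"S","body":[{"type":"b","value":"b"},{"head":"C","body":[{"type":"c","value":"c"},{"type":"c","value":"c"},{"type":"c","value":"c"}]},{"type":"d","value":"d"}]}
```

The S node keeps its single C child because S and C differ; only the C-inside-C chain collapses.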
Use mergeIntoParent to clean up the output in a different way.
mergeIntoParent is used when a production is important to the parsing process but has no semantic relevance. It causes the contents of that symbol's production to be merged directly into the parent, instead of appearing as a separate child of the parent production.
If we change the C line in our symbols object to "C": { "terminal":false, "mergeIntoParent":true } and run our script again, we see different output:
[ { head: 'S',
body:
[ { type: 'b', value: 'b' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'd', value: 'd' },
[length]: 8 ] },
[length]: 1 ]
This happens because each time a "c" is encountered it is added to the "S" production rather than creating a new "C" and adding it to that.
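A hypothetical sketch of the same effect (not Js-parse's code): any node whose head matches the marked symbol is replaced by its children, regardless of what its parent is. The tree literal is shaped like the unmerged output, for the shorter input bcccd.

```javascript
// Hypothetical sketch of mergeIntoParent (not Js-parse's code): every node
// whose head matches is replaced by its children, whatever its parent is.
function mergeIntoParent(node, head) {
  if (!node.body) return node; // tokens are left untouched
  const body = [];
  for (const child of node.body) {
    const merged = mergeIntoParent(child, head);
    if (merged.head === head) {
      body.push(...merged.body); // hoist the marked node's children into this parent
    } else {
      body.push(merged);
    }
  }
  return { head: node.head, body: body };
}

const tree = {
  head: "S",
  body: [
    { type: "b", value: "b" },
    { head: "C", body: [
      { type: "c", value: "c" },
      { head: "C", body: [
        { type: "c", value: "c" },
        { head: "C", body: [{ type: "c", value: "c" }] }
      ] }
    ] },
    { type: "d", value: "d" }
  ]
};

// S's body now holds only tokens: b, c, c, c, d.
console.log(mergeIntoParent(tree, "C").body.map(function (n) { return n.type; }).join("")); // bcccd
```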
Use excludeFromParent to clean up the output a little more.
In our grammar, the "b" and "d" aren't very important: if the input is accepted, we can assume it began with a "b" and ended with a "d". So we can use excludeFromParent on the "b" and "d" symbols to remove them from the final output.
"b": { "terminal":true, "match":"b", "excludeFromProduction":true },
"d": { "terminal":true, "match":"d", "excludeFromProduction":true },
Now when we run the script, we see the following (assuming we kept mergeIntoParent from the previous step):
Parser Accept: [ { head: 'S',
body:
[ { type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
{ type: 'c', value: 'c' },
[length]: 6 ] },
[length]: 1 ]
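Dropping the "b" and "d" tokens can be pictured as a filter over each body. The excludeTokens function below is a hypothetical illustration, not Js-parse's implementation, and the excluded types are passed in as a Set rather than read from symbol flags.

```javascript
// Hypothetical sketch of dropping tokens of certain types from a tree
// (not Js-parse's code); excluded is a Set of token types.
function excludeTokens(node, excluded) {
  if (!node.body) return node; // tokens are left untouched
  return {
    head: node.head,
    body: node.body
      .filter(function (child) { return !(child.type && excluded.has(child.type)); })
      .map(function (child) { return excludeTokens(child, excluded); })
  };
}

// Shaped like the mergeIntoParent output above, for the input bcccd.
const flat = {
  head: "S",
  body: [
    { type: "b", value: "b" },
    { type: "c", value: "c" },
    { type: "c", value: "c" },
    { type: "c", value: "c" },
    { type: "d", value: "d" }
  ]
};

console.log(JSON.stringify(excludeTokens(flat, new Set(["b", "d"]))));
// {"head":"S","body":[{"type":"c","value":"c"},{"type":"c","value":"c"},{"type":"c","value":"c"}]}
```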