Documentation

Jison takes a context-free grammar as input and outputs a JavaScript file capable of parsing the language described by that grammar. You can then use the generated script to parse inputs and accept, reject, or perform actions based on the input. If you’re familiar with Bison or Yacc, or other clones, you’re almost ready to roll.

Installation

Jison can be installed for Narwhal using its bundled tusk command or for Node using npm

Using npm:

sudo npm install jison

Using tusk:

tusk install jison

Usage from the command line

Clone the github repository for examples:

git clone git://github.com/zaach/jison.git
cd jison/examples

Now you’re ready to generate some parsers:

jison calculator.jison

This will generate calculator.js in your current working directory. This script can be used to parse an input file, like so:

echo "2^32 / 1024" > testcalc
narwhal calculator.js testcalc

This will print out 4194304.

Usage from a CommonJS Module

You can generate parsers programatically from JavaScript as well. Assuming Jison is in your commonjs environment’s load path:

// mygenerator.js
var Parser = require("jison").Parser;

var grammar = {
    "lex": {
        "rules": [
           ["\\s+", "/* skip whitespace */"],
           ["[a-f0-9]+", "return 'HEX';"]
        ]
    },

    "bnf": {
        "hex_strings" :[ "hex_strings HEX",
                         "HEX" ]
    }
};

var parser = new Parser(grammar);

// generate source, ready to be written to disk
var parserSource = parser.generate();

// you can also use the parser directly from memory

parser.parse("adfe34bc e82a");
// returns true

parser.parse("adfe34bc zxg");
// throws lexical error

Using the Generated Parser

Once you have generated the parser and saved it, you no longer need Jison or any other dependencies.

As demonstrated before, the parser can be used from the command line:

narwhal calculator.js testcalc

Though, more ideally, the parser will be a dependency of another module. You can require it from another module like so:

// mymodule.js
var parser = require("./calculator").parser;

function exec (input) {
    return parser.parse(input);
}

var twenty = exec("4 * 5");

Or more succinctly:

// mymodule.js
function exec (input) {
    return require("./calculator").parse(input);
}

var twenty = exec("4 * 5");

Using the Parser from the Web

The generated parser script may be included in a web page without any need for a CommonJS loading environment. It’s as simple as pointing to it via a scipt tag:

<script src="calculator.js"></script>

When you generate the parser, you can specify the variable name it will be declared as:

// mygenerator.js
var parserSource = generator.generate({moduleName: "calc"});
// then write parserSource to a file called, say, calc.js

Whatever moduleName you specified will be the the variable you can access the parser from in your web page:

<script src="calc.js"></script>
<script>
  calc.parse("42 / 0");
</script>

The moduleName you specify can also include a namespace, e.g:

// mygenerator.js
var parserSource = parser.generate({moduleName: "myCalculator.parser"});

And could be used like so:

<script>
  var myCalculator = {};
</script>

<script src="calc.js"></script>

<script>
  myCalculator.parser.parse("42 / 0");
</script>

Or something like that – you get the picture.

A demo of the calculator script used in a web page is here.

The Concepts of Jison

Until the Bison guide is properly ported for Jison, you can refer to it for the major concepts, which are equivalent (except for the bits about static typing of semantic values, and other obvious C artifacts.)

Other helpful sections:

Specifying a Language

The process of parsing a language involves two phases: lexical analysis (tokenizing) and parsing, which the Lex/Yacc and Flex/Bison combinations are famous for. Jison lets you specify a parser much like you would using Bison/Flex, with separate files for tokenization rules and for the language grammar, or with the tokenization rules embedded in the main grammar.

For example, here is the grammar for the calculator parser:

/* description: Parses end executes mathematical expressions. */

/* lexical grammar */
%lex

%%
\s+                   {/* skip whitespace */}
[0-9]+("."[0-9]+)?\b  {return 'NUMBER';}
"*"                   {return '*';}
"/"                   {return '/';}
"-"                   {return '-';}
"+"                   {return '+';}
"^"                   {return '^';}
"("                   {return '(';}
")"                   {return ')';}
"PI"                  {return 'PI';}
"E"                   {return 'E';}
<<EOF>>               {return 'EOF';}

/lex

/* operator associations and precedence */

%left '+' '-'
%left '*' '/'
%left '^'
%left UMINUS

%start expressions

%% /* language grammar */

expressions
    : e EOF
        {print($1); return $1;}
    ;

e
    : e '+' e
        {$$ = $1+$3;}
    | e '-' e
        {$$ = $1-$3;}
    | e '*' e
        {$$ = $1*$3;}
    | e '/' e
        {$$ = $1/$3;}
    | e '^' e
        {$$ = Math.pow($1, $3);}
    | '-' e %prec UMINUS
        {$$ = -$2;}
    | '(' e ')'
        {$$ = $2;}
    | NUMBER
        {$$ = Number(yytext);}
    | E
        {$$ = Math.E;}
    | PI
        {$$ = Math.PI;}
    ;

which compiles down to this JSON representation used directly by Jison:

{
    "lex": {
        "rules": [
           ["\\s+",                    "/* skip whitespace */"],
           ["[0-9]+(?:\\.[0-9]+)?\\b", "return 'NUMBER';"],
           ["\\*",                     "return '*';"],
           ["\\/",                     "return '/';"],
           ["-",                       "return '-';"],
           ["\\+",                     "return '+';"],
           ["\\^",                     "return '^';"],
           ["\\(",                     "return '(';"],
           ["\\)",                     "return ')';"],
           ["PI\\b",                   "return 'PI';"],
           ["E\\b",                    "return 'E';"],
           ["$",                       "return 'EOF';"]
        ]
    },

    "operators": [
        ["left", "+", "-"],
        ["left", "*", "/"],
        ["left", "^"],
        ["left", "UMINUS"]
    ],

    "bnf": {
        "expressions" :[[ "e EOF",   "print($1); return $1;"  ]],

        "e" :[[ "e + e",   "$$ = $1 + $3;" ],
              [ "e - e",   "$$ = $1 - $3;" ],
              [ "e * e",   "$$ = $1 * $3;" ],
              [ "e / e",   "$$ = $1 / $3;" ],
              [ "e ^ e",   "$$ = Math.pow($1, $3);" ],
              [ "- e",     "$$ = -$2;", {"prec": "UMINUS"} ],
              [ "( e )",   "$$ = $2;" ],
              [ "NUMBER",  "$$ = Number(yytext);" ],
              [ "E",       "$$ = Math.E;" ],
              [ "PI",      "$$ = Math.PI;" ]]
    }
}

Jison accepts both the Bison/Flex style format, or the raw JSON format, e.g:

narwhal bin/jison examples/calculator.jison

or

narwhal bin/jison examples/calculator.json

When the lexical grammar resides in its own (.jisonlex) file, use that as the second argument to Jison, e.g.:

narwhal bin/jison examples/classy.jison examples/classy.jisonlex

More examples can be found in the examples/ and tests/parser/ directories.

Lexical Analysis

Jison includes a rather rudimentary scanner generator, though any module that supports the basic scanner API could be used in its place.

The format of the input file (including macro support) and the style of the pattern matchers are modeled after Flex. It is a subset, so there are some differences, namely exact string patterns must be placed in quotes and actions must be contained within brackets ({, }), e.g.:

Bad:

[0-9]+zomg    print(yytext)

Good:

[0-9]+"zomg"    { print(yytext); }

Custom Scanners

TODO

Sharing Scope

In Bison, code is expected to be lexically defined within the scope of the semantic actions. E.g., chunks of code may be included in the generated parser source, which are available from semantic actions.

Jison is more modular. Instead of pulling code into the generated module, the generated module is expected to be required and used by other modules. This means that if you want to expose functionality to the semantic actions, you can’t rely on it being available through lexical scoping. Instead, the parser has a yy property which is exposed to actions as the yy free variable. Any functionality attached to this property is available in both lexical and semantic actions through the yy free variable.

An example from orderly.js:

var parser = require("./orderly/parse").parser;

// set parser's shared scope
parser.yy = require("./orderly/scope");

// returns the JSON object
var parse = exports.parse = function (input) {
    return parser.parse(input);
};
...

The scope module contains logic for building data structures, which is used within the semantic actions.

TODO: More on this.

Parsing algorithms

Like Bison, Jison can recognize languages described by LALR(1) grammars, though it also has modes for LR(0), SLR(1), and LR(1). It also has a special mode for generating LL(1) parse tables (requested by my professor,) and could be extended to generate a recursive descent parser for LL(k) languages in the future. But, for now, Jison is geared toward bottom-up parsing.

*LR(1) mode is currently not practical for use with anything other than toy grammars, but that is entirely a consequence of the algorithm used, and may change in the future.

Real world examples

Contributors

License

Copyright (c) 2009 Zachary Carter

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Fork me on GitHub