Class: Tokenizer

new Tokenizer(dic)

Tokenizer that analyzes text using the given dictionaries
Parameters:
Name Type Description
dic DynamicDictionaries Dictionaries used by this tokenizer

Methods

<static> splitByPunctuation(input) → {Array.<string>}

Split text into sentences by punctuation
Parameters:
Name Type Description
input string Input text
Returns:
Sentences, each ending with its punctuation mark
Type
Array.<string>
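
The splitting behavior can be sketched in a few lines. This is a hypothetical stand-in, not the library's implementation: the function name splitByPunctuationSketch is invented here, and the delimiter set (the Japanese comma "、" and full stop "。") is an assumption, since this page does not list which characters count as punctuation.

```javascript
// Hypothetical stand-in for illustration only.
// Assumption: the delimiters are the Japanese comma "、" and full stop "。".
function splitByPunctuationSketch(input) {
  const sentences = [];
  let rest = input;
  while (rest !== "") {
    const index = rest.search(/[、。]/);
    if (index < 0) {
      // No punctuation left: the remainder is the final sentence.
      sentences.push(rest);
      break;
    }
    // Keep the punctuation mark at the end of its sentence.
    sentences.push(rest.substring(0, index + 1));
    rest = rest.substring(index + 1);
  }
  return sentences;
}

console.log(splitByPunctuationSketch("すもも、もも。うち"));
// → [ 'すもも、', 'もも。', 'うち' ]
```

Trailing text without a final punctuation mark is kept as its own element in this sketch; whether the real method does the same is not stated here.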

getLattice(text) → {ViterbiLattice}

Build a word lattice for the input text
Parameters:
Name Type Description
text string Input text to analyze
Returns:
Word lattice
Type
ViterbiLattice
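
To make the idea of a word lattice concrete, here is a toy sketch. It does not reproduce the ViterbiLattice type, whose structure this page does not show. The dictionary, the costs, and the names toyDictionary and buildLatticeSketch are all invented for illustration. The sketch records, for each position in the text, every dictionary word that ends there.

```javascript
// Toy dictionary (hypothetical): surface form -> word cost.
const toyDictionary = { "す": 5, "もも": 2, "すもも": 3, "も": 1 };

// Build a lattice: nodesEndingAt[i] lists the dictionary words ending at position i.
function buildLatticeSketch(text, dictionary) {
  const chars = Array.from(text);
  const nodesEndingAt = Array.from({ length: chars.length + 1 }, () => []);
  for (let start = 0; start < chars.length; start++) {
    for (let end = start + 1; end <= chars.length; end++) {
      const surface = chars.slice(start, end).join("");
      if (Object.prototype.hasOwnProperty.call(dictionary, surface)) {
        nodesEndingAt[end].push({ surface, start, end, cost: dictionary[surface] });
      }
    }
  }
  return nodesEndingAt;
}

const lattice = buildLatticeSketch("すももも", toyDictionary);
lattice.forEach((nodes, end) => {
  console.log(end, nodes.map((n) => n.surface).join(" "));
});
```

Each position can hold several competing candidates, which is what allows a later step to choose among alternative segmentations.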

tokenize(text) → {Array}

Tokenize text
Parameters:
Name Type Description
text string Input text to analyze
Returns:
Tokens
Type
Array
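
One common way to turn candidate words into a token list is to pick the segmentation with the lowest total cost. This page does not describe how tokenize chooses its tokens or what fields a token has, so the following is only an illustration under stated assumptions: the dictionary, the costs, the token shape ({ surface, start }), and the name tokenizeSketch are hypothetical, and only per-word costs are used (no costs between adjacent words).

```javascript
// Toy dictionary (hypothetical): surface form -> word cost.
const toyCosts = { "す": 5, "もも": 2, "すもも": 3, "も": 1 };

// Return the lowest-total-cost segmentation of text, or null if none exists.
function tokenizeSketch(text, costs) {
  const chars = Array.from(text);
  const n = chars.length;
  // best[i] describes the cheapest way found to cover chars[0..i).
  const best = new Array(n + 1).fill(null);
  best[0] = { cost: 0, prev: -1, surface: null };
  for (let end = 1; end <= n; end++) {
    for (let start = 0; start < end; start++) {
      if (best[start] === null) continue; // position not reachable
      const surface = chars.slice(start, end).join("");
      if (!Object.prototype.hasOwnProperty.call(costs, surface)) continue;
      const cost = best[start].cost + costs[surface];
      if (best[end] === null || cost < best[end].cost) {
        best[end] = { cost, prev: start, surface };
      }
    }
  }
  if (best[n] === null) return null;
  // Walk back from the end to recover the tokens in reading order.
  const tokens = [];
  for (let i = n; i > 0; i = best[i].prev) {
    tokens.unshift({ surface: best[i].surface, start: best[i].prev });
  }
  return tokens;
}

console.log(tokenizeSketch("すもももも", toyCosts).map((t) => t.surface));
// → [ 'すもも', 'もも' ]
```

In this sketch, ties are resolved in favor of the first candidate found, and text containing a character missing from the toy dictionary yields null.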