Dictionary recognition module

author

老雷 (leizongmin@gmail.com)

Type aliases

Static IAssessRow

IAssessRow: object

Type declaration

  • a: number

    Total word frequency; higher is better.

  • b: number

    Word standard deviation; lower is better.

  • c: number

    Number of unrecognized words; lower is better.

  • d: number

    Degree of conformity to grammatical structure; higher is better.

  • x: number

    Number of words; lower is better.
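Because every field has a preferred direction, candidate segmentations can be compared field by field. The sketch below is only an illustration of that idea: `pickBest` and its "count the fields won" rule are assumptions made here, not the scoring this module actually applies in `getTops`.

```typescript
// Mirrors the IAssessRow shape documented above.
interface IAssessRow {
  x: number; // number of words, lower is better
  a: number; // total word frequency, higher is better
  b: number; // word standard deviation, lower is better
  c: number; // unrecognized words, lower is better
  d: number; // grammatical fit, higher is better
}

// Hypothetical helper: returns the index of the candidate that wins the most
// fields, or -1 for an empty list. A tie on a field counts for every tied row.
function pickBest(rows: IAssessRow[]): number {
  if (rows.length === 0) return -1;
  const column = (key: keyof IAssessRow) => rows.map((r) => r[key]);
  // Best value per field, following the direction stated for each field.
  const best: IAssessRow = {
    x: Math.min(...column("x")),
    a: Math.max(...column("a")),
    b: Math.min(...column("b")),
    c: Math.min(...column("c")),
    d: Math.max(...column("d")),
  };
  const keys: (keyof IAssessRow)[] = ["x", "a", "b", "c", "d"];
  const points = rows.map((r) => keys.filter((k) => r[k] === best[k]).length);
  // The first row with the highest point count wins.
  return points.indexOf(Math.max(...points));
}
```

A weighted sum would be an equally plausible rule; the field-counting version is used here only because it needs no invented weights.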

Constructors

constructor

Properties

MAX_CHUNK_COUNT

MAX_CHUNK_COUNT: number = 50

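Prevents analysis from running too long, or even exceeding processing capacity, when the text has no segment breaks. Higher values give more accurate results, but processing time grows multiplicatively and can exceed what memory can handle.

Smaller values run faster.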

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
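To see why an uncapped analysis can exhaust memory, it helps to count how many segmentations an ambiguous text admits. The sketch below is a toy model, not code from this module: `countBranches` and `ambiguous` are hypothetical names, and the input assumes every position starts both a one-character and a two-character dictionary word.

```typescript
// lengthsAt[i] lists the lengths of the dictionary words that start at position i.
// Returns how many distinct ways the whole text can be split into those words.
function countBranches(lengthsAt: number[][]): number {
  const n = lengthsAt.length;
  // ways[i] = number of segmentations of the suffix that starts at position i.
  const ways: number[] = new Array(n + 1).fill(0);
  ways[n] = 1; // the empty suffix has exactly one segmentation
  for (let i = n - 1; i >= 0; i--) {
    for (const len of lengthsAt[i]) {
      if (i + len <= n) ways[i] += ways[i + len];
    }
  }
  return ways[0];
}

// Toy worst case: every position starts a 1-character and a 2-character word.
const ambiguous = (n: number): number[][] =>
  Array.from({ length: n }, () => [1, 2]);

console.log(countBranches(ambiguous(10))); // 89
console.log(countBranches(ambiguous(30))); // 1346269
```

In this model the branch count follows the Fibonacci sequence, so tripling the text length from 10 to 30 characters multiplies the number of branches by more than 15,000. Materializing every branch as an array is what a limit such as `MAX_CHUNK_COUNT` is meant to keep in check.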

type

{number}

Protected Optional _POSTAG

_POSTAG: POSTAG

Protected _TABLE

_TABLE: IDICT<IWord>

Protected _TABLE2

_TABLE2: IDICT2<IWord>

Optional inited

inited: boolean

name

name: string

Optional priority

priority: number

segment

segment: Segment

type

type: "tokenizer" = "tokenizer"

Static type

type: "tokenizer" = "tokenizer"

Methods

_cache

  • _cache(): void

Protected _splitUnknow

  • _splitUnknow<T, U>(words: T[], fn: function): U[]

Protected _splitUnset

  • _splitUnset<T, U>(words: T[], fn: function): U[]

Protected createToken

  • createToken<T, U>(data: T, skipCheck?: boolean, attr?: U & IWordDebugInfo): T

Protected debugToken

  • debugToken<T, U>(data: T, attr?: U & IWordDebugInfo, returnToken?: true, ...argv: any[]): T

Protected filterWord

getChunks

  • getChunks(wordpos: object, pos: number, text?: string, total_count?: number): IWord[][]
  • Gets all branches.

    Parameters

    • wordpos: object
      • [index: number]: IWord[]
    • pos: number

      Current position.

    • Optional text: string

      The text of the current section to be segmented.

    • Default value total_count: number = 0

    Returns IWord[][]
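The idea of collecting all branches from a position-indexed word table can be sketched as a plain recursion. This is an illustrative re-implementation under assumptions, not the module's `getChunks`: `Word` is a minimal stand-in for `IWord` with only the `w` and `c` fields, `enumerateChunks` and `textLength` are hypothetical names, and the `total_count` parameter and any `MAX_CHUNK_COUNT` limit are left out.

```typescript
// Minimal stand-in for IWord: the word itself and its start position.
interface Word {
  w: string;
  c: number;
}

// Returns every way to cover the text from `pos` to the end using the
// candidate words listed in `wordpos` for each start position.
function enumerateChunks(
  wordpos: { [index: number]: Word[] },
  pos: number,
  textLength: number,
): Word[][] {
  if (pos >= textLength) return [[]]; // one empty branch: nothing left to split
  const result: Word[][] = [];
  for (const word of wordpos[pos] ?? []) {
    // Continue from the position right after this word.
    const tails = enumerateChunks(wordpos, pos + word.w.length, textLength);
    for (const rest of tails) {
      result.push([word, ...rest]);
    }
  }
  return result;
}

// Candidate words for the text "abc", keyed by start position.
const wordpos: { [index: number]: Word[] } = {
  0: [{ w: "a", c: 0 }, { w: "ab", c: 0 }],
  1: [{ w: "b", c: 1 }, { w: "bc", c: 1 }],
  2: [{ w: "c", c: 2 }],
};
const chunks = enumerateChunks(wordpos, 0, 3);
console.log(chunks.map((branch) => branch.map((x) => x.w).join("|")));
// [ 'a|b|c', 'a|bc', 'ab|c' ]
```

A position with no candidate words yields no branches at all in this sketch, so a real implementation needs some fallback for unrecognized text.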

getPosInfo

  • getPosInfo(words: IWord[], text: string): object

getTops

init

  • init(segment: Segment, ...argv: any[]): this

Protected matchWord

  • matchWord(text: string, cur: number, preword: IWord): IWord[]
  • Matches words and returns the related information.

    Parameters

    • text: string

      The text.

    • cur: number

      Start position.

    • preword: IWord

      The previous word.

    Returns IWord[]

    Return format: {w: 'word', c: start position}
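A minimal sketch of dictionary matching at a given position, producing results in the `{w, c}` format described above. Everything here is an assumption for illustration: `matchAt`, the `Set`-based dictionary, and the `maxLen` bound are hypothetical, the real method looks words up in `_TABLE` / `_TABLE2`, and the `preword` parameter is ignored.

```typescript
// Result shape taken from the documented return format.
interface Match {
  w: string; // the matched word
  c: number; // its start position in the text
}

// Tries every substring that starts at `cur`, up to `maxLen` characters long,
// and keeps the ones found in the dictionary.
function matchAt(dict: Set<string>, text: string, cur: number, maxLen = 4): Match[] {
  const found: Match[] = [];
  for (let len = 1; len <= maxLen && cur + len <= text.length; len++) {
    const w = text.slice(cur, cur + len);
    if (dict.has(w)) found.push({ w, c: cur });
  }
  return found;
}

const dict = new Set(["中", "中文", "文"]);
console.log(matchAt(dict, "中文分词", 0));
// [ { w: '中', c: 0 }, { w: '中文', c: 0 } ]
```

Returning every match rather than only the longest one leaves the choice between overlapping words to a later step.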

Protected sliceToken

  • sliceToken<T>(words: T[], pos: number, len: number, data: T, skipCheck?: boolean): T[]

split

Static Protected _init

Static init

  • init<T>(segment: Segment, ...argv: any[]): T
