Namespace: regexGen

regexGen

The Generator

The generator is exported as the regexGen() function; everything must be referenced from it.

To generate a regular expression, pass sub-expressions as arguments to regexGen().

Sub-expressions are then concatenated together to form the whole regular expression.

A sub-expression can be a string, a number, a RegExp object, or any combination of calls to the methods (i.e., the sub-generators) of the regexGen() function object.

Strings passed to regexGen(), text(), maybe(), anyCharOf() and anyCharBut() are always escaped as necessary, so you don't have to worry about which characters to escape.
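To illustrate what escaping "as necessary" involves, here is a minimal sketch in plain JavaScript. The helper escapeLiteral is hypothetical and is not part of regexGen; the library's own escaping rules may differ in detail.

```javascript
// Hypothetical helper, not part of regexGen: prefixes every character
// that has a special meaning in a regular expression with a backslash.
function escapeLiteral( text ) {
    return text.replace( /[.*+?^${}()|[\]\\\/]/g, '\\$&' );
}

var source = escapeLiteral( 'a.b*c' );          // the string a\.b\*c
var literal = new RegExp( '^' + source + '$' );

console.log( literal.test( 'a.b*c' ) );   // true:  matched as literal text
console.log( literal.test( 'aXbbc' ) );   // false: '.' and '*' are no longer operators
```

Without the escaping step, the same input would behave as the pattern a.b*c and would also match strings such as "aXbbc".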

The result of calling regexGen() is a RegExp object. See The RegExp Object section for details. Since everything must be referenced from the regexGen() function, it is preferable to assign it to a short variable to keep the code simple.

Source:

Example

var _ = regexGen;

var regex = regexGen(
    _.startOfLine(),
    _.capture( 'http', _.maybe( 's' ) ), '://',
    _.capture( _.anyCharBut( ':/' ).repeat() ),
    _.group( ':', _.capture( _.digital().multiple(2,4) ) ).maybe(), '/',
    _.capture( _.anything() ),
    _.endOfLine()
);
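// Example input; any URL string can be used here.
var url = 'https://www.example.com:8080/path/to/page';
var matches = regex.exec( url );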
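For comparison, a hand-written pattern that expresses roughly the same intent as the example above is sketched below. Its shape is an assumption, not the exact output of regexGen (the library may, for instance, emit additional non-capturing groups or a different quantifier for repeat()), and the sample URL is purely illustrative.

```javascript
// Hand-written approximation; regexGen's actual output may differ in detail.
var approx = /^(https?):\/\/([^:\/]+)(?::(\d{2,4}))?\/(.*)$/;

var parts = approx.exec( 'https://www.example.com:8080/path/to/page' );

console.log( parts[ 1 ] );   // https
console.log( parts[ 2 ] );   // www.example.com
console.log( parts[ 3 ] );   // 8080
console.log( parts[ 4 ] );   // path/to/page
```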

Methods

(static) any() → {Term}

Source:
Returns:
Type
Term

(static) anyChar() → {Term}

Matches any single character except the newline character: (.)

Source:
Returns:
Type
Term

(static) anyCharBut() → {Term}

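Matches any character except the given ones: ([^abc]). Usage: anyCharBut( [ 'a', 'c' ], [ '2', '6' ], 'fgh', 'z' ) generates ([^a-c2-6fghz]). As the usage shows, a two-element array denotes a character range, and a string lists individual characters.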
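The behaviour of the generated character class can be checked with a native RegExp literal. This sketch uses the pattern from the usage note directly rather than calling regexGen.

```javascript
// Native equivalent of the pattern shown in the usage note.
var notListed = /[^a-c2-6fghz]/;

console.log( notListed.test( 'd' ) );   // true:  'd' is outside every range and list
console.log( notListed.test( 'b' ) );   // false: 'b' falls in the range a-c
console.log( notListed.test( '4' ) );   // false: '4' falls in the range 2-6
console.log( notListed.test( 'z' ) );   // false: 'z' is listed explicitly
```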

Source:
Returns:
Type
Term

(static) anyCharOf() → {Term}

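Matches any one of the given characters: ([abc]). Usage: anyCharOf( [ 'a', 'c' ], [ '2', '6' ], 'fgh', 'z' ) generates ([a-c2-6fghz]). As the usage shows, a two-element array denotes a character range, and a string lists individual characters.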
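As a quick check of which characters such a class accepts, the sketch below applies the pattern from the usage note as a native RegExp literal (with the global flag added for the demonstration) instead of calling regexGen.

```javascript
// Native equivalent of the pattern shown in the usage note,
// with the "g" flag so that match() returns every hit.
var listed = /[a-c2-6fghz]/g;

var kept = 'abcdefgh 0123456789'.match( listed );

console.log( kept.join( '' ) );   // abcfgh23456
```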

Source:
Returns:
Type
Term

(static) anything() → {Term}

Matches any sequence of characters except the newline character: (.*)
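Because the dot does not match a newline, (.*) stops at the end of the current line. A minimal sketch with a native RegExp literal:

```javascript
var anything = /.*/;

var found = anything.exec( 'first line\nsecond line' );

console.log( found[ 0 ] );   // first line
```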

Source:
Returns:
Type
Term

(static) ascii() → {Term}

Matches the character with the code hh (two hexadecimal digits): (\xhh)

Source:
Returns:
Type
Term

(static) backspace() → {Term}

Matches a backspace (U+0008): ([\b]). The square brackets are required to match a literal backspace character. (Not to be confused with \b.)
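The difference between the bracketed form and a bare \b can be seen with native RegExp literals:

```javascript
var backspace = /[\b]/;   // a literal backspace character (U+0008)
var boundary = /\b/;      // a word boundary, which consumes no characters

console.log( backspace.test( 'a\bc' ) );   // true:  the string contains U+0008
console.log( backspace.test( 'abc' ) );    // false: no backspace present
console.log( boundary.test( 'abc' ) );     // true:  boundary before 'a'
console.log( boundary.test( '\b' ) );      // false: a lone backspace has no word boundary
```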

Source:
Returns:
Type
Term

(static) capture() → {Capture}

Matches specified terms and remembers the match. The generated parentheses are called capturing parentheses. A label is the name that back references use to look up the number of a capture. Captures are numbered from left to right, in the order in which their opening parentheses appear, that is, in depth-first order. Example: capture( label('cap1'), capture( label('cap2'), 'xxx' ), capture( label('cap3'), '...' ), 'something else' )
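The numbering rule can be illustrated with a native pattern that mirrors the nesting of the example. This is a hand-written sketch; labels are a regexGen concept and do not appear in the native expression.

```javascript
// Opening parentheses, left to right: #1 outer, #2 'xxx', #3 '...'
var nested = /((xxx)(\.\.\.)something else)/;

var m = nested.exec( 'xxx...something else' );

console.log( m[ 1 ] );   // xxx...something else   (position of cap1)
console.log( m[ 2 ] );   // xxx                    (position of cap2)
console.log( m[ 3 ] );   // ...                    (position of cap3)
```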

Source:
Returns:
Type
Capture

(static) carriageReturn() → {Term}

Matches a carriage return: (\r)

Source:
Returns:
Type
Term

(static) controlChar() → {Term}

Matches a control character in a string: (\cX), where X is a character ranging from A to Z.

Source:
Returns:
Type
Term

(static) digital() → {Term}

Matches a digit character: (\d)

Source:
Returns:
Type
Term

(static) either() → {Sequence}

Adds alternative expressions

Source:
Returns:
Type
Sequence

(static) endOfLine() → {Term}

Source:
Returns:
Type
Term

(static) formFeed() → {Term}

Matches a form feed: (\f)

Source:
Returns:
Type
Term

(static) group() → {Sequence}

Matches specified terms but does not remember the match. The generated parentheses are called non-capturing parentheses.
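A short sketch with a native RegExp literal shows that a non-capturing group takes part in matching but does not add an entry to the result array:

```javascript
var grouped = /(?:ab)+(c)/;

var result = grouped.exec( 'ababc' );

console.log( result.length );   // 2: the whole match plus one capture
console.log( result[ 0 ] );     // ababc
console.log( result[ 1 ] );     // c
```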

Source:
Returns:
Type
Sequence

(static) hexDigital() → {Term}

Source:
Returns:
Type
Term

(static) ignoreCase()

Case-insensitivity modifier.

Source:

(static) label() → {Label}

A label is a named reference to a capture group, and is allowed only in the capture() method.

Source:
Returns:
Type
Label

(static) lineBreak() → {Term}

Matches any line break, including Unix (LF) and Windows (CRLF) line endings.

Source:
Returns:
Type
Term

(static) lineFeed() → {Term}

Matches a line feed: (\n)

Source:
Returns:
Type
Term

(static) many() → {Term}

Matches an expression that occurs one or more times: (x+)

Source:
Returns:
Type
Term

(static) maybe() → {Term}

Matches an optional character sequence; shortcut for Term.maybe: ((?:abc)?)
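The effect of the generated form can be checked with a native RegExp literal. In this sketch, the surrounding x and y are added only to anchor the demonstration:

```javascript
var optional = /^x(?:abc)?y$/;

console.log( optional.test( 'xabcy' ) );   // true:  the sequence is present
console.log( optional.test( 'xy' ) );      // true:  the sequence is absent
console.log( optional.test( 'xaby' ) );    // false: the sequence is all or nothing
```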

Source:
Returns:
Type
Term

(static) mixin(global)

A utility function that makes the regexGen generator easier to use.

Parameters:
Name Type Description
global Object

The target object into which the sub-generators are injected.

Source:

(static) nonDigital() → {Term}

Matches any non-digit character: (\D)

Source:
Returns:
Type
Term

(static) nonSpace() → {Term}

Matches a single character other than white space: (\S)

Source:
Returns:
Type
Term

(static) nonWord() → {Term}

Matches any non-word character: (\W)

Source:
Returns:
Type
Term

(static) nonWordBoundary() → {Term}

Matches a non-word boundary: (\B). This matches a position where the previous and next characters are of the same type: either both must be word characters, or both must be non-word characters. The beginning and end of a string are considered non-words.
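A minimal sketch with a native RegExp literal shows the "same type on both sides" rule:

```javascript
var inside = /\Boo/;

console.log( inside.test( 'moon' ) );   // true:  'm' and 'o' are both word characters
console.log( inside.test( 'oops' ) );   // false: the start of the string counts as a non-word
```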

Source:
Returns:

the non-word boundary expression term object.

Type
Term

(static) nullChar() → {Term}

Matches a NULL (U+0000) character. Do not follow this with another digit, because \0 is an octal escape sequence.
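The pitfall with a trailing digit can be reproduced with native RegExp literals (without the "u" flag, where legacy octal escapes are still accepted):

```javascript
var nul = /\0/;
var octal = /\01/;   // parsed as the octal escape for U+0001, not NUL followed by "1"

console.log( nul.test( 'a\u0000b' ) );         // true
console.log( octal.test( '\u0001' ) );         // true
console.log( octal.test( '\u0000' + '1' ) );   // false
```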

Source:
Returns:
Type
Term

(static) regex() → {Term|RegexOverwrite}

Treats the given value as a trusted regular expression fragment and inserts it as is.

Source:
Returns:
Type
Term | RegexOverwrite

(static) sameAs() → {CaptureReference}

Back reference: matches the same text that a previous capture matched.
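In a native pattern a back reference is written as a backslash followed by the capture number. The sketch below finds a doubled word character; it does not use regexGen labels.

```javascript
var doubled = /(\w)\1/;   // a word character followed by the same character

console.log( doubled.test( 'hello' ) );        // true
console.log( doubled.exec( 'hello' )[ 0 ] );   // ll
console.log( doubled.test( 'world' ) );        // false
```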

Source:
Returns:
Type
CaptureReference

(static) searchAll()

Global search modifier. The default behaviour is with the "g" modifier, so this modifier works the other way around from the other modifiers.

Source:

(static) searchMultiLine()

Multiline modifier.

Source:

(static) space() → {Term}

Matches a single white space character, including space, tab, form feed, line feed: (\s)

Source:
Returns:
Type
Term

(static) startOfLine() → {Term}

Source:
Returns:
Type
Term

(static) tab() → {Term}

Matches a tab (U+0009): (\t)

Source:
Returns:
Type
Term

(static) text(value) → {Term}

Matches a literal character sequence: (abc).

Parameters:
Name Type Description
value String

the character sequence.

Source:
Returns:

the text literal expression term object.

Type
Term

(static) unicode() → {Term}

Matches the character with the code hhhh (four hexadecimal digits).

Source:
Returns:
Type
Term

(static) vertTab() → {Term}

Matches a vertical tab (U+000B): (\v)

Source:
Returns:
Type
Term

(static) word() → {Term}

Matches any alphanumeric character including the underscore: (\w)

Source:
Returns:
Type
Term

(static) wordBoundary() → {Term}

Matches a word boundary: (\b). A word boundary matches the position where a word character is not followed or preceded by another word character. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero. (Not to be confused with [\b].)
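Both properties, whole-word matching and zero length, can be observed with native RegExp literals:

```javascript
var whole = /\bcat\b/;

console.log( whole.test( 'a cat sat' ) );      // true:  'cat' stands alone
console.log( whole.test( 'concatenate' ) );    // false: 'cat' is inside a longer word

var edge = /\b/.exec( 'cat' );
console.log( edge[ 0 ].length );               // 0: the boundary itself consumes nothing
```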

Source:
Returns:

the word boundary expression term object.

Type
Term

(static) words() → {Term}

Matches any alphanumeric character sequence including the underscore: (\w+)

Source:
Returns:
Type
Term