lunr.js 0.3.3

lunr

Convenience function for instantiating a new lunr index and configuring it with the default pipeline functions and the passed config function.

When using this convenience function a new index will be created with the following functions already in the pipeline:

lunr.stopWordFilter - filters out any stop words before they enter the index.

lunr.stemmer - stems the tokens before entering the index.

Example:

var idx = lunr(function () {
  this.field('title', { boost: 10 })
  this.field('tags', { boost: 100 })
  this.field('body')

  this.ref('cid')

  this.pipeline.add(function () {
    // some custom pipeline function
  })

})

tokenizer

A function for splitting a string into tokens ready to be inserted into the search index.

Pipeline

lunr.Pipelines maintain an ordered list of functions to be applied to all tokens in documents entering the search index and queries being run against the index.

An instance of lunr.Index created with the lunr shortcut will contain a pipeline with a stop word filter and an English language stemmer. Extra functions can be added before or after either of these functions, or these default functions can be removed.

When run, the pipeline calls each function in turn, passing it a token, the index of that token in the original list of tokens, and finally the list of all the original tokens.

The output of each function is passed to the next function in the pipeline. To exclude a token from the index, a function should return undefined; the rest of the pipeline will not be called with that token.
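The calling convention described above can be tried without the library. The following is a minimal standalone sketch: runPipeline, dropShortTokens and upcase are hypothetical names invented for this illustration and are not part of lunr.

```javascript
// Standalone sketch: runPipeline is a hypothetical helper, not lunr's API.
// It applies each function in `stack` to every token, in order.
var runPipeline = function (stack, tokens) {
  var out = []

  tokens.forEach(function (original, i) {
    var token = original

    for (var j = 0; j < stack.length; j++) {
      // Each function receives the token, its index and the full token list.
      token = stack[j](token, i, tokens)

      // Returning undefined drops the token; later functions never see it.
      if (token === undefined) return
    }

    out.push(token)
  })

  return out
}

// Hypothetical pipeline functions.
var dropShortTokens = function (token) {
  if (token.length > 2) return token
}

var upcase = function (token) {
  return token.toUpperCase()
}

var result = runPipeline([dropShortTokens, upcase], ['an', 'index', 'of', 'tokens'])
// result is ['INDEX', 'TOKENS']
```

Because dropShortTokens returns undefined for 'an' and 'of', upcase is never called with those tokens.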

For serialisation of pipelines to work, all functions used in an instance of a pipeline should be registered with lunr.Pipeline. Registered functions can then be loaded by their label. Trying to load a serialised pipeline that uses functions that are not registered throws an error.

If the pipeline will not be serialised, registering pipeline functions is not necessary.

registerFunction

lunr.Pipeline.registerFunction()

method

Params

  • fn - The function to register.
  • label - The label to register this function with

Register a function with the pipeline.

Functions that are used in the pipeline should be registered if the pipeline needs to be serialised, or a serialised pipeline needs to be loaded.

Registering a function does not add it to a pipeline; functions must still be added to a pipeline instance to be used when that pipeline runs.

Source

lunr.Pipeline.registerFunction = function (fn, label) {
  if (label in this.registeredFunctions) {
    lunr.utils.warn('Overwriting existing registered function: ' + label)
  }

  fn.label = label
  lunr.Pipeline.registeredFunctions[fn.label] = fn
}

warnIfFunctionNotRegistered

lunr.Pipeline.warnIfFunctionNotRegistered()

method

Params

  • fn - The function to check for.

Warns if the function is not registered as a Pipeline function.

Source

lunr.Pipeline.warnIfFunctionNotRegistered = function (fn) {
  var isRegistered = fn.label && (fn.label in this.registeredFunctions)

  if (!isRegistered) {
    lunr.utils.warn('Function is not registered with pipeline. This may cause problems when serialising the index.\n', fn)
  }
}

load

lunr.Pipeline.load()

method

Params

  • serialised - The serialised pipeline to load.

Loads a previously serialised pipeline.

All functions to be loaded must already be registered with lunr.Pipeline. If any function from the serialised data has not been registered then an error will be thrown.

Source

lunr.Pipeline.load = function (serialised) {
  var pipeline = new lunr.Pipeline

  serialised.forEach(function (fnName) {
    var fn = lunr.Pipeline.registeredFunctions[fnName]

    if (fn) {
      pipeline.add(fn)
    } else {
      throw new Error ('Cannot load un-registered function: ' + fnName)
    }
  })

  return pipeline
}
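To make the register, serialise and load cycle concrete, here is a minimal standalone sketch of the same idea. The names registered, register, serialise and load are stand-ins invented for this example; they mirror the behaviour of the listings above but are not the library's own objects.

```javascript
// Hypothetical stand-in for the registry of labelled functions.
var registered = {}

var register = function (fn, label) {
  fn.label = label
  registered[label] = fn
}

// A pipeline is serialised as the list of its functions' labels.
var serialise = function (stack) {
  return stack.map(function (fn) { return fn.label })
}

// Loading looks each label up again and fails on unknown labels.
var load = function (labels) {
  return labels.map(function (label) {
    var fn = registered[label]
    if (!fn) throw new Error('Cannot load un-registered function: ' + label)
    return fn
  })
}

var trimmer = function (token) { return token.trim() }
register(trimmer, 'trimmer')

var serialised = serialise([trimmer]) // ['trimmer']
var restored = load(serialised)       // [trimmer]

// Loading a label that was never registered throws.
var loadError = null
try {
  load(['missing'])
} catch (e) {
  loadError = e.message
}
```

Only labels are serialised, which is why every function has to be registered again before a serialised pipeline is loaded.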

add

lunr.Pipeline.prototype.add()

method

Params

  • functions - Any number of functions to add to the pipeline.

Adds new functions to the end of the pipeline.

Logs a warning for each function that has not been registered.

Source

lunr.Pipeline.prototype.add = function () {
  var fns = Array.prototype.slice.call(arguments)

  fns.forEach(function (fn) {
    lunr.Pipeline.warnIfFunctionNotRegistered(fn)
    this._stack.push(fn)
  }, this)
}

after

lunr.Pipeline.prototype.after()

method

Params

  • existingFn - A function that already exists in the pipeline.
  • newFn - The new function to add to the pipeline.

Adds a single function after a function that already exists in the pipeline.

Logs a warning if the function has not been registered.

Source

lunr.Pipeline.prototype.after = function (existingFn, newFn) {
  lunr.Pipeline.warnIfFunctionNotRegistered(newFn)

  var pos = this._stack.indexOf(existingFn) + 1
  this._stack.splice(pos, 0, newFn)
}

before

lunr.Pipeline.prototype.before()

method

Params

  • existingFn - A function that already exists in the pipeline.
  • newFn - The new function to add to the pipeline.

Adds a single function before a function that already exists in the pipeline.

Logs a warning if the function has not been registered.

Source

lunr.Pipeline.prototype.before = function (existingFn, newFn) {
  lunr.Pipeline.warnIfFunctionNotRegistered(newFn)

  var pos = this._stack.indexOf(existingFn)
  this._stack.splice(pos, 0, newFn)
}
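Both after and before rely on indexOf followed by splice. The sketch below replays those two steps on plain arrays, including what the arithmetic does when existingFn is not in the pipeline. insertAfter, insertBefore and the sample functions are hypothetical names used only for illustration.

```javascript
// The positioning logic of after and before, applied to plain arrays.
var insertAfter = function (stack, existingFn, newFn) {
  stack.splice(stack.indexOf(existingFn) + 1, 0, newFn)
}

var insertBefore = function (stack, existingFn, newFn) {
  stack.splice(stack.indexOf(existingFn), 0, newFn)
}

// Hypothetical pipeline functions; only their identity matters here.
var stopWords = function stopWords (token) { return token }
var stemmer = function stemmer (token) { return token }
var trimmer = function trimmer (token) { return token }
var neverAdded = function neverAdded (token) { return token }

var names = function (stack) {
  return stack.map(function (fn) { return fn.name })
}

var s1 = [stopWords, stemmer]
insertAfter(s1, stopWords, trimmer)
var afterNames = names(s1) // ['stopWords', 'trimmer', 'stemmer']

var s2 = [stopWords, stemmer]
insertBefore(s2, stemmer, trimmer)
var beforeNames = names(s2) // ['stopWords', 'trimmer', 'stemmer']

// When existingFn is missing, indexOf returns -1.
var s3 = [stopWords, stemmer]
insertAfter(s3, neverAdded, trimmer) // position 0: inserted at the front
var missingAfter = names(s3) // ['trimmer', 'stopWords', 'stemmer']

var s4 = [stopWords, stemmer]
insertBefore(s4, neverAdded, trimmer) // splice(-1, ...): inserted before the last item
var missingBefore = names(s4) // ['stopWords', 'trimmer', 'stemmer']
```

Neither call fails when existingFn is missing, so callers that care about ordering may want to check that existingFn is present first.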

remove

lunr.Pipeline.prototype.remove()

method

Params

  • fn - The function to remove from the pipeline.

Removes a function from the pipeline.

Source

lunr.Pipeline.prototype.remove = function (fn) {
  var pos = this._stack.indexOf(fn)
  this._stack.splice(pos, 1)
}
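Reading the listing above, a function that was never added yields an indexOf result of -1, and splice(-1, 1) then removes the last element of the array. The sketch below shows that effect on a plain array next to a guarded variant. removeFrom and removeIfPresent are hypothetical helpers written for this example, not library functions.

```javascript
// The same two steps as the listing above, applied to a plain array.
var removeFrom = function (stack, fn) {
  var pos = stack.indexOf(fn)
  stack.splice(pos, 1)
}

// Hypothetical guarded variant: do nothing when fn is not in the stack.
var removeIfPresent = function (stack, fn) {
  var pos = stack.indexOf(fn)
  if (pos === -1) return
  stack.splice(pos, 1)
}

var stopWords = function (token) { return token }
var stemmer = function (token) { return token }
var neverAdded = function (token) { return token }

var unguarded = [stopWords, stemmer]
removeFrom(unguarded, neverAdded)    // splice(-1, 1) drops stemmer

var guarded = [stopWords, stemmer]
removeIfPresent(guarded, neverAdded) // the stack is left untouched
```

Under this reading, it is safest to call remove only with functions known to be in the pipeline.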

run

lunr.Pipeline.prototype.run()

method

Params

  • tokens - The tokens to run through the pipeline.

Runs the current list of functions that make up the pipeline against the passed tokens.

Source

lunr.Pipeline.prototype.run = function (tokens) {
  var out = [],
      tokenLength = tokens.length,
      stackLength = this._stack.length

  for (var i = 0; i < tokenLength; i++) {
    var token = tokens[i]

    for (var j = 0; j < stackLength; j++) {
      token = this._stack[j](token, i, tokens)
      if (token === void 0) break
    };

    if (token !== void 0) out.push(token)
  };

  return out
}

toJSON

lunr.Pipeline.prototype.toJSON()

method

Returns a representation of the pipeline ready for serialisation: an array containing the label of each function in the pipeline.

Logs a warning for each function that has not been registered.

Source

lunr.Pipeline.prototype.toJSON = function () {
  return this._stack.map(function (fn) {
    lunr.Pipeline.warnIfFunctionNotRegistered(fn)

    return fn.label
  })
}

Vector

lunr.Vectors wrap arrays and add vector-related operations for the array elements.

magnitude

lunr.Vector.prototype.magnitude()

method

Calculates the magnitude of this vector.

Source

lunr.Vector.prototype.magnitude = function () {
  if (this._magnitude) return this._magnitude

  var sumOfSquares = 0,
      elems = this.elements,
      len = elems.length,
      el

  for (var i = 0; i < len; i++) {
    el = elems[i]
    sumOfSquares += el * el
  };

  return this._magnitude = Math.sqrt(sumOfSquares)
}

dot

lunr.Vector.prototype.dot()

method

Params

  • otherVector - The vector to compute the dot product with.

Calculates the dot product of this vector and another vector.

Source

lunr.Vector.prototype.dot = function (otherVector) {
  var elem1 = this.elements,
      elem2 = otherVector.elements,
      length = elem1.length,
      dotProduct = 0

  for (var i = 0; i < length; i++) {
    dotProduct += elem1[i] * elem2[i]
  };

  return dotProduct
}

similarity

lunr.Vector.prototype.similarity()

method

Params

  • otherVector - The other vector to calculate the similarity with.

Calculates the cosine similarity between this vector and another vector.

Source

lunr.Vector.prototype.similarity = function (otherVector) {
  return this.dot(otherVector) / (this.magnitude() * otherVector.magnitude())
}
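Cosine similarity is the dot product of two vectors divided by the product of their magnitudes. The following standalone sketch works through the formula on plain arrays; dot, magnitude and similarity here are free functions written for illustration and are not the lunr.Vector methods themselves.

```javascript
// Plain-array versions of the three operations, for illustration only.
var dot = function (a, b) {
  var sum = 0
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i]
  }
  return sum
}

var magnitude = function (a) {
  return Math.sqrt(dot(a, a))
}

var similarity = function (a, b) {
  return dot(a, b) / (magnitude(a) * magnitude(b))
}

var same = similarity([1, 0], [1, 0])       // 1: identical direction
var orthogonal = similarity([1, 0], [0, 1]) // 0: nothing in common
var close = similarity([3, 4], [4, 3])      // 24 / (5 * 5) = 0.96
```

Because the result depends only on direction, a long document and a short query can still score close to 1 when they share the same weighted tokens.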

toArray

lunr.Vector.prototype.toArray()

method

Converts this vector back into an array.

Source

lunr.Vector.prototype.toArray = function () {
  return this.elements
}

SortedSet

lunr.SortedSets are used to maintain an array of unique values in a sorted order.

load

lunr.SortedSet.load()

method

Params

  • serialisedData - The serialised set to load.

Loads a previously serialised sorted set.

Source

lunr.SortedSet.load = function (serialisedData) {
  var set = new this

  set.elements = serialisedData
  set.length = serialisedData.length

  return set
}

add

lunr.SortedSet.prototype.add()

method

Params

  • elements - Any number of objects to add to this set.

Inserts new items into the set in the correct position to maintain the order.

Source

lunr.SortedSet.prototype.add = function () {
  Array.prototype.slice.call(arguments).forEach(function (element) {
    if (~this.indexOf(element)) return
    this.elements.splice(this.locationFor(element), 0, element)
  }, this)

  this.length = this.elements.length
}

toArray

lunr.SortedSet.prototype.toArray()

method

Converts this sorted set into an array.

Source

lunr.SortedSet.prototype.toArray = function () {
  return this.elements.slice()
}

map

lunr.SortedSet.prototype.map()

method

Params

  • fn - The function that is called on each element of the set.
  • ctx - An optional object that can be used as the context for the function fn.

Creates a new array with the results of calling a provided function on every element in this sorted set.

Delegates to Array.prototype.map and has the same signature.

Source

lunr.SortedSet.prototype.map = function (fn, ctx) {
  return this.elements.map(fn, ctx)
}

forEach

lunr.SortedSet.prototype.forEach()

method

Params

  • fn - The function that is called on each element of the set.
  • ctx - An optional object that can be used as the context for the function fn.

Executes a provided function once per sorted set element.

Delegates to Array.prototype.forEach and has the same signature.

Source

lunr.SortedSet.prototype.forEach = function (fn, ctx) {
  return this.elements.forEach(fn, ctx)
}

indexOf

lunr.SortedSet.prototype.indexOf()

method

Params

  • elem - The object to locate in the sorted set.
  • start - An optional index at which to start searching.
  • end - An optional index at which to stop searching within the set.

Returns the index at which a given element can be found in the sorted set, or -1 if it is not present.

Source

lunr.SortedSet.prototype.indexOf = function (elem, start, end) {
  var start = start || 0,
      end = end || this.elements.length,
      sectionLength = end - start,
      pivot = start + Math.floor(sectionLength / 2),
      pivotElem = this.elements[pivot]

  if (sectionLength <= 1) {
    if (pivotElem === elem) {
      return pivot
    } else {
      return -1
    }
  }

  if (pivotElem < elem) return this.indexOf(elem, pivot, end)
  if (pivotElem > elem) return this.indexOf(elem, start, pivot)
  if (pivotElem === elem) return pivot
}

locationFor

lunr.SortedSet.prototype.locationFor()

method

Params

  • elem - The elem to find the position for in the set
  • start - An optional index at which to start searching.
  • end - An optional index at which to stop searching within the set.

Returns the position within the sorted set that an element should be inserted at to maintain the current order of the set.

This function assumes that the element to search for does not already exist in the sorted set.

Source

lunr.SortedSet.prototype.locationFor = function (elem, start, end) {
  var start = start || 0,
      end = end || this.elements.length,
      sectionLength = end - start,
      pivot = start + Math.floor(sectionLength / 2),
      pivotElem = this.elements[pivot]

  if (sectionLength <= 1) {
    if (pivotElem > elem) return pivot
    if (pivotElem < elem) return pivot + 1
  }

  if (pivotElem < elem) return this.locationFor(elem, pivot, end)
  if (pivotElem > elem) return this.locationFor(elem, start, pivot)
}
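Both indexOf and locationFor are binary searches over the sorted elements. The sketch below shows the same idea in an iterative form on a plain sorted array. It is an illustration of the technique rather than a copy of the recursive listings above, and indexOfSorted and locationForSorted are hypothetical names.

```javascript
// Binary search for an element; returns its index, or -1 if absent.
var indexOfSorted = function (elements, elem) {
  var start = 0,
      end = elements.length

  while (start < end) {
    var pivot = start + Math.floor((end - start) / 2)

    if (elements[pivot] === elem) return pivot

    if (elements[pivot] < elem) {
      start = pivot + 1
    } else {
      end = pivot
    }
  }

  return -1
}

// Binary search for the position at which elem keeps the array sorted.
var locationForSorted = function (elements, elem) {
  var start = 0,
      end = elements.length

  while (start < end) {
    var pivot = start + Math.floor((end - start) / 2)

    if (elements[pivot] < elem) {
      start = pivot + 1
    } else {
      end = pivot
    }
  }

  return start
}

var tokens = ['apple', 'grape', 'melon', 'pear']

var found = indexOfSorted(tokens, 'melon')        // 2
var missing = indexOfSorted(tokens, 'kiwi')       // -1
var insertAt = locationForSorted(tokens, 'kiwi')  // 2

tokens.splice(insertAt, 0, 'kiwi')
// tokens is now ['apple', 'grape', 'kiwi', 'melon', 'pear']
```

Each lookup halves the remaining range, so the number of comparisons grows with the logarithm of the set size.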

intersect

lunr.SortedSet.prototype.intersect()

method

Params

  • otherSet - The set to intersect with this set.

Creates a new lunr.SortedSet that contains the elements in the intersection of this set and the passed set.

Source

lunr.SortedSet.prototype.intersect = function (otherSet) {
  var intersectSet = new lunr.SortedSet,
      i = 0, j = 0,
      a_len = this.length, b_len = otherSet.length,
      a = this.elements, b = otherSet.elements

  while (true) {
    if (i > a_len - 1 || j > b_len - 1) break

    if (a[i] === b[j]) {
      intersectSet.add(a[i])
      i++, j++
      continue
    }

    if (a[i] < b[j]) {
      i++
      continue
    }

    if (a[i] > b[j]) {
      j++
      continue
    }
  };

  return intersectSet
}
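The intersection walks both sorted arrays with one cursor each, advancing whichever cursor points at the smaller element. A minimal standalone sketch of that walk on plain arrays follows; intersectSorted is a hypothetical helper name.

```javascript
// Two-cursor intersection of two sorted arrays of unique values.
var intersectSorted = function (a, b) {
  var out = [],
      i = 0,
      j = 0

  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      // Present in both: keep it and advance both cursors.
      out.push(a[i])
      i++
      j++
    } else if (a[i] < b[j]) {
      i++
    } else {
      j++
    }
  }

  return out
}

var common = intersectSorted(['a', 'c', 'd', 'f'], ['b', 'c', 'f', 'g'])
// common is ['c', 'f']
```

Each element is visited at most once, so the walk is linear in the combined length of the two sets.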

clone

lunr.SortedSet.prototype.clone()

method

Makes a copy of this set.

Source

lunr.SortedSet.prototype.clone = function () {
  var clone = new lunr.SortedSet

  clone.elements = this.toArray()
  clone.length = clone.elements.length

  return clone
}

union

lunr.SortedSet.prototype.union()

method

Params

  • otherSet - The set to union with this set.

Creates a new lunr.SortedSet that contains the elements in the union of this set and the passed set.

Source

lunr.SortedSet.prototype.union = function (otherSet) {
  var longSet, shortSet, unionSet

  if (this.length >= otherSet.length) {
    longSet = this, shortSet = otherSet
  } else {
    longSet = otherSet, shortSet = this
  }

  unionSet = longSet.clone()

  unionSet.add.apply(unionSet, shortSet.toArray())

  return unionSet
}

toJSON

lunr.SortedSet.prototype.toJSON()

method

Returns a representation of the sorted set ready for serialisation.

Source

lunr.SortedSet.prototype.toJSON = function () {
  return this.toArray()
}

Index

lunr.Index is an object that manages a search index. It contains the indexes and stores all the tokens and document lookups. It also provides the main user-facing API for the library.

load

lunr.Index.load()

method

Params

  • serialisedData - The serialised index to load.

Loads a previously serialised index.

Issues a warning if the index being imported was serialised by a different version of lunr.

Source

lunr.Index.load = function (serialisedData) {
  if (serialisedData.version !== lunr.version) {
    lunr.utils.warn('version mismatch: current ' + lunr.version + ' importing ' + serialisedData.version)
  }

  var idx = new this

  idx._fields = serialisedData.fields
  idx._ref = serialisedData.ref

  idx.documentStore = lunr.Store.load(serialisedData.documentStore)
  idx.tokenStore = lunr.TokenStore.load(serialisedData.tokenStore)
  idx.corpusTokens = lunr.SortedSet.load(serialisedData.corpusTokens)
  idx.pipeline = lunr.Pipeline.load(serialisedData.pipeline)

  return idx
}

field

lunr.Index.prototype.field()

method

Params

  • fieldName - The name of the field within the document that should be indexed.
  • opts - An optional object of field options; opts.boost sets the boost applied to terms in this field.

Adds a field to the list of fields that will be searchable within documents in the index.

An optional boost can be passed in the opts object to affect how much tokens in this field count towards ranking in search results. By default the boost value is 1.

Fields should be added before any documents are added to the index. Fields added after documents have been indexed will only apply to documents added from then on.

Source

lunr.Index.prototype.field = function (fieldName, opts) {
  var opts = opts || {},
      field = { name: fieldName, boost: opts.boost || 1 }

  this._fields.push(field)
  return this
}

ref

lunr.Index.prototype.ref()

method

Params

  • refName - The property to use to uniquely identify the documents in the index.

Sets the property used to uniquely identify documents added to the index. By default this property is 'id'.

This should only be changed before adding documents to the index. Changing the ref property without resetting the index can lead to unexpected results.

Source

lunr.Index.prototype.ref = function (refName) {
  this._ref = refName
  return this
}

add

lunr.Index.prototype.add()

method

Params

  • doc - The document to add to the index.

Add a document to the index.

This is the way new documents enter the index. This function runs the fields of the document through the index's pipeline and then adds the document to the index, after which it will show up in search results.

Source

lunr.Index.prototype.add = function (doc) {
  var docTokens = {},
      allDocumentTokens = new lunr.SortedSet,
      docRef = doc[this._ref]

  this._fields.forEach(function (field) {
    var fieldTokens = this.pipeline.run(lunr.tokenizer(doc[field.name]))

    docTokens[field.name] = fieldTokens
    lunr.SortedSet.prototype.add.apply(allDocumentTokens, fieldTokens)
  }, this)

  this.documentStore.set(docRef, allDocumentTokens)
  lunr.SortedSet.prototype.add.apply(this.corpusTokens, allDocumentTokens.toArray())

  for (var i = 0; i < allDocumentTokens.length; i++) {
    var token = allDocumentTokens.elements[i]
    var tf = this._fields.reduce(function (memo, field) {
      var fieldLength = docTokens[field.name].length

      if (!fieldLength) return memo

      var tokenCount = docTokens[field.name].filter(function (t) { return t === token }).length

      return memo + (tokenCount / fieldLength * field.boost)
    }, 0)

    this.tokenStore.add(token, { ref: docRef, tf: tf })
  };
}
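The term frequency stored for each token is the sum, over all fields, of the token's share of that field multiplied by the field's boost. The following standalone sketch isolates that reduce step. termFrequency and the sample data are made up for illustration, and the token lists are assumed to have already been through the pipeline.

```javascript
// The per-token weighting from the listing above, as a free function.
var termFrequency = function (token, docTokens, fields) {
  return fields.reduce(function (memo, field) {
    var fieldTokens = docTokens[field.name],
        fieldLength = fieldTokens.length

    // Empty fields contribute nothing (and would divide by zero).
    if (!fieldLength) return memo

    var tokenCount = fieldTokens.filter(function (t) { return t === token }).length

    return memo + (tokenCount / fieldLength * field.boost)
  }, 0)
}

// Hypothetical field configuration and already-processed tokens.
var fields = [{ name: 'title', boost: 10 }, { name: 'body', boost: 1 }]
var docTokens = {
  title: ['green', 'plant'],
  body: ['green', 'plant', 'grow', 'green']
}

var tf = termFrequency('green', docTokens, fields)
// title: 1/2 * 10 = 5, body: 2/4 * 1 = 0.5, so tf is 5.5
```

A single occurrence in a short, heavily boosted field can therefore outweigh several occurrences in a long unboosted one.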

remove

lunr.Index.prototype.remove()

method

Params

  • doc - The document to remove from the index.

Removes a document from the index.

To make sure documents no longer show up in search results they can be removed from the index using this method.

The document passed only needs to have the same ref property value as the document that was added to the index; the two can be completely different objects.

Source

lunr.Index.prototype.remove = function (doc) {
  var docRef = doc[this._ref]

  if (!this.documentStore.has(docRef)) return

  var docTokens = this.documentStore.get(docRef)

  this.documentStore.remove(docRef)

  docTokens.forEach(function (token) {
    this.tokenStore.remove(token, docRef)
  }, this)
}

update

lunr.Index.prototype.update()

method

Params

  • doc - The document to update in the index.

Updates a document in the index.

When a document in the index changes (fields changed, added or removed), it should be updated in the index so that it is matched correctly against search queries.

This method is just a wrapper around remove and add.

Source

lunr.Index.prototype.update = function (doc) {
  this.remove(doc)
  this.add(doc)
}

idf

lunr.Index.prototype.idf()

method

Params

  • token - The token to calculate the idf of.

Calculates the inverse document frequency for a token within the index.

Source

lunr.Index.prototype.idf = function (term) {
  var documentFrequency = Object.keys(this.tokenStore.get(term)).length

  if (documentFrequency === 0) {
    return 1
  } else {
    return 1 + Math.log(this.tokenStore.length / documentFrequency)
  }
}
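The formula in the listing above can be explored in isolation. In this sketch the function takes the two numbers directly. pairCount stands for this.tokenStore.length, which the TokenStore add listing increments once per stored token and document pair. The function shape and sample values are assumptions made for illustration.

```javascript
// idf as computed in the listing above, with its two inputs made explicit.
var idf = function (pairCount, documentFrequency) {
  if (documentFrequency === 0) return 1

  return 1 + Math.log(pairCount / documentFrequency)
}

var unseen = idf(8, 0) // 1: tokens not in the index get the baseline value
var common = idf(8, 8) // 1 + log(1) = 1
var rare = idf(8, 2)   // 1 + log(4), roughly 2.39
```

The fewer documents a token appears in, the larger its idf, so rare tokens contribute more to a document's score.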

documentVector

lunr.Index.prototype.documentVector()

method

Params

  • documentRef - The ref to find the document with.

Generates a vector containing all the tokens in the document matching the passed documentRef.

The vector contains the tf-idf score for each token contained in the document with the passed documentRef. The vector will contain an element for every token in the index's corpus; if the document does not contain that token the element will be 0.

Source

lunr.Index.prototype.documentVector = function (documentRef) {
  var documentTokens = this.documentStore.get(documentRef),
      documentTokensLength = documentTokens.length,
      documentArr = new Array (this.corpusTokens.length)

  for (var i = 0; i < documentTokensLength; i++) {
    var token = documentTokens.elements[i],
        tf = this.tokenStore.get(token)[documentRef].tf,
        idf = this.idf(token)

    documentArr[this.corpusTokens.indexOf(token)] = tf * idf
  };

  return new lunr.Vector (documentArr)
}

toJSON

lunr.Index.prototype.toJSON()

method

Returns a representation of the index ready for serialisation.

Source

lunr.Index.prototype.toJSON = function () {
  return {
    version: lunr.version,
    fields: this._fields,
    ref: this._ref,
    documentStore: this.documentStore.toJSON(),
    tokenStore: this.tokenStore.toJSON(),
    corpusTokens: this.corpusTokens.toJSON(),
    pipeline: this.pipeline.toJSON()
  }
}

Store

lunr.Store is a simple key-value store used for storing sets of tokens for documents stored in the index.

load

lunr.Store.load()

method

Params

  • serialisedData - The serialised store to load.

Loads a previously serialised store.

Source

lunr.Store.load = function (serialisedData) {
  var store = new this

  store.length = serialisedData.length
  store.store = Object.keys(serialisedData.store).reduce(function (memo, key) {
    memo[key] = lunr.SortedSet.load(serialisedData.store[key])
    return memo
  }, {})

  return store
}

set

lunr.Store.prototype.set()

method

Params

  • id - The key used to store the tokens against.
  • tokens - The tokens to store against the key.

Stores the given tokens in the store against the given id.

Source

lunr.Store.prototype.set = function (id, tokens) {
  this.store[id] = tokens
  this.length = Object.keys(this.store).length
}

get

lunr.Store.prototype.get()

method

Params

  • id - The key to lookup and retrieve from the store.

Retrieves the tokens from the store for a given key.

Source

lunr.Store.prototype.get = function (id) {
  return this.store[id]
}

has

lunr.Store.prototype.has()

method

Params

  • id - The id to look up in the store.

Checks whether the store contains a key.

Source

lunr.Store.prototype.has = function (id) {
  return id in this.store
}

remove

lunr.Store.prototype.remove()

method

Params

  • id - The id to remove from the store.

Removes the value for a key in the store.

Source

lunr.Store.prototype.remove = function (id) {
  if (!this.has(id)) return

  delete this.store[id]
  this.length--
}

toJSON

lunr.Store.prototype.toJSON()

method

Returns a representation of the store ready for serialisation.

Source

lunr.Store.prototype.toJSON = function () {
  return {
    store: this.store,
    length: this.length
  }
}

stemmer

lunr.stemmer is an English language stemmer. It is a JavaScript implementation of the PorterStemmer taken from http://tartarus.org/~martin

stopWordFilter

lunr.stopWordFilter is an English language stop word list filter; any words contained in the list will not be passed through the filter.

This is intended to be used in the Pipeline. If the token does not pass the filter then undefined will be returned.

TokenStore

lunr.TokenStore is used for efficient storing and lookup of the reverse index of token to document ref.

load

lunr.TokenStore.load()

method

Params

  • serialisedData - The serialised token store to load.

Loads a previously serialised token store.

Source

lunr.TokenStore.load = function (serialisedData) {
  var store = new this

  store.root = serialisedData.root
  store.length = serialisedData.length

  return store
}

add

lunr.TokenStore.prototype.add()

method

Params

  • token - The token to store the doc under
  • doc - The doc to store against the token
  • root - An optional node at which to start looking for the place to store the doc; defaults to the root of this store.

Adds a new token doc pair to the store.

By default this function starts at the root of the current store; however, it can start at any node of any token store if required.

Source

lunr.TokenStore.prototype.add = function (token, doc, root) {
  var root = root || this.root,
      key = token[0],
      rest = token.slice(1)

  if (!(key in root)) root[key] = {docs: {}}

  if (rest.length === 0) {
    root[key].docs[doc.ref] = doc
    this.length += 1
    return
  } else {
    return this.add(rest, doc, root[key])
  }
}
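The store is a trie: each character of a token selects a child node, and the node reached by the last character holds the docs for that token. The standalone sketch below builds such a structure with plain objects. addToken and getDocs are hypothetical iterative helpers, and the initial root shape { docs: {} } is an assumption, since the constructor is not shown here.

```javascript
// Walk (and create) one node per character, then store the doc at the end.
var addToken = function (root, token, doc) {
  var node = root

  for (var i = 0; i < token.length; i++) {
    var key = token[i]
    if (!(key in node)) node[key] = { docs: {} }
    node = node[key]
  }

  node.docs[doc.ref] = doc
}

// Follow the characters of the token; unknown paths yield no docs.
var getDocs = function (root, token) {
  var node = root

  for (var i = 0; i < token.length; i++) {
    if (!(token[i] in node)) return {}
    node = node[token[i]]
  }

  return node.docs || {}
}

var root = { docs: {} } // assumed initial shape of the root node

addToken(root, 'to', { ref: 1, tf: 0.5 })
addToken(root, 'top', { ref: 2, tf: 0.25 })

// root.t.o.docs holds the doc with ref 1, root.t.o.p.docs the doc with ref 2.
var docsForTo = getDocs(root, 'to')
var docsForT = getDocs(root, 't')   // {}: the 't' node exists but stores no docs
var docsForTop = getDocs(root, 'top')
```

Tokens with a common prefix share nodes, which keeps lookups proportional to the token's length rather than to the number of tokens stored.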

has

lunr.TokenStore.prototype.has()

method

Params

  • token - The token to check for
  • root - An optional node at which to start

Checks whether this key is contained within this lunr.TokenStore.

By default this function starts at the root of the current store; however, it can start at any node of any token store if required.

Source

lunr.TokenStore.prototype.has = function (token, root) {
  var root = root || this.root,
      key = token[0],
      rest = token.slice(1)

  if (!(key in root)) return false

  if (rest.length === 0) {
    return true
  } else {
    return this.has(rest, root[key])
  }
}

getNode

lunr.TokenStore.prototype.getNode()

method

Params

  • token - The token to get the node for.
  • root - An optional node at which to start.

Retrieve a node from the token store for a given token.

By default this function starts at the root of the current store; however, it can start at any node of any token store if required.

Source

lunr.TokenStore.prototype.getNode = function (token, root) {
  var root = root || this.root,
      key = token[0],
      rest = token.slice(1)

  if (!(key in root)) return {}

  if (rest.length === 0) {
    return root[key]
  } else {
    return this.getNode(rest, root[key])
  }
}

get

lunr.TokenStore.prototype.get()

method

Params

  • token - The token to get the documents for.
  • root - An optional node at which to start.

Retrieve the documents for a node for the given token.

By default this function starts at the root of the current store; however, it can start at any node of any token store if required.

Source

lunr.TokenStore.prototype.get = function (token, root) {
  return this.getNode(token, root).docs || {}
}

remove

lunr.TokenStore.prototype.remove()

method

Params

  • token - The token to remove the document from.
  • ref - The ref of the document to remove from this token.
  • root - An optional node at which to start.

Remove the document identified by ref from the token in the store.

By default this function starts at the root of the current store; however, it can start at any node of any token store if required.

Source

lunr.TokenStore.prototype.remove = function (token, ref, root) {
  var root = root || this.root,
      key = token[0],
      rest = token.slice(1)

  if (!(key in root)) return

  if (rest.length === 0) {
    delete root[key].docs[ref]
  } else {
    return this.remove(rest, ref, root[key])
  }
}

expand

lunr.TokenStore.prototype.expand()

method

Params

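  • token - The token to expand.
  • memo - An optional array used to accumulate results during recursion.

Finds all the tokens currently in the store that begin with the passed token, that is, the passed token extended by every possible suffix found in the store.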

Source

lunr.TokenStore.prototype.expand = function (token, memo) {
  var root = this.getNode(token),
      docs = root.docs || {},
      memo = memo || []

  if (Object.keys(docs).length) memo.push(token)

  Object.keys(root)
    .forEach(function (key) {
      if (key === 'docs') return

      memo.concat(this.expand(token + key, memo))
    }, this)

  return memo
}
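Expansion is a depth-first walk of the trie below the node for the given token, collecting every path that has docs stored against it. Here is a minimal standalone sketch of that walk over plain objects. addToken, getNode and expandToken are hypothetical helpers, and the root shape { docs: {} } is an assumption.

```javascript
// Build one trie node per character and store the doc at the final node.
var addToken = function (root, token, doc) {
  var node = root
  for (var i = 0; i < token.length; i++) {
    if (!(token[i] in node)) node[token[i]] = { docs: {} }
    node = node[token[i]]
  }
  node.docs[doc.ref] = doc
}

// Return the node for a token, or an empty object for unknown paths.
var getNode = function (root, token) {
  var node = root
  for (var i = 0; i < token.length; i++) {
    if (!(token[i] in node)) return {}
    node = node[token[i]]
  }
  return node
}

// Collect the token itself (if it has docs) and every longer stored token.
var expandToken = function (root, token, memo) {
  var node = getNode(root, token)
  var found = memo || []

  if (Object.keys(node.docs || {}).length) found.push(token)

  Object.keys(node).forEach(function (key) {
    if (key === 'docs') return
    expandToken(root, token + key, found)
  })

  return found
}

var root = { docs: {} } // assumed initial shape of the root node

addToken(root, 'to', { ref: 1 })
addToken(root, 'top', { ref: 2 })
addToken(root, 'toy', { ref: 3 })
addToken(root, 'tea', { ref: 4 })

var expanded = expandToken(root, 'to') // contains 'to', 'top' and 'toy'
var nothing = expandToken(root, 'x')   // []
```

The same accumulator array is passed down through the recursion, so the caller gets every match in one flat list.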

toJSON

lunr.TokenStore.prototype.toJSON()

method

Returns a representation of the token store ready for serialisation.

Source

lunr.TokenStore.prototype.toJSON = function () {
  return {
    root: this.root,
    length: this.length
  }
}