lunr.js: Searching words with ending wildcards returns inconsistent results

For example, on https://olivernn.github.io/moonwalkers/, both pilot and pilot* will return the results I expect. However, module* returns nothing (while module works as I expect).

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

@larskendall without seeing how you set up your index it’s difficult to say for sure, but that looks like the result of stemming: “critical” stems to “critic”. You can test this with the following snippet:

idx.pipeline.runString("critical")

Further up in this thread there are a couple of suggestions for ways to express searches that will lead to the kind of results it seems you are expecting. An alternative is to disable the stemmer at build time and search time:

var idx = lunr(function() {
  //...snip...
  this.pipeline.remove(lunr.stemmer)
  this.searchPipeline.remove(lunr.stemmer)
  //...snip...
  // add documents here
})
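To make the interaction between stemming and trailing wildcards concrete, here is a minimal sketch that does not use lunr at all. The lookup-table "stemmer", the `buildTerms` and `matches` helpers, and the assumption that "module" stems to something like "modul" are all hypothetical stand-ins for illustration. It also assumes that wildcard terms skip the pipeline, as the `usePipeline: false` in the `q.term` example later in this thread suggests.

```javascript
// Toy stand-in for a stemmer: a lookup table, NOT lunr's real stemmer.
// The stems listed here are assumptions made for illustration.
const toyStems = new Map([
  ["module", "modul"],
  ["critical", "critic"],
  ["pilot", "pilot"],
]);
const toyStem = (word) => toyStems.get(word) || word;

// Build the set of indexed terms, optionally stemming each word first.
function buildTerms(words, useStemmer) {
  return new Set(words.map((w) => (useStemmer ? toyStem(w) : w)));
}

// Match a query against the indexed terms. A trailing "*" means a prefix
// match and (by assumption) bypasses the stemmer entirely.
function matches(terms, query, useStemmer) {
  if (query.endsWith("*")) {
    const prefix = query.slice(0, -1);
    return [...terms].filter((t) => t.startsWith(prefix));
  }
  const term = useStemmer ? toyStem(query) : query;
  return terms.has(term) ? [term] : [];
}

const words = ["module", "pilot", "critical"];
const stemmed = buildTerms(words, true);
const unstemmed = buildTerms(words, false);

console.log(matches(stemmed, "module", true));     // [ 'modul' ]
console.log(matches(stemmed, "module*", true));    // []
console.log(matches(stemmed, "pilot*", true));     // [ 'pilot' ]
console.log(matches(unstemmed, "module*", false)); // [ 'module' ]
```

In this toy model, "pilot*" works because the stem equals the original word, while "module*" fails because the prefix "module" is longer than the stored stem. Removing the stemmer keeps the full word in the index, so the prefix match succeeds again.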

I’ve pushed a change that should fix the “duplicate index” error. Please try version 2.0.3 and let me know if there are any issues.

idx.search(`${queryTerm}^100 ${queryTerm}*^10 ${queryTerm}~2`)

Note: I believe ^ may cause issues if queryTerm contains multiple words, since only the last word would get the increased priority. Does that sound right?

@et1421 I’m going to close this issue now. If you can provide more details on the results you were seeing (specifically, the index itself), please re-open this issue and I’ll take a further look.

What I noticed from people’s use of lunr is that, for typeahead-style search, the automatic wildcard could give nice results, as shown on your site. However, it would frequently cause unexpected results; just take a look through some of the closed issues.

I was thinking about the best way to express a query for typeahead search. There obviously needs to be a component that searches for the beginning of a string, but it should also look for exact matches. Perhaps it should allow for some fuzzy matching too?

All of the above are possible with lunr. I would advise looking into the lunr.Index#query method; it is intended for building queries programmatically (it is used internally by lunr.Index#search).

Below is an example of what I was thinking for typeahead search:

idx.query(function (q) {
  // look for an exact match and apply a large positive boost
  q.term(queryTerm, { usePipeline: true, boost: 100 })

  // look for terms that match the beginning of this queryTerm and apply a medium boost
  q.term(queryTerm + "*", { usePipeline: false, boost: 10 })

  // look for terms that match with an edit distance of 2 and apply a small boost
  q.term(queryTerm, { usePipeline: false, editDistance: 2, boost: 1 })
})

The only slight wrinkle is having to manually append a wildcard to the query term. Perhaps this should be an option, e.g. wildcard with the values trailing | leading | wrapped | none. I’ll have a think about it.

You could express this within a query string and the search method like this if you want to try things out:

idx.search(`${queryTerm}^100 ${queryTerm}*^10 ${queryTerm}~2`)
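On the multi-word concern raised earlier in the thread: in a query string the ^, * and ~ modifiers attach to the single word they follow, so one way around it is to expand each word separately. The sketch below assumes that approach. `buildTypeaheadQuery` is a hypothetical helper, not part of lunr, and it does not escape characters that are special in lunr query strings.

```javascript
// Hypothetical helper (not part of lunr): expands each whitespace-separated
// word into exact, prefix, and fuzzy clauses so every word gets the modifiers.
// It does NOT escape special query characters such as ":" or "^".
function buildTypeaheadQuery(input) {
  return input
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => `${word}^100 ${word}*^10 ${word}~2`)
    .join(" ");
}

console.log(buildTypeaheadQuery("lunar module"));
// lunar^100 lunar*^10 lunar~2 module^100 module*^10 module~2
```

The resulting string could then be passed to the search method, e.g. idx.search(buildTypeaheadQuery(userInput)), where idx is an index built as elsewhere in this thread and userInput is whatever the user typed.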