quick-score: Can't use query long strings

I have a lot of input data and so I’ve been using quick-score to sort by query quickly as opposed to fuse.js and I’ve found that it works great until the string goes over 150 characters till which it hangs and never finishes.

Is there a config method I could set or is this just a bug?

Much appreciated

Dan

My code:

const { QuickScore } = require("quick-score");
const qs = new QuickScore(array, ["title", "url"]); // array is about 500-1000 items long and grows over time
qs.search("test") // about 2-9ms
qs.search("really long test") // never finishes

Possible place why: /src/config.js

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 25 (8 by maintainers)

Most upvoted comments

Just pushed it to npm a few minutes ago. I upped the maxIterations to 2^16, which seemed like a nice round number. So 16-character edge-case queries will still return a correct score after 65K loops, which didn’t seem noticeably slower than the 5K limit.

I haven’t properly documented the config param yet, but if you wanted to change the limit, you could do something like:

new QuickScore(items, { config: { maxIterations: Math.pow(2, 10) }})

Given how the algorithm works, powers of 2 are the most efficient limits.

I’ve got a “fix” on this branch. It just counts all the iterations of the inner loop and bails when it hits 5000 (which can be changed via the config param). Definitely a kludge, but I wasn’t able to come up with anything more elegant. It avoids the slowdown from long near matches, though.

@dan-online I suspect you typed some additional character after the R, as I’m not able to reproduce the issue with just that query, which fully matches those items. It’s not the total number of items that’s a problem (though it can make it slightly worse); it’s the long, nearly-matching query that triggers 2^n loops.

@Slapbox it seems to not be an issue of the recursion depth. It’s more about the number of loops exploding when a long query matches except for the last character. A 5K vs. 50K limit won’t make much difference. It’s when you get in the 50M range that the browser starts locking up.

One disadvantage of limiting the iterations is that it’ll return 0 when there should be a match, like:

string: "a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|zabcdefghijklmnopqrstuvwxy1"
query:  "abcdefghijklmnopqrstuvwxyz"

It’s a bit of an artificial example, though, so it’s still probably better to avoid locking up the UI by returning an incorrect score.

So it looks like in the worst case, a partially-matching query can generate 2 ^ querySubstringMatchLength recursive calls and (2 ^ queryLength) - 1 iterations through the for i loop in quickScore().

As an example, if the string is "abcdefghijklmnopqrstuvwxyz" and the query is "abcdefghijklmnopqrstuvwxy0", it’ll generate 33,554,432 recursive calls and 67,108,863 loops before returning a 0 score.

The tricky thing is that sometimes the way the algorithm breaks down the query to recur on smaller strings is important for catching cases where part of the query is included towards the end of the string, but the whole query is included in a more spread out form. So scoring "abcxyz" against "a|b|c|x|y|zabcxy" will match part of the query against the end of the string, but then it needs to score each character in the query against the string to finally get a match.

Having some fixed recursion limit could help, but I think there’s also a way to reduce the number of calls based on whether all of the remaining query characters exist in the rest of the search string. Need to play around with it some more.

@fwextensions Thanks for all the information and the new version which works great!

@fwextensions thank you so much for your research, hard work, and advice!

Just a note: I ran into this issue because of a real word problem: Users were pasting longer names (complete titles) in the input field, presumably from a pdf. Instead of the expected single result they got a crash which is now temporarily avoided by restricting the character length.