jsonata: jsonata.evaluate is slow
I changed my code from handwritten loops to jsonata, and it looks much cleaner now.
However, it was a big surprise that execution is now very slow. I apply a query with evaluate
~6000 times and each call takes ~15 ms, which adds up to 90 s of execution time. It was <10 s with the handwritten loops.
My queries all look similar to this one:
$[type='command' and $contains(name, /^add_library|add_executable$/) and $not(args[name='ALIAS'])]
The JSON data averages 500 KB in size.
I know performance is hard to judge, but in this case I am tempted to go back to tailored hand-written loops. Is there anything I can do?
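For reference, a hand-written loop corresponding to the query above might look like this (a sketch only; the data shape, an array of nodes with `type`, `name`, and `args` fields, is assumed from the query rather than stated in the issue):

```javascript
// Hand-written equivalent of the JSONata query
// $[type='command' and $contains(name, /^add_library|add_executable$/)
//   and $not(args[name='ALIAS'])]
function filterCommands(nodes) {
  const re = /^add_library|add_executable$/;
  return nodes.filter(n =>
    n.type === 'command' &&
    re.test(n.name) &&                                     // $contains with a regex
    !(Array.isArray(n.args) && n.args.some(a => a.name === 'ALIAS'))
  );
}
```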
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 4
- Comments: 16 (3 by maintainers)
Thanks for raising this. Performance is an area that hasn’t received a great deal of attention up to now, but I’m sure that it is something we can do much better on if it gets focus. Without access to your test data, it’s hard for me to make specific recommendations. It may be that the regex is the bottleneck, in which case you could try putting that as the last clause in the predicate (i.e.
$[type='command' and $not(args[name='ALIAS']) and $contains(name, /^add_library|add_executable$/)]
). If you’re willing to donate some test cases, along with some means of measuring their performance, then I’ll start an activity focused on performance.
@darrencruse thanks for offering to help out with this. I agree with @janvda that we need a set of realistic performance test cases that serve as benchmarks to work with. Firstly, I’ll try and answer some of your questions:
Yes, the evaluator is driven by the AST and implements the recursive Eval/Apply cycle described in Structure and Interpretation of Computer Programs. The core JSONata evaluator is essentially a Lisp machine, and this book (specifically chapter 4) tells you how to build one. The evaluate() function delegates to other functions depending on the type of expression, which will in turn recursively invoke evaluate() to process sub-expressions.
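The Eval/Apply cycle described here can be sketched as a toy evaluator (illustrative only; the real `evaluate()` dispatches on many more node types than this, and the AST shape here is invented for the example):

```javascript
// Toy AST-driven recursive evaluator in the SICP Eval/Apply style.
// evaluate() dispatches on node type and recursively invokes itself
// for each sub-expression.
function evaluate(node, input) {
  switch (node.type) {
    case 'literal': return node.value;
    case 'path':    return input == null ? undefined : input[node.name];
    case 'binary': {                              // e.g. '=' or 'and'
      const lhs = evaluate(node.lhs, input);      // recurse into the ...
      const rhs = evaluate(node.rhs, input);      // ... sub-expressions
      if (node.op === '=')   return lhs === rhs;
      if (node.op === 'and') return Boolean(lhs) && Boolean(rhs);
      throw new Error('unknown operator ' + node.op);
    }
    default: throw new Error('unknown node type ' + node.type);
  }
}

// Evaluating the AST for type='command' against a matching input:
const ast = { type: 'binary', op: '=',
              lhs: { type: 'path', name: 'type' },
              rhs: { type: 'literal', value: 'command' } };
console.log(evaluate(ast, { type: 'command' }));  // prints true
```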
The problem with this recursive evaluator was that it was not very ‘node.js’ friendly. The top level evaluate() function would get called and would block until the whole expression (including sub-expressions) was fully evaluated. This was fine with simple expressions, but as the language and expression complexity grew, this didn’t play nicely in server side node applications. The easiest way I could fix this, without throwing it all away and rewriting, was to make all of the functions that participated in the recursive cycle into generator functions. This meant that the evaluator could be ‘stepped’ without blocking the node event loop, and without me having to rewrite it all. Unfortunately, this is where performance took a nose dive. You can probably see this if you go back to v1.1 which performs much better.
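The generator-based scheme being described can be sketched like this (a hypothetical miniature, not the actual jsonata source): each `evaluateXxx` becomes a `function*`, the recursion is expressed with `yield*`, and a driver pumps the generator so a scheduler could interleave other work at each pause point:

```javascript
// Generator-based variant of a toy evaluator: every yield is a point
// where a driver could pause without blocking the event loop.
function* evaluateGen(node, input) {
  yield;                                        // pause point for the driver
  switch (node.type) {
    case 'literal': return node.value;
    case 'path':    return input == null ? undefined : input[node.name];
    case 'binary': {
      const lhs = yield* evaluateGen(node.lhs, input);  // delegated, still steppable
      const rhs = yield* evaluateGen(node.rhs, input);
      return node.op === '=' ? lhs === rhs : Boolean(lhs) && Boolean(rhs);
    }
  }
}

// Driver: pump the generator to completion, counting the pause points.
function run(gen) {
  let steps = 0, r = gen.next();
  while (!r.done) { steps++; r = gen.next(); }
  return { value: r.value, steps };
}

const { value, steps } = run(evaluateGen({ type: 'literal', value: 42 }));
// value === 42, and the driver saw one pause point (steps === 1)
```

The per-node generator machinery (allocating a generator object and round-tripping through `next()` for every sub-expression) is exactly the overhead being blamed for the performance drop.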
I took some time to investigate a way of improving this a couple of years ago. One obvious approach is to replace all the recursive generators with an iterative approach (i.e. a while loop). There are techniques for doing this but they are not trivial and I can’t be certain that the performance will be substantially better as a result. I also did some work on improving the current scheme by enhancing the parser to spot where expressions did not contain any sub-expressions, and mark that in the AST. Then in the evaluator, I created non-recursive versions of the evaluateXxx() functions that didn’t need to be generators in order to process them. This definitely gave performance improvements for simple expressions. This code all resides in a stash somewhere - I didn’t get to finish it, but I can dust it off and push up a branch if anyone has the time to look at it.
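The parser enhancement described here (marking AST nodes that contain no sub-expressions so the evaluator can skip the generator machinery for them) might look roughly like this; the names and the `isLeaf` flag are hypothetical, invented for the sketch:

```javascript
// Pass 1 (parser side): mark nodes that have no sub-expressions.
function markLeaves(node) {
  if (node.type === 'binary') {
    markLeaves(node.lhs);
    markLeaves(node.rhs);
    node.isLeaf = false;
  } else {
    node.isLeaf = true;            // literal / path: nothing to recurse into
  }
  return node;
}

// Cheap non-generator path for leaf nodes.
function evaluateLeaf(node, input) {
  return node.type === 'literal' ? node.value
       : input == null ? undefined : input[node.name];
}

// Pass 2 (evaluator side): only non-leaf nodes pay the generator cost.
function* evaluateMixed(node, input) {
  if (node.isLeaf) return evaluateLeaf(node, input);   // skip generator recursion
  yield;
  const lhs = yield* evaluateMixed(node.lhs, input);
  const rhs = yield* evaluateMixed(node.rhs, input);
  return node.op === '=' ? lhs === rhs : Boolean(lhs) && Boolean(rhs);
}
```

Since leaf nodes dominate most ASTs, avoiding generator allocation for them is where the reported improvement on simple expressions would come from.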
I think that’s fair to say. At least not without significant effort (and I only do this in my spare time now). To a certain extent, there will always be a ‘cost of abstraction’ associated with these more declarative type languages compared to an imperative language where the programmer is specifying the ‘how’ rather than ‘what’. I’d love to write a proper compiler though, or perhaps a wasm implementation, but I really don’t want to get your hopes up that I will get to that any time soon 😉.
Not sure if any of that helps, but I am keen to work with anyone who might have some time to invest in this space.
Thanks!
Typically an application spends 80% of its time in only 20% of the code. So I would still use JSONata everywhere it functionally makes sense. Then check performance. If performance is bad, identify the JSONata queries most responsible for it and rewrite only those queries in JavaScript.
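One way to follow this advice in practice is to wrap each evaluation site with a timer and aggregate per query, then rewrite only the ones at the top of the list. A sketch (the `evalFn` callback stands in for a call such as `jsonata(query).evaluate(data)`; nothing here is jsonata-specific):

```javascript
// Per-query timing aggregation, keyed by a label (e.g. the query string).
const timings = new Map();

function timed(label, evalFn) {
  const t0 = process.hrtime.bigint();
  const result = evalFn();                       // run the actual evaluation
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  const agg = timings.get(label) || { calls: 0, totalMs: 0 };
  agg.calls += 1;
  agg.totalMs += ms;
  timings.set(label, agg);
  return result;
}

// After a run, sort to see which queries dominate the total time.
function hotspots() {
  return [...timings.entries()].sort((a, b) => b[1].totalMs - a[1].totalMs);
}
```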
Following up on my post above, which (apologies) may not have been especially helpful; in retrospect I didn’t really say what I should have, which is:
a. As we’ve started using jsonata more, we have hit some cases where jsonata is so slow that it’s an issue for us (in a few cases we have started rewriting things in plain JavaScript because of jsonata’s slowness).
b. Considering jsonata is open source, I would be happy to try to work on the performance, though I’m afraid I might not have the goods 😃 i.e. I respect that it’s likely not an easy problem (or there would have been more progress already, right? 😃).
My questions:
Broadly, I understand that the “jsonata()” function parses the provided expression into an AST, and “evaluate()” walks that AST; so, broadly speaking, jsonata is an “interpreter”, correct?
Has there been any analysis done to look at where the biggest bottlenecks during “evaluate” reside? Or do you already know/have intuitions about where the bottlenecks reside?
Are there already thoughts about low-hanging fruit for improving performance? e.g. I’m guessing that caching certain lookups that currently happen repeatedly in loops might qualify?
Is it fair to assume that jsonata could never hope to match plain JavaScript performance, simply because the JIT compilation magic in V8 (which effectively converts hotspots in plain JavaScript to machine code) is hampered from achieving top speed by jsonata working as an interpreter? But I’m also guessing (is this right?) that changing JSONata to generate code like a transpiler would be a huge (impossible?) undertaking? That isn’t being considered, is it?
Is this still true of performance in the latest version? Has there been any development on this issue?
I’d also like to put some attention on performance. I have created a simple test case:
https://jsperf.com/jsonata-performance
which I already tried to optimize a bit, in the sense of only measuring evaluate rather than the parser. To play with that locally, try the following snippet:
which on my Macbook Pro yields something like:
To be a bit more fair, I also created a slightly more complex test case because I feared JS might tend to optimize the above away into a no-op:
which produces:
I understand that native code may always have an advantage, but my gut feeling is that it should be possible to somewhat close the gap (which is currently much bigger than a factor of 100).
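The jsperf snippet referred to above isn’t reproduced in this thread, but the shape of such a comparison can be sketched as follows (a hypothetical stand-in: a native filter versus the same predicate routed through a tiny interpreter, to make the “cost of abstraction” visible; the timings will vary by machine and say nothing about jsonata itself):

```javascript
// Synthetic data: half the records match the predicate type='command'.
const data = Array.from({ length: 100000 }, (_, i) =>
  ({ type: i % 2 ? 'command' : 'comment', name: 'add_library' }));

// Native: the predicate is plain JavaScript the JIT can optimize.
function native(arr) {
  return arr.filter(n => n.type === 'command');
}

// Interpreted: the same predicate, but walked as a tiny AST per element.
const ast = { op: '=', lhs: 'type', rhs: 'command' };
function interp(node, item) {
  return node.op === '=' ? item[node.lhs] === node.rhs : undefined;
}
function interpreted(arr) {
  return arr.filter(n => interp(ast, n));
}

function bench(label, fn) {
  const t0 = process.hrtime.bigint();
  const out = fn(data);
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  console.log(`${label}: ${out.length} matches in ${ms.toFixed(2)} ms`);
  return out;
}

bench('native', native);
bench('interpreted', interpreted);
```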