graaljs: Perf of warmed-up ClojureScript unit tests slow on Graal.JS relative to Nashorn
If you run the ClojureScript compiler’s unit tests in a loop and allow things to warm up, they end up running more slowly in Graal.JS than under Nashorn.

The numbers above (from the chart attached to this issue) were produced on a Mac, using GraalVM 1.0.0-rc4 (EE edition); Nashorn is from JDK 1.8.0_172-b11.
To reproduce using the actual ClojureScript compiler unit tests, the steps are:
- Modify the ClojureScript compiler’s tests so that they run in an infinite loop (optionally also capturing timing metrics)
- Build the tests (which involves producing a JavaScript file that can be executed in any engine)
- Run the tests, allowing some time (say 1/2 hour or so) for them to settle
To make things more convenient, I’ve attached a copy of the built tests directly in the “Run the tests” section below.
Modify the ClojureScript compiler’s tests
The ClojureScript compiler is at https://github.com/clojure/clojurescript
Modifications are at https://github.com/mfikes/clojurescript/commit/1c76cf62b1c69e0a81f3663559215c62fe09ae31
and to make things easier, that branch can just be checked out and used as-is. Roughly, these modifications set up some atoms (mutable state) to track timing information, put the tests themselves in an infinite loop ((while true ...) in ClojureScript), do some accounting for timing, and print the timing prefixed with the string DELTA, which can be grepped for.
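The shape of that harness can be sketched in plain JavaScript (illustrative only; the real code is ClojureScript using atoms and (while true ...), and the names `timings` and `runOnce` here are hypothetical):

```javascript
// Illustrative sketch of the timing harness; the real implementation is
// ClojureScript. `timings` mimics the 32-slot, zero-initialized queue
// that shows up in the DELTA output below.
if (typeof print === "undefined") { var print = console.log; } // engines define `print`; Node does not

var timings = new Array(32).fill(0);

function runOnce(tests) {
  var start = Date.now();
  tests(); // one full pass of the unit tests
  var delta = Date.now() - start;
  timings.push(delta);
  timings.shift(); // keep only the last 32 timings
  var avg = Math.floor(timings.reduce(function (a, b) { return a + b; }, 0) / 32);
  print("DELTA " + delta + " #queue [" + timings.join(" ") + "] " + avg);
}

// The real harness then does, in effect: while (true) runOnce(allTests);
```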
Build the tests
To build the tests, run script/bootstrap and then script/test in the tree. After logging
Applying optimizations :advanced to 343 sources
it will automatically attempt to run the tests under any JavaScript engines configured per https://clojurescript.org/community/running-tests. (No JavaScript engines need to be configured for this repro.) If it starts running the tests in any engine, Ctrl-C them, because they are now in an infinite loop.
This will result in a JavaScript file, builds/out-adv/core-advanced-test.js, that can be run in any engine.
Run the tests
For convenience, here is a built copy of the test JavaScript file: core-advanced-test.js.gz
Then, to run the tests using Graal.JS, for example, do
/path/to/js builds/out-adv/core-advanced-test.js | grep DELTA
You will see output like this:
DELTA 3204 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204] 100
DELTA 2083 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083] 165
DELTA 2016 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083 2016] 228
DELTA 1932 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083 2016 1932] 288
The first number is the number of milliseconds taken by that test-loop iteration, the queue holds the last 32 timings, and the final number is the average of those 32 timings. (So definitely let the queue fill with timings, and then go well beyond that to let things settle completely.)
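As a sanity check on the format, the trailing number can be reproduced from the queue in the last sample line above (integer-truncated mean over all 32 slots, zeros included):

```javascript
// The queue from the last DELTA line above: 28 zeros, then the 4 observed timings.
var queue = new Array(28).fill(0).concat([3204, 2083, 2016, 1932]);
var sum = queue.reduce(function (a, b) { return a + b; }, 0);
var avg = Math.floor(sum / queue.length);
console.log(avg); // 288, matching the final number on that line
```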
(If you run them in Node, the tests expect print to be defined. To do that, start node, evaluate var print = console.log; and then require("./path/to/core-advanced-test.js");.)
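That Node shim can equivalently be set up like this (the require path is a placeholder, as above, so it is left commented out):

```javascript
// Node has no engine-style `print` global, so define one before loading
// the built tests.
if (typeof print === "undefined") {
  global.print = console.log;
}
// Then: require("./path/to/core-advanced-test.js");
print("print is now defined"); // demonstrates the shim works
```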
About this issue
- State: open
- Created 6 years ago
- Reactions: 6
- Comments: 18 (6 by maintainers)
Hi Mike,
short status update:
We’ve identified and fixed several minor yet relevant performance issues on this benchmark. They are at the tip of this repo (a few more are planned and will be merged in the next week). Some changes landed in RC5, which is about to be published, but most didn’t make the feature freeze and will only show up in RC6 a month later.
The current state (tip of the repo as of today):
Node 8.11.1:
(not getting any faster after that)
Graal.js EE:
(that’s the 100th iteration, i.e., the 100th time “DELTA” is printed).
Warmup is also improved, that’s after the 31st iteration:
A measurement in the area of peak performance is reached around 40 iterations. Values are a bit flaky on my laptop; we should be using a dedicated benchmark machine.
nashorn 1.8.0_172
100th iteration:
warmup (31st iteration):
Slower on the first iteration, faster on iterations #2 and #3; from then on, Graal.js is faster.
We now have a version of this benchmark on our server and will continue to monitor performance there.
– Christian
@thomaswue For these tests, Graal.JS is currently within the golden ratio of JavaScriptCore perf, so it seems within reach. 👍
Hi @mfikes
thanks for the update, we’ll investigate.
Best, Christian
@woess Yeah, as a separate concern, the tests should pass. In ClojureScript’s CI we currently grab graalvm-ce-1.0.0-rc1 and run the tests. They currently all pass.
@woess Sorry, adding to the above, I missed that you said after the first iteration. I suspect that this is OK: The tests probably change some state and then fail when that state is dirty.
Here is a chart comparing Nashorn and Graal.JS master warmup curves. x-axis is iteration and y-axis is the number of milliseconds needed to run the ClojureScript unit tests.
Wow! If you assume that Node 8.11.1’s performance is equivalent to the V8 number I had obtained, and extrapolate, you get this updated chart with Graal.JS matching SpiderMonkey’s perf.
For reference, here are the original numbers behind the chart that I had initially attached to this issue:
And based on @wirthi’s report above, extrapolating the new Graal.JS perf on tip puts it at 1147 ms. (You get a similar estimate of 1160 ms if you come at it using the Nashorn numbers as a baseline.)
Here is an updated chart showing perf with RC8:
Underlying data:
@mfikes I’ve done some further benchmarks and found that we have quite a performance difference between the JVM and our own SubstrateVM. When you run with the
js command, the code is executed by a native image that is ahead-of-time compiled via SubstrateVM. This is great for embedding but has different performance characteristics: startup is much faster, warmup might be better, and peak performance might be lower or higher depending on the workload. These different characteristics affect your benchmark. I measured with
js --jvm now; that configuration uses Graal as the Java compiler as well, but otherwise runs on a normal JVM, without SubstrateVM. With that, our results are already better on the RC4 that you tested on. You can see the improvement over the release candidates (jvm mode), 2200 => 1824 => 1486, but also that jvm mode is around 2x faster than native mode. Our initial investigations suggest that the garbage collector is the reason for the native image being slower; SubstrateVM has a less sophisticated one (compared to HotSpot), and the benchmark seems to be heavy on allocation and thus GC.
As Thomas mentioned above, we will keep improving on this benchmark, this just adds another dimension we have to consider.
Hi Mike,
thanks; I’ve also modified the test and can confirm we are ~5-6x slower than Node/v8 even with sufficient warmup. We will look into improving our score on this.
Best, Christian
Updated Chart for the 19.0.0 production release:
Underlying data:
It appears there was a speedup of approximately 0.55x relative to RC 12.