graaljs: GraalJS is around 70X slower than NodeJS 14

Hello, first of all, I don't want this to come across as bashing GraalVM/JS, and this experiment is not meant to suggest I don't appreciate the project as it is; I know that building a full JS/NodeJS runtime is really hard work.

So let's dig in: I want to parse large JS files using AcornJS and write a visitor that replaces some constructs in those files.

So take some large file (for example, webpack-generated output) and parse it as follows:

const fs = require('fs')
const {Parser} = require('acorn')

const walk = require("acorn-walk")

// Measure and log the wall-clock time taken by action().
function timeIt(message, action){
    var start = new Date();
    action();
    var end = new Date();
    console.log("Time to " + message, (end - start))
}


// Intern node.type strings so repeated types share one string instance;
// this touches every node in the tree once.
function deduplicateTypes(fullAst){
    var map = new Map()
    var keys = []
    var countHits = 0
    walk.fullAncestor(fullAst, (node) => {
        ++countHits;
        if (map.has(node.type)){
            node.type = keys[map.get(node.type)]
            return;
        }
        map.set(node.type, keys.length)
        keys.push(node.type)
    });
    console.log(countHits)
    return keys
}
const content = fs.readFileSync('/path/to/large/js/file', 'utf-8')
for (var i = 0; i < 10; i++) {
    timeIt(`Parse ${i}`, () => {
        var tree = Parser.parse(content)
        deduplicateTypes(tree)
    });
}

With the following package.json file:

{
  "name": "safejs",
  "version": "0.1.0",
  "description": "Prototype JS Replacer",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "acorn": "^6.4.2",
    "acorn-loose": "^8.0.0",
    "acorn-walk": "^8.0.0",
    "htmlparser": "^1.7.7"
  }
}

Instead of "deduplicateTypes", the real code would be a small type checker that replaces some constructs (e.g. string literals with some encrypted-string routine).
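For illustration, such a replacer could look roughly like the sketch below; encrypt and decrypt are hypothetical helpers, and regenerating source text from the mutated AST (plus fixing up node positions) is left out:

// Rough sketch of the real workload: rewrite every string literal into a
// call to a (hypothetical) decrypt() helper, reusing the acorn-walk visitor.
// Node positions (start/end) and code regeneration are omitted here.
function replaceStrings(fullAst) {
    walk.full(fullAst, (node) => {
        if (node.type === 'Literal' && typeof node.value === 'string') {
            const encrypted = encrypt(node.value)   // hypothetical helper
            node.type = 'CallExpression'
            node.callee = {type: 'Identifier', name: 'decrypt'}
            node.arguments = [{type: 'Literal', value: encrypted, raw: JSON.stringify(encrypted)}]
            delete node.value
            delete node.raw
        }
    });
}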

Using NodeJS this code runs in around 1,300 ms, while with GraalJS it runs in around 105,000 ms. I kept iterating the program to give GraalVM time to warm up, hoping to get it down to, say, 10 seconds, but even after up to 20 iterations it did not get under 105 seconds (the first iteration was 141,519 ms). The input file, which sadly I cannot share, is around 12 MB of generated JS code from a React Native application.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 40 (24 by maintainers)

Most upvoted comments

GraalJS supports string views now, which should fix this issue. Benchmark results on my machine:

Start Parse 0...
    Start parse JS...
    Done in (seconds): 8.75
    Start visit tree...
    Done in (seconds): 1.761
Done in (seconds): 10.582
Start Parse 1...
    Start parse JS...
    Done in (seconds): 13.161
    Start visit tree...
    Done in (seconds): 5.579
Done in (seconds): 18.741
Start Parse 2...
    Start parse JS...
    Done in (seconds): 8.275
    Start visit tree...
    Done in (seconds): 4.284
Done in (seconds): 12.56
Start Parse 3...
    Start parse JS...
    Done in (seconds): 3.178
    Start visit tree...
    Done in (seconds): 1.682
Done in (seconds): 4.862
Start Parse 4...
    Start parse JS...
    Done in (seconds): 2.193
    Start visit tree...
    Done in (seconds): 1.202
Done in (seconds): 3.397
Start Parse 5...
    Start parse JS...
    Done in (seconds): 1.699
    Start visit tree...
    Done in (seconds): 0.966
Done in (seconds): 2.666
Start Parse 6...
    Start parse JS...
    Done in (seconds): 1.397
    Start visit tree...
    Done in (seconds): 0.892
Done in (seconds): 2.29
Start Parse 7...
    Start parse JS...
    Done in (seconds): 1.299
    Start visit tree...
    Done in (seconds): 0.828
Done in (seconds): 2.127
Start Parse 8...
    Start parse JS...
    Done in (seconds): 1.401
    Start visit tree...
    Done in (seconds): 0.833
Done in (seconds): 2.236
Start Parse 9...
    Start parse JS...
    Done in (seconds): 1.319
    Start visit tree...
    Done in (seconds): 0.782
Done in (seconds): 2.104

I had a look at why we are so slow here and it turns out we spend most of the time in a single string.slice! This is the culprit: https://github.com/acornjs/acorn/blob/6.4.2/acorn/src/parseutil.js#L15

var literal = /^(?:'((?:\\.|[^'\\])*?)'|"((?:\\.|[^"\\])*?)")/;
pp.strictDirective = function(start) {
  for (;;) {
    // Try to find string literal.
    skipWhiteSpace.lastIndex = start;
    start += skipWhiteSpace.exec(this.input)[0].length;
    var match = literal.exec(this.input.slice(start)); // <======== copies everything from start to input.length
    if (!match) { return false }
    if ((match[1] || match[2]) === "use strict") { return true }
    start += match[0].length;

    // Skip semicolon, if any.
    skipWhiteSpace.lastIndex = start;
    start += skipWhiteSpace.exec(this.input)[0].length;
    if (this.input[start] === ";")
      { start++; }
  }
};

Since it is called very often and our string.slice implementation always creates a copy of the substring, this can get very, very slow with large input files (like main.js which is >11MB). If we change this line to avoid excessive copying, Graal.js gets an order of magnitude faster. It’s a known issue. Other JS engines use string views to avoid copying substrings, and some JS code relies on this to be fast. We have plans to enable this optimization in GraalVM as well in the future (currently blocked by other string-related work).
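For the curious, one way to avoid the copy on acorn's side is to anchor the regex at an offset with an ES6 sticky ('y') regex instead of slicing; presumably this is the kind of ES6-based change mentioned further down. A sketch under that assumption, not the actual patch:

// With the sticky flag the regex matches at lastIndex directly in the full
// input string, so no multi-megabyte substring copy is ever created.
var literal = /(?:'((?:\\.|[^'\\])*?)'|"((?:\\.|[^"\\])*?)")/y;
pp.strictDirective = function(start) {
  for (;;) {
    skipWhiteSpace.lastIndex = start;
    start += skipWhiteSpace.exec(this.input)[0].length;
    literal.lastIndex = start;  // anchor here instead of this.input.slice(start)
    var match = literal.exec(this.input);
    if (!match) { return false }
    if ((match[1] || match[2]) === "use strict") { return true }
    start += match[0].length;

    // Skip semicolon, if any.
    skipWhiteSpace.lastIndex = start;
    start += skipWhiteSpace.exec(this.input)[0].length;
    if (this.input[start] === ";")
      { start++; }
  }
};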

Hi @ciplogic,

thanks for your request, and thanks for providing a simple-to-run example; that is much appreciated, as it makes my job much easier.

Yes, our warmup performance can be very bad, and also peak performance might not be where we want it to be on some examples. Code patterns like yours will help us improve in the future. Warmup performance is our main focus at the moment, we should significantly improve over the next few versions.

Looking at your example, however, I cannot reproduce numbers as bad as yours. Yes, we are behind, but not as catastrophically as in your data. Can you please check whether you are using a current version of GraalVM (20.2.0 being the most recent currently)? Maybe you can also share some basic specs of your machine: how many cores, how much memory, whether you use the CE or EE version of GraalVM, etc.

On my machine, I get the following data (warmup generally stabilized after ~5 iterations, which is what I show here; I am using Node 12.x because that is what GraalVM provides as well. I also tried Node 14.x, but the results were comparable):

Node.js 12.18.0:

$ ~/software/node-v12.18.0-linux-x64/bin/node index.js
1177427
Time to Parse 0 1901
1177427
Time to Parse 1 1721
1177427
Time to Parse 2 1660
1177427
Time to Parse 3 1695
1177427
Time to Parse 4 1720

GraalVM EE 20.2.0:

$ ~/software/graalvm-ee-java8-20.2.0/bin/node index.js 
1177427
Time to Parse 0 37332
1177427
Time to Parse 1 39480
1177427
Time to Parse 2 11947
1177427
Time to Parse 3 11730
1177427
Time to Parse 4 11315

GraalVM CE 20.2.0:

$ ~/software/graalvm-ce-java8-20.2.0/bin/node index.js 
1177427
Time to Parse 0 73193
1177427
Time to Parse 1 55765
1177427
Time to Parse 2 20624
1177427
Time to Parse 3 16998
1177427
Time to Parse 4 13091

(Using GraalVM EE data unless mentioned otherwise.) So while our first iteration takes 37 s compared to 1.9 s, the fourth iteration is 1,695 ms vs. 11,730 ms - a factor of 7.3. I totally agree that we need to improve both warmup AND peak performance on this example, but I still wonder why your data is so much worse. Notably, the CE data is worse for the first ~2 iterations, but at peak it's 11 compared to 13 seconds, so not a huge difference.

A surprising contender is the --jvm mode of GraalVM (meaning the JavaScript engine does not run from the ahead-of-time-compiled native binary generated by native-image, but instead executes on a plain JVM):

$ ~/software/graalvm-ee-java8-20.2.0/bin/node --jvm index.js 
1177427
Time to Parse 0 55094
1177427
Time to Parse 1 66969
1177427
Time to Parse 2 23395
1177427
Time to Parse 3 6445
1177427
Time to Parse 4 6688
1177427
Time to Parse 5 6127
1177427
Time to Parse 6 5944
1177427
Time to Parse 7 5740
1177427
Time to Parse 8 5452
1177427
Time to Parse 9 5397
1177427
Time to Parse 10 4394
1177427
Time to Parse 11 4300
1177427
Time to Parse 12 4270
1177427
Time to Parse 13 4179
1177427
Time to Parse 14 4426
1177427
Time to Parse 15 4122

While the warmup takes longer than in --native mode on EE (both in time and in number of iterations), the peak performance is significantly better: at 4,122 ms compared to Node's 1,660 ms, we are down to a factor of 2.5.

Best, Christian

Is latency mode the default now?

Not quite. The default mode is tiered compilation now, which tries to balance latency and throughput. In most cases, it should offer warmup close to latency mode without sacrificing peak performance. So we generally don't recommend --engine.Mode=latency anymore, although it might still make sense for resource-constrained environments and certain workloads.

@frank-dspeed here, it is the fastest:

/home/ciprian/apps/graalvm-ce-java11-20.2.0/bin/node --jvm --engine.Mode=latency /home/ciprian/WebstormProjects/untitled/index.js
Time to Parse 0 36469
Time to Parse 1 41001
Time to Parse 2 36078
Time to Parse 3 37431
Time to Parse 4 36812
Time to Parse 5 33830
Time to Parse 6 38036
Time to Parse 7 38100
Time to Parse 8 37899
Time to Parse 9 37963

@frank-dspeed: ciplogic probably meant 1 year of our engineering time to improve warming up, not 1 year of actually warming up 😃

@ciplogic I think you understand that a bit wrong: https://www.techempower.com/benchmarks/#section=data-r19&hw=ph&test=composite - when you look there for es4x, which is in fact GraalJS, it ranks 9th overall, while nestjs, which is in effect optimized NodeJS, ranks 49th.

Simply wait a bit and see what happens; I am confident you will be impressed. Otherwise I would not waste my time with this.

Conclusion

As soon as no NodeJS dependency is needed, GraalJS is up to 13x faster on some workloads, as it is overall more memory- and CPU-efficient. And that is before any optimization; we are talking about the out-of-the-box case.

When we factor optimization into that, for example tuning the JVM inlining parameters, we can end up with much bigger numbers! The JVM has 100+ options for compiler tuning and supports custom GC implementations. None of this will ever be possible with NodeJS; that is a fact. There are boundaries in the way NodeJS is designed, and those boundaries do not apply to GraalJS.
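For example, tuning flags can be forwarded through the GraalVM node launcher. This is an illustrative sketch only: --vm.* forwards options to the underlying JVM per the usual GraalVM launcher convention, so check --help on your release before relying on these exact flags.

# Run JS on the JVM and forward heap, GC, and inlining options to it:
node --jvm --vm.Xmx4g --vm.XX:+UseG1GC --vm.XX:MaxInlineSize=70 index.js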

Hope that sheds some light.

@cip they are aware of these facts, and it will get handled soon.

Many people are working on that. As a rule of thumb that applies at present: whenever you use node-graal together with native Node modules, you will get these slower results. It is also important to set the engine to latency mode, and maybe even to put a warm-up function into your code.
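A warm-up function in that spirit could be as simple as the minimal sketch below, reusing Parser and deduplicateTypes from the original report: run the hot path repeatedly on a small input so the JIT has compiled it before the big file is processed.

// Hypothetical warm-up: exercise parse + visit on a tiny sample first.
function warmUp(iterations) {
    const sample = "'use strict'; function f(a) { return a + 1 } f(1);";
    for (var i = 0; i < iterations; i++) {
        deduplicateTypes(Parser.parse(sample));
    }
}
warmUp(100);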

Revisit this in 6 months and you will get much better results; I have written down your name and will ping you when that point is reached 😃

But I want to give you a current example that shows GraalJS is faster in general: google for es4x and try it. It uses Java + JS with fewer node modules and outperforms NodeJS by 10x in most scenarios of the TechEmpower benchmark.

Yes, and the comment just above mine clarified that this is available now. I just mentioned which GraalVM release it is in.

I just wanted to mention it for the sake of completeness for anyone interested. I’m not sure they would accept my change, which relies on an ES6 feature, and it does not benefit most users. Acorn is likely not the only popular package using such code patterns, so we’ll have to do something about it in GraalVM anyway.

So, to understand correctly: if you have an open issue for it, I can close this one referencing the original issue; if not, I think this bug should stay open until the string view implementation is provided.

Anyway, this is a great find, and it is great that you are aware of both the issue and the solution, @woess!

Thank you for the hard work 💐

@frank-dspeed regarding your question:

why is there no transition possible between latency and throughput? I mean, can it not do additional optimization? I am wondering what is blocking that?

GraalVM 20.1 will have tiered compilation enabled by default, which should provide faster warmup at the same peak performance, bridging the gap between the throughput and latency modes. So there should be no need for --engine.Mode=latency anymore. It's still far from perfect, but we're working on tuning the compilation policy and, of course, on improving JS performance in general.
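For reference, the modes discussed here are selected via --engine.Mode; a quick sketch assuming the usual Truffle option values (verify against your GraalVM version):

node --jvm --engine.Mode=latency index.js      # optimize for fast warmup
node --jvm --engine.Mode=throughput index.js   # optimize for peak performance
node --jvm index.js                            # 20.1+ default: tiered compilation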

@frank-dspeed this is a huge tradeoff game. We have to balance compilation speed, compilation memory consumption, peak performance, interpreter performance, compiled code size, how speculative the code is (i.e., how often we have to throw it away), memory consumption of the result, security concerns, privacy concerns, and many more.

And all that while we only have heuristics for those values: how do you estimate them for a guest language like JavaScript, one the compiler (GraalVM, Java) does not even natively understand?

We are doing our best here; 60 years of compiler research and 20+ years of Java compiler research go into this. Our current focal point is warmup performance.