nats.js: Very slow message sending in 2.x.x (70 msg/sec instead of 30k msg/sec)

Hello,

I’ve found a very big performance issue with the nats lib version 2.x.x (compared to version 1.x.x). Possibly related: https://github.com/nats-io/nats.js/issues/438

In my library it’s critical because we use NATS as a transporter for microservices. With version 1.x.x, NATS was the fastest message broker among those we support (Redis, RabbitMQ, Kafka, MQTT…); it could reach 10,000 msg/sec. But with version 2.x.x the latency is brutally high, and we can reach only ~70 msg/sec!

I’ve created an isolated repro here: https://github.com/icebob/nats-perf-test This is not a throughput test but a latency test: it sends messages one-by-one, similar to how microservices call each other in a chain.
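The shape of that latency loop can be modeled without NATS at all. The toy sketch below (my own illustration, not the actual repro code) compares N sequential round trips where each "send" is either deferred to the next timer tick, as the 2.x client does, or completed right away:

```javascript
// Toy model of the latency loop (not the actual repro code): N sequential
// "request/reply" round trips, with each send either deferred to the next
// timer tick or completed immediately.
function sendDeferred() {
  return new Promise((resolve) => setTimeout(resolve)); // next timers phase
}

function sendImmediate() {
  return Promise.resolve(); // written out right away
}

async function roundTrips(send, n) {
  const start = Date.now();
  for (let i = 0; i < n; i++) {
    await send(); // outbound request
    await send(); // reply
  }
  const elapsed = Math.max(1, Date.now() - start);
  return Math.round((n / elapsed) * 1000); // req/s
}

async function main() {
  const deferred = await roundTrips(sendDeferred, 200);
  const immediate = await roundTrips(sendImmediate, 200);
  console.log(`deferred send:  ${deferred} req/s`);
  console.log(`immediate send: ${immediate} req/s`);
  return { deferred, immediate };
}

main();
```

Because each round trip has to wait for at least one timer tick per send, the per-message delay multiplies across the chain, which is exactly why a latency test suffers far more than a throughput test.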

I debugged the nats lib to see where the problem is, and I found it in protocol.js.

In version 2.0.0, sendCommand collects the command and waits for the next event-loop tick before sending it out to the server:

  sendCommand(cmd: string | Uint8Array, ...payloads: Uint8Array[]) {
    const len = this.outbound.length();
    let buf: Uint8Array;
    if (typeof cmd === "string") {
      buf = encode(cmd);
    } else {
      buf = cmd as Uint8Array;
    }
    this.outbound.fill(buf, ...payloads);

    if (len === 0) {
      //@ts-ignore: node types timer
      this.flusher = setTimeout(() => {
        this.flushPending();
      });
    } else if (this.outbound.size() >= this.pendingLimit) {
      // if we have a flusher, clear it - otherwise in a bench
      // type scenario where the main loop is dominated by a publisher
      // we create many timers.
      if (this.flusher) {
        //@ts-ignore: node types timer
        clearTimeout(this.flusher);
        this.flusher = null;
      }
      this.flushPending();
    }
  }

Here are the msg/sec values with the original code:

d:\Work\tmp\nats-perf-test>node latency.js
Connected. Start...
70 req/s
72 req/s
71 req/s
72 req/s
70 req/s
70 req/s
69 req/s
72 req/s
86 req/s
74 req/s

To give it a try, I modified this code, skipping the queuing and calling flushPending after every command:

    sendCommand(cmd, ...payloads) {
        const len = this.outbound.length();
        let buf;
        if (typeof cmd === "string") {
            buf = (0, encoders_1.encode)(cmd);
        }
        else {
            buf = cmd;
        }
        this.outbound.fill(buf, ...payloads);
+       this.flushPending();
-       if (len === 0) {
-           //@ts-ignore: node types timer
-           this.flusher = setTimeout(() => {
-               this.flushPending();
-           });
-       }
-       else if (this.outbound.size() >= this.pendingLimit) {
-           // if we have a flusher, clear it - otherwise in a bench
-           // type scenario where the main loop is dominated by a publisher
-           // we create many timers.
-           if (this.flusher) {
-               //@ts-ignore: node types timer
-               clearTimeout(this.flusher);
-               this.flusher = null;
-           }
-           this.flushPending();
-       }
    }

And the result is ~30,000 msg/sec, roughly 430x faster.

d:\Work\tmp\nats-perf-test>node latency.js
Connected. Start...
28 327 req/s
30 322 req/s
30 784 req/s
30 120 req/s
30 687 req/s
31 142 req/s
31 386 req/s
30 323 req/s
30 786 req/s
30 632 req/s
30 492 req/s
30 895 req/s
30 698 req/s
30 794 req/s
30 704 req/s
30 755 req/s
30 987 req/s
30 567 req/s
30 756 req/s
30 894 req/s
30 322 req/s

Conclusion

The current sending logic is extremely slow, and it makes NATS unusable in my project because every alternative message broker is much faster. If you insist on the current queue-based sending logic, I suggest adding a new disableSendQueue option to the NATS JS client that disables the queue and sends each message immediately.
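A hypothetical sketch of what such an option could look like. The class, the buffer internals, and the disableSendQueue name below are all assumptions for illustration, not the actual nats.js code:

```javascript
// Hypothetical sketch of an opt-out for the send queue (names and
// internals assumed, not the real nats.js implementation).
class Sender {
  constructor(opts = {}) {
    this.disableSendQueue = !!opts.disableSendQueue;
    this.pending = [];   // queued commands
    this.flusher = null; // pending flush timer
    this.flushed = [];   // stands in for the socket write
  }

  flushPending() {
    if (this.pending.length === 0) return;
    this.flushed.push(this.pending.join(""));
    this.pending = [];
  }

  sendCommand(cmd) {
    this.pending.push(cmd);
    if (this.disableSendQueue) {
      this.flushPending(); // write out right away, no timer
      return;
    }
    // current behavior: defer the flush to a timer tick
    if (!this.flusher) {
      this.flusher = setTimeout(() => {
        this.flusher = null;
        this.flushPending();
      });
    }
  }
}
```

With the option set, a single sendCommand call reaches the socket synchronously; without it, the command sits in the buffer until the next timer tick fires.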

Version info

NATS server version: 2.9.21

[17056] 2023/08/09 17:47:17.691879 [INF] Starting nats-server
[17056] 2023/08/09 17:47:17.692931 [INF]   Version:  2.9.21
[17056] 2023/08/09 17:47:17.692931 [INF]   Git:      [b2e7725]
[17056] 2023/08/09 17:47:17.693459 [INF]   Name:     NDDLAYHA256ZGYEFYEIMV6O5AOMU56CPFERN37VVFTMLP5XZYZJUPL3D
[17056] 2023/08/09 17:47:17.693459 [INF]   ID:       NDDLAYHA256ZGYEFYEIMV6O5AOMU56CPFERN37VVFTMLP5XZYZJUPL3D
[17056] 2023/08/09 17:47:17.695019 [INF] Listening for client connections on 0.0.0.0:4222
[17056] 2023/08/09 17:47:17.706678 [INF] Server is ready
  • NATS lib version: 2.5.1
  • Node.js: 18.16.0
  • OS: Windows 10

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 6
  • Comments: 17 (12 by maintainers)

Most upvoted comments

For the record, the issue is that when the library migrated to 2.0.0 and needed to support different JavaScript runtimes, the call to setImmediate() was replaced with setTimeout(), since setImmediate() is not available in the other runtimes.
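The cost difference between the two schedulers is easy to demonstrate in plain Node. This small benchmark (my own, not from the library) counts how many no-op callbacks each can run in a fixed window:

```javascript
// Count how many no-op callbacks each scheduler can run in a fixed window.
// setTimeout(fn) defers to the timers phase (clamped to >= 1 ms in Node),
// while setImmediate(fn) runs in the check phase of the same loop iteration.
function measure(schedule, durationMs) {
  return new Promise((resolve) => {
    const deadline = Date.now() + durationMs;
    let count = 0;
    const tick = () => {
      count++;
      if (Date.now() >= deadline) return resolve(count);
      schedule(tick);
    };
    schedule(tick);
  });
}

async function main() {
  const timeouts = await measure((fn) => setTimeout(fn), 200);
  const immediates = await measure((fn) => setImmediate(fn), 200);
  console.log(`setTimeout:   ${timeouts} callbacks in 200 ms`);
  console.log(`setImmediate: ${immediates} callbacks in 200 ms`);
  return { timeouts, immediates };
}

main();
```

The setImmediate chain typically completes orders of magnitude more callbacks than the setTimeout chain, which matches the regression seen when the flush moved from setImmediate to setTimeout.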

@aricart I’ve tested it and now it’s fast again. Good job, thank you for the quick fix.

possibly today.

@icebob wow, node performance for the next event loop is certainly not in the same ballpark:

/p/t/n [SIGINT]> node test.js
Connected. Start…
815 req/s
822 req/s
847 req/s
852 req/s
862 req/s
859 req/s
867 req/s
861 req/s
856 req/s
^C⏎
/p/t/n [SIGINT]> node test.js
Connected. Start…
14,834 req/s
15,885 req/s
15,850 req/s
16,119 req/s
16,042 req/s
15,869 req/s
16,017 req/s

(I am trying it in Deno, but expect node to be somewhat similar; let me make it available in node)

@icebob I am getting some difference with my change, but my initial performance is way higher than yours:

~/D/c/s/g/n/nats.deno (flush) [SIGINT]> deno run -A debug/flush.ts (buffered)
Connected. Start…
6,907 req/s
7,301 req/s
7,084 req/s
6,932 req/s
7,670 req/s
7,436 req/s
7,306 req/s
7,361 req/s
7,343 req/s
^C⏎
~/D/c/s/g/n/nats.deno (flush) [SIGINT]> deno run -A debug/flush.ts (direct)
Connected. Start…
8,432 req/s
9,919 req/s
9,925 req/s
9,627 req/s
9,917 req/s
9,978 req/s
10,067 req/s

The above is not the right fix, but I will sort you out next week. The correct strategy for JavaScript is to call flush, but currently that executes on the next event loop.