nats-server: nats-server memory leak

I have a cluster of 3 nodes running nats-server version 2.8.4 on CentOS Stream 8. The server is started from systemd with a simple unit and no other options:

ExecStart=/opt/nats-server/nats-server -c /opt/nats-server/nats-server.conf

nats-server.conf:

port: 4222
http_port: 8222
max_payload: 8Mb
tls {
  cert_file: "/etc/pki/tls/private/mqtt.server.pem"
  key_file: "/etc/pki/tls/private/mqtt.server.key"
  ca_file: "/etc/pki/tls/private/mqtt.ca.pem"
  timeout: 5.0
  verify: true
}

cluster {
  name: nats-tvm-test-cluster
  listen: 0.0.0.0:4244
  tls {
    cert_file: "/etc/pki/tls/private/mqtt.server.pem"
    key_file: "/etc/pki/tls/private/mqtt.server.key"
    ca_file: "/etc/pki/tls/private/mqtt.ca.pem"
    timeout: 5.0
    verify: true
  }
  routes = [
    nats-route://prod-test-02.tld.org:4244
    nats-route://prod-test-03.tld.org:4244
  ]
}

server_name: prod-test-01.tld.org
jetstream {
  max_memory_store: 3029M
  store_dir: /var/lib/nats
}
mqtt {
  listen: 0.0.0.0:8883
  tls {
    cert_file: "/etc/pki/tls/private/mqtt.server.pem"
    key_file: "/etc/pki/tls/private/mqtt.server.key"
    ca_file: "/etc/pki/tls/private/mqtt.ca.pem"
    timeout: 5.0
  }
  authorization {
    username: "mqtt-user"
    password: "mqtt-password"
  }
}
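
The systemd unit itself is nothing more than that ExecStart line; roughly (a simplified sketch, the rest of the unit is the usual boilerplate):

[Unit]
Description=NATS Server
After=network-online.target

[Service]
User=daemon
ExecStart=/opt/nats-server/nats-server -c /opt/nats-server/nats-server.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target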

After a successful start, journalctl shows:

[547417] 2022/09/22 07:13:13.356609 [INF] Starting nats-server
[547417] 2022/09/22 07:13:13.356840 [INF]   Version:  2.8.4
[547417] 2022/09/22 07:13:13.356845 [INF]   Git:      [66524ed]
[547417] 2022/09/22 07:13:13.356849 [INF]   Cluster:  nats-tvm-test-cluster
[547417] 2022/09/22 07:13:13.356853 [INF]   Name:     prod-test-01.tld.org
[547417] 2022/09/22 07:13:13.356859 [INF]   Node:     jpU2zvJY
[547417] 2022/09/22 07:13:13.356863 [INF]   ID:       NC4OVAF4ASV5AU332DSBSQIRSKLRCIX7KAAKEYFF3JWY7ATC7RHO6PJW
[547417] 2022/09/22 07:13:13.356880 [INF] Using configuration file: /opt/nats-server/nats-server.conf
[547417] 2022/09/22 07:13:13.357534 [INF] Starting http monitor on 0.0.0.0:8222
[547417] 2022/09/22 07:13:13.357575 [INF] Starting JetStream
[547417] 2022/09/22 07:13:13.358210 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[547417] 2022/09/22 07:13:13.358218 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[547417] 2022/09/22 07:13:13.358223 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[547417] 2022/09/22 07:13:13.358226 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[547417] 2022/09/22 07:13:13.358229 [INF]
[547417] 2022/09/22 07:13:13.358232 [INF]          https://docs.nats.io/jetstream
[547417] 2022/09/22 07:13:13.358235 [INF]
[547417] 2022/09/22 07:13:13.358238 [INF] ---------------- JETSTREAM ----------------
[547417] 2022/09/22 07:13:13.358245 [INF]   Max Memory:      2.82 GB
[547417] 2022/09/22 07:13:13.358249 [INF]   Max Storage:     40.00 GB
[547417] 2022/09/22 07:13:13.358253 [INF]   Store Directory: "/var/lib/nats/jetstream"
[547417] 2022/09/22 07:13:13.358256 [INF] -------------------------------------------
[547417] 2022/09/22 07:13:13.363614 [INF]   Restored 0 messages for stream '$G > $MQTT_msgs'
[547417] 2022/09/22 07:13:18.009932 [INF]   Restored 396,633 messages for stream '$G > $MQTT_rmsgs'
[547417] 2022/09/22 07:13:20.160332 [INF]   Restored 41,278 messages for stream '$G > $MQTT_sess'
[547417] 2022/09/22 07:13:20.162085 [INF]   Recovering 3 consumers for stream - '$G > $MQTT_rmsgs'
[547417] 2022/09/22 07:13:20.167607 [INF] Starting JetStream cluster
[547417] 2022/09/22 07:13:20.167688 [INF] Creating JetStream metadata controller
[547417] 2022/09/22 07:13:20.170821 [INF] JetStream cluster recovering state
[547417] 2022/09/22 07:13:20.173535 [INF] Listening for MQTT clients on tls://0.0.0.0:8883
[547417] 2022/09/22 07:13:20.173727 [INF] Listening for client connections on 0.0.0.0:4222
[547417] 2022/09/22 07:13:20.173742 [INF] TLS required for client connections
[547417] 2022/09/22 07:13:20.174196 [INF] Server is ready

But after some time the process's RES (physical memory) goes above 2.82 GB:

   PID     USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 559236 daemon    20   0 6412824   3.9g  10044 S  11.8  53.1   7:10.72 nats-server
 559236 daemon    20   0 6412824   3.9g  10044 S   9.3  53.1   7:11.00 nats-server
 559236 daemon    20   0 6412824   3.9g  10044 S   7.9  53.1   7:11.24 nats-server
 559236 daemon    20   0 6412824   3.9g  10044 S   8.6  53.1   7:11.50 nats-server
 559236 daemon    20   0 6412824   3.9g  10044 S   9.3  53.1   7:11.78 nats-server

From time to time it even reaches 5.0 GB.
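
If a heap profile would help, I can enable the Go profiling endpoint on one of the nodes and capture one, roughly like this (the port number is arbitrary and adding prof_port needs a restart):

# added to nats-server.conf on the affected node
prof_port: 65432

# capture the heap profile once RES has grown
curl -o heap.prof http://127.0.0.1:65432/debug/pprof/heap
go tool pprof -top http://127.0.0.1:65432/debug/pprof/heap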

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 35 (19 by maintainers)

Most upvoted comments

FYI we are making good internal progress on making streams and KVs with lots of interior deletes more memory efficient. We are seeing a 100x improvement in early tests.

These improvements will most likely hit nightly builds at some point next week and be part of the 2.10 release.

Thanks for your patience.

Again, from the data you provided, the memory usage was due to:

  • Number of retained messages that have been deleted
  • Number of sessions that have been deleted

It’s not that you are doing things wrong; the current implementation of the MQTT layer on top of JetStream has some limitations in how the server handles deleted messages within a stream, what we call interior deletes (meaning cases where it is not as simple as moving the first sequence forward in a given stream’s message block). We are trying to find better ways to handle that.
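
Just to illustrate what an interior delete is (the stream and subject names below are made up for the example): with a per-subject limit of 1, which is exactly what $MQTT_sess uses, every update of an already-seen subject punches a hole somewhere in the middle of the stream instead of simply advancing the first sequence:

nats stream add DEMO --subjects 'demo.>' --storage file --max-msgs-per-subject 1 --defaults
nats pub demo.a one     # stored as seq 1
nats pub demo.b two     # seq 2
nats pub demo.c three   # seq 3
nats pub demo.b four    # seq 4: seq 2 is removed, leaving an interior delete (FirstSeq stays at 1)
nats stream info DEMO   # State now shows Messages: 3, FirstSeq: 1, LastSeq: 4, Deleted Messages: 1
nats stream rm -f DEMO

(--defaults is only in recent natscli versions; with older ones, just accept the prompts.)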

@eliteaz This is what I was thinking, between the session stream and retained messages stream, there are a LOT of interior deletes, which we are not really optimized for at the moment. There is no quick fix, but the team needs to think of ways to make this scenario work better. We will keep you posted. Thanks!

Yes, sure, I will try to help.

nats stream info output:

[root@tvm-prod-xmpp01 nats-server]# nats s info "\$MQTT_sess"
Information for Stream $MQTT_sess created 2022-10-17 05:56:26

             Subjects: $MQTT.sess.>
             Replicas: 3
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: 1
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: nats-tvm-prod-cluster
               Leader: tvm-prod-xmpp02.fqdn
              Replica: tvm-prod-xmpp01.fqdn, current, seen 0.09s ago
              Replica: tvm-prod-xmpp03.fqdn, current, seen 0.09s ago

State:

             Messages: 4,461,919
                Bytes: 895 MiB
             FirstSeq: 278 @ 2022-10-17T05:56:28 UTC
              LastSeq: 9,278,896 @ 2022-11-28T08:20:20 UTC
     Deleted Messages: 4,816,700
     Active Consumers: 0
   Number of Subjects: 4,461,919
[root@tvm-prod-xmpp01 nats-server]# nats s info "\$MQTT_rmsgs"
Information for Stream $MQTT_rmsgs created 2022-10-17 05:56:26

             Subjects: $MQTT.rmsgs
             Replicas: 3
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: nats-tvm-prod-cluster
               Leader: tvm-prod-xmpp02.fqdn
              Replica: tvm-prod-xmpp01.fqdn, current, seen 1.14s ago
              Replica: tvm-prod-xmpp03.fqdn, current, seen 1.15s ago

State:

             Messages: 28,207
                Bytes: 6.8 MiB
             FirstSeq: 75,475 @ 2022-10-17T09:59:18 UTC
              LastSeq: 23,268,113 @ 2022-11-28T08:20:39 UTC
     Deleted Messages: 23,164,432
     Active Consumers: 3
   Number of Subjects: 1
[root@tvm-prod-xmpp01 nats-server]# nats s info "\$MQTT_msgs"
Information for Stream $MQTT_msgs created 2022-10-17 05:56:26

             Subjects: $MQTT.msgs.>
             Replicas: 3
              Storage: File

Options:

            Retention: Interest
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: nats-tvm-prod-cluster
               Leader: tvm-prod-xmpp03.fqdn
              Replica: tvm-prod-xmpp01.fqdn, current, seen 0.76s ago
              Replica: tvm-prod-xmpp02.fqdn, current, seen 0.76s ago

State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 1
              LastSeq: 0
     Active Consumers: 0

@kozlovic I have wiped our cluster's metadata from store_dir: /var/lib/nats and started the cluster again. Before that we had disabled this session parameter on our Android boxes (https://www.eclipse.org/paho/files/javadoc/org/eclipse/paho/client/mqttv3/MqttConnectOptions.html#setCleanSession-boolean-). Now we have much less RAM usage:

[root@tvm-prod-xmpp03 ~]# nats server list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                               Server Overview                                                                │
├─────────────────┬───────────────────────┬───────────┬─────────┬─────┬───────┬────────┬────────┬─────┬─────────┬─────┬──────┬───────────┬─────┤
│ Name            │ Cluster               │ IP        │ Version │ JS  │ Conns │ Subs   │ Routes │ GWs │ Mem     │ CPU │ Slow │ Uptime    │ RTT │
├─────────────────┼───────────────────────┼───────────┼─────────┼─────┼───────┼────────┼────────┼─────┼─────────┼─────┼──────┼───────────┼─────┤
│ tvm-prod-xmpp03 │ nats-tvm-prod-cluster │ 0.0.0.0   │ 2.9.2   │ yes │ 1,004 │ 3,563  │ 2      │ 0   │ 430 MiB │ 8.0 │ 5    │ 20h50m1s  │ 3ms │
│ tvm-prod-xmpp01 │ nats-tvm-prod-cluster │ 0.0.0.0   │ 2.9.2   │ yes │ 1,361 │ 3,569  │ 2      │ 0   │ 434 MiB │ 9.0 │ 6    │ 20h50m10s │ 3ms │
│ tvm-prod-xmpp02 │ nats-tvm-prod-cluster │ 0.0.0.0   │ 2.9.2   │ yes │ 934   │ 3,563  │ 2      │ 0   │ 371 MiB │ 9.0 │ 0    │ 20h50m5s  │ 3ms │
├─────────────────┼───────────────────────┼───────────┼─────────┼─────┼───────┼────────┼────────┼─────┼─────────┼─────┼──────┼───────────┼─────┤
│                 │ 1 Clusters            │ 3 Servers │         │ 3   │ 3299  │ 10,695 │        │     │ 1.2 GiB │     │ 11   │           │     │
╰─────────────────┴───────────────────────┴───────────┴─────────┴─────┴───────┴────────┴────────┴─────┴─────────┴─────┴──────┴───────────┴─────╯

But the interesting thing is that before wiping store_dir we had huge RAM consumption per node (5 GB and above) for a rather long period (about a week). Now I see far fewer session files under /var/lib/nats/jetstream/$G/streams/$MQTT_sess. So maybe the session change helped us, but I really don't know why it only started to work properly after cleaning store_dir?
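
A rough way I compare the before/after state is simply counting the message block files of the session stream on disk (assuming the file store keeps them in a msgs subdirectory, which is what I see here):

ls '/var/lib/nats/jetstream/$G/streams/$MQTT_sess/msgs' | wc -l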