nats-server: nats-server memory leak
We have a cluster of 3 nodes:
- nats-server version: 2.8.4
- OS release: CentOS Stream 8

nats-server runs under systemd (`Type=simple`) without any other options:

```
ExecStart=/opt/nats-server/nats-server -c /opt/nats-server/nats-server.conf
```

nats-server.conf:
```
port: 4222
http_port: 8222
max_payload: 8Mb

tls {
  cert_file: "/etc/pki/tls/private/mqtt.server.pem"
  key_file: "/etc/pki/tls/private/mqtt.server.key"
  ca_file: "/etc/pki/tls/private/mqtt.ca.pem"
  timeout: 5.0
  verify: true
}

cluster {
  name: nats-tvm-test-cluster
  listen: 0.0.0.0:4244
  tls {
    cert_file: "/etc/pki/tls/private/mqtt.server.pem"
    key_file: "/etc/pki/tls/private/mqtt.server.key"
    ca_file: "/etc/pki/tls/private/mqtt.ca.pem"
    timeout: 5.0
    verify: true
  }
  routes = [
    nats-route://prod-test-02.tld.org:4244
    nats-route://prod-test-03.tld.org:4244
  ]
}

server_name: prod-test-01.tld.org

jetstream {
  max_memory_store: 3029M
  store_dir: /var/lib/nats
}

mqtt {
  listen: 0.0.0.0:8883
  tls {
    cert_file: "/etc/pki/tls/private/mqtt.server.pem"
    key_file: "/etc/pki/tls/private/mqtt.server.key"
    ca_file: "/etc/pki/tls/private/mqtt.ca.pem"
    timeout: 5.0
  }
  authorization {
    username: "mqtt-user"
    password: "mqtt-password"
  }
}
```
After a successful start, journalctl shows:
```
[547417] 2022/09/22 07:13:13.356609 [INF] Starting nats-server
[547417] 2022/09/22 07:13:13.356840 [INF] Version: 2.8.4
[547417] 2022/09/22 07:13:13.356845 [INF] Git: [66524ed]
[547417] 2022/09/22 07:13:13.356849 [INF] Cluster: nats-tvm-test-cluster
[547417] 2022/09/22 07:13:13.356853 [INF] Name: prod-test-01.tld.org
[547417] 2022/09/22 07:13:13.356859 [INF] Node: jpU2zvJY
[547417] 2022/09/22 07:13:13.356863 [INF] ID: NC4OVAF4ASV5AU332DSBSQIRSKLRCIX7KAAKEYFF3JWY7ATC7RHO6PJW
[547417] 2022/09/22 07:13:13.356880 [INF] Using configuration file: /opt/nats-server/nats-server.conf
[547417] 2022/09/22 07:13:13.357534 [INF] Starting http monitor on 0.0.0.0:8222
[547417] 2022/09/22 07:13:13.357575 [INF] Starting JetStream
[547417] 2022/09/22 07:13:13.358210 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[547417] 2022/09/22 07:13:13.358218 [INF] _ | | __|_ _/ __|_ _| _ \ __| /_\ | \/ |
[547417] 2022/09/22 07:13:13.358223 [INF] | || | _| | | \__ \ | | | / _| / _ \| |\/| |
[547417] 2022/09/22 07:13:13.358226 [INF] \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_| |_|
[547417] 2022/09/22 07:13:13.358229 [INF]
[547417] 2022/09/22 07:13:13.358232 [INF] https://docs.nats.io/jetstream
[547417] 2022/09/22 07:13:13.358235 [INF]
[547417] 2022/09/22 07:13:13.358238 [INF] ---------------- JETSTREAM ----------------
[547417] 2022/09/22 07:13:13.358245 [INF] Max Memory: 2.82 GB
[547417] 2022/09/22 07:13:13.358249 [INF] Max Storage: 40.00 GB
[547417] 2022/09/22 07:13:13.358253 [INF] Store Directory: "/var/lib/nats/jetstream"
[547417] 2022/09/22 07:13:13.358256 [INF] -------------------------------------------
[547417] 2022/09/22 07:13:13.363614 [INF] Restored 0 messages for stream '$G > $MQTT_msgs'
[547417] 2022/09/22 07:13:18.009932 [INF] Restored 396,633 messages for stream '$G > $MQTT_rmsgs'
[547417] 2022/09/22 07:13:20.160332 [INF] Restored 41,278 messages for stream '$G > $MQTT_sess'
[547417] 2022/09/22 07:13:20.162085 [INF] Recovering 3 consumers for stream - '$G > $MQTT_rmsgs'
[547417] 2022/09/22 07:13:20.167607 [INF] Starting JetStream cluster
[547417] 2022/09/22 07:13:20.167688 [INF] Creating JetStream metadata controller
[547417] 2022/09/22 07:13:20.170821 [INF] JetStream cluster recovering state
[547417] 2022/09/22 07:13:20.173535 [INF] Listening for MQTT clients on tls://0.0.0.0:8883
[547417] 2022/09/22 07:13:20.173727 [INF] Listening for client connections on 0.0.0.0:4222
[547417] 2022/09/22 07:13:20.173742 [INF] TLS required for client connections
[547417] 2022/09/22 07:13:20.174196 [INF] Server is ready
```
But after some time the process's resident memory (RES) exceeds 2.82 GB, the JetStream max memory reported in the startup log:
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
559236 daemon 20 0 6412824 3.9g 10044 S 11.8 53.1 7:10.72 nats-server
559236 daemon 20 0 6412824 3.9g 10044 S 9.3 53.1 7:11.00 nats-server
559236 daemon 20 0 6412824 3.9g 10044 S 7.9 53.1 7:11.24 nats-server
559236 daemon 20 0 6412824 3.9g 10044 S 8.6 53.1 7:11.50 nats-server
559236 daemon 20 0 6412824 3.9g 10044 S 9.3 53.1 7:11.78 nats-server
```
From time to time it even reaches 5.0 GB.
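As a quick way to cross-check top against the server's own accounting: the HTTP monitor configured above (`http_port: 8222`) exposes a `/varz` endpoint whose `mem` field reports the server's resident memory in bytes. A minimal sketch in Java; the hostname is a placeholder taken from the config, and the crude string scan is only to avoid pulling in a JSON dependency:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VarzMemCheck {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        // Hostname/port are placeholders based on the config in this issue.
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://prod-test-01.tld.org:8222/varz")).build();
        String body = http.send(req, HttpResponse.BodyHandlers.ofString()).body();

        // Crude extraction of the "mem" field (resident memory in bytes);
        // a real tool would parse the JSON properly.
        int i = body.indexOf("\"mem\":");
        int j = body.indexOf(",", i);
        System.out.println("server-reported mem bytes: "
                + body.substring(i + 6, j).trim());
    }
}
```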
About this issue
- State: open
- Created 2 years ago
- Comments: 35 (19 by maintainers)
FYI we are making good internal progress on making streams and KVs with lots of interior deletes more memory efficient. We are seeing a 100x improvement in early tests.
These improvements will most likely hit nightly builds at some point next week and be part of the 2.10 release.
Thanks for your patience.
Again, from the data you provided, the memory usage was due to the large number of interior deletes in the MQTT session and retained-message streams.
It's not that you are doing things wrong; it's just that the current implementation of the MQTT layer on top of JetStream has some limitations in how the server handles deleted messages within a stream, what we call interior deletes (meaning cases where it is not as simple as moving forward the first sequence in a given stream's message block). We are trying to find better ways to handle that.
@eliteaz This is what I was thinking: between the session stream and the retained-messages stream, there are a LOT of interior deletes, which we are not really optimized for at the moment. There is no quick fix; the team needs to think of ways to make this scenario work better. We will keep you posted. Thanks!
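For intuition, here is a minimal toy model (not nats-server code) of why interior deletes cost more than head deletes: removing the first sequence just advances a counter, while removing a message from the middle of a block forces the server to track every deleted sequence individually, and that per-sequence state is what accumulates in memory.

```java
import java.util.TreeSet;

// Toy model of a stream message block's bookkeeping. This is NOT
// nats-server code, just an illustration of the concept above.
public class InteriorDeleteDemo {
    long firstSeq = 1, lastSeq = 10;
    // Interior deletes must each be remembered individually.
    TreeSet<Long> deleted = new TreeSet<>();

    void delete(long seq) {
        if (seq == firstSeq) {
            // Cheap case: advance the head of the block...
            firstSeq++;
            // ...and absorb any previously deleted sequences now at the head.
            while (deleted.remove(firstSeq)) firstSeq++;
        } else if (seq > firstSeq && seq <= lastSeq) {
            // Interior delete: per-sequence state is kept around,
            // which is where the memory overhead comes from.
            deleted.add(seq);
        }
    }

    public static void main(String[] args) {
        InteriorDeleteDemo s = new InteriorDeleteDemo();
        s.delete(5); s.delete(7); // interior deletes: tracked in the set
        s.delete(1);              // head delete: firstSeq simply moves to 2
        System.out.println("first=" + s.firstSeq + " deleted=" + s.deleted);
        // -> first=2 deleted=[5, 7]
    }
}
```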
Yes, sure, I will try to help with the `nats stream info` output.

@kozlovic I have wiped our cluster's metadata from `store_dir: /var/lib/nats` and started the cluster again. Before that, we disabled the persistent-session parameter on our Android boxes (https://www.eclipse.org/paho/files/javadoc/org/eclipse/paho/client/mqttv3/MqttConnectOptions.html#setCleanSession-boolean-), and now RAM usage is much lower. What is interesting is that before wiping `store_dir` we had huge RAM consumption per node (5 GB and above) for a rather long period (about a week); now I see far fewer session files at `/var/lib/nats/jetstream/$G/streams/$MQTT_sess`. So maybe the change to session establishment helped us, but I really don't know why it only started to work properly after cleaning `store_dir`?
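For reference, a minimal sketch of the Paho client change the reporter describes, i.e. connecting with `cleanSession=true` so the broker discards the session on disconnect instead of persisting an entry per device in the `$MQTT_sess` stream. The broker URL, client ID, and credentials here are placeholders based on the config at the top of the issue:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class CleanSessionExample {
    public static void main(String[] args) throws MqttException {
        // Broker URL, client ID, and credentials are placeholders
        // matching the config shown above.
        MqttClient client = new MqttClient(
                "ssl://prod-test-01.tld.org:8883",
                "android-box-001",
                new MemoryPersistence());

        MqttConnectOptions opts = new MqttConnectOptions();
        // With cleanSession=true the broker does not keep a persistent
        // session, so nats-server has far fewer entries (and far fewer
        // interior deletes) in the $MQTT_sess stream.
        opts.setCleanSession(true);
        opts.setUserName("mqtt-user");
        opts.setPassword("mqtt-password".toCharArray());

        client.connect(opts);
        // ... subscribe/publish as usual ...
        client.disconnect();
    }
}
```

Note that Paho's `MqttConnectOptions` already defaults to `cleanSession=true`, so the fix described above presumably amounted to removing an explicit `setCleanSession(false)` call.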