solana: solana-validator leaks memory (but at a very slow pace)
Problem
solana-validator (tested on v1.4.19) definitely leaks memory, requiring a periodic restart roughly once per week.
The pace seems stable across nodes, at a rate of 1-2 GB/day.
Proposed Solution
Debug.
We don’t know whether this existed on the v1.3 line as well, but the leak is observed on both RPC and non-RPC nodes. In all cases, the growth shows up under RssAnon. This excludes AppendVec (mmap) as a culprit, since it is accounted under RssFile.
So, the remaining culprits are: gossip, blockstore, runtime, rocksdb, etc.
For runtime and blockstore, I think we can just run a long ledger-tool verify session.
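For reference, the RssAnon/RssFile split mentioned above comes straight from the kernel's per-process accounting. Below is a minimal, Linux-only Rust sketch (not part of the validator) that reads those two fields from `/proc/<pid>/status`; heap growth from a leak shows up under RssAnon, while mmap'ed AppendVec pages are counted under RssFile.

```rust
use std::fs;

/// Return the RssAnon and RssFile lines from /proc/<pid>/status (Linux only).
fn rss_fields(pid: u32) -> std::io::Result<Vec<String>> {
    let status = fs::read_to_string(format!("/proc/{}/status", pid))?;
    Ok(status
        .lines()
        .filter(|l| l.starts_with("RssAnon:") || l.starts_with("RssFile:"))
        .map(str::to_owned)
        .collect())
}

fn main() -> std::io::Result<()> {
    // Use the validator's pid in practice; std::process::id() keeps the example self-contained.
    for line in rss_fields(std::process::id())? {
        println!("{line}"); // e.g. "RssAnon:  1234567 kB"
    }
    Ok(())
}
```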
CC: @carllin
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 46 (46 by maintainers)
Commits related to this issue
- adds an upper bound on cluster-slots size https://github.com/solana-labs/solana/issues/14366#issuecomment-769096305 — committed to behzadnouri/solana by behzadnouri 3 years ago
- adds an upper bound on cluster-slots size (#15300) https://github.com/solana-labs/solana/issues/14366#issuecomment-769096305 — committed to solana-labs/solana by behzadnouri 3 years ago
- adds an upper bound on cluster-slots size (#15300) https://github.com/solana-labs/solana/issues/14366#issuecomment-769096305 (cherry picked from commit f79c9d40944580afa565b8539ead7fd9928f3f20) — committed to solana-labs/solana by behzadnouri 3 years ago
- adds an upper bound on cluster-slots size (#15300) (#15357) https://github.com/solana-labs/solana/issues/14366#issuecomment-769096305 (cherry picked from commit f79c9d40944580afa565b8539ead7fd9928f3f... — committed to solana-labs/solana by mergify[bot] 3 years ago
- excludes epoch-slots from nodes with unknown or different shred version Inspecting TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Their epoch-slots ar... — committed to behzadnouri/solana by behzadnouri 3 years ago
- excludes epoch-slots from nodes with unknown or different shred version (#17899) Inspecting TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Their epoc... — committed to solana-labs/solana by behzadnouri 3 years ago
- excludes epoch-slots from nodes with unknown or different shred version (#17899) Inspecting TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Their epoch... — committed to solana-labs/solana by behzadnouri 3 years ago
- excludes epoch-slots from nodes with unknown or different shred version (#17899) (#17916) Inspecting TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Th... — committed to solana-labs/solana by mergify[bot] 3 years ago
- excludes epoch-slots from nodes with unknown or different shred version (backport #17899) (#19551) * excludes epoch-slots from nodes with unknown or different shred version (#17899) Inspecting TDS g... — committed to solana-labs/solana by mergify[bot] 3 years ago
- adds unprefixed_malloc_on_supported_platforms to jemalloc (#20317) Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). https://github.com/so... — committed to solana-labs/solana by behzadnouri 3 years ago
- adds unprefixed_malloc_on_supported_platforms to jemalloc (#20317) Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). https://github.com/sola... — committed to solana-labs/solana by behzadnouri 3 years ago
- adds unprefixed_malloc_on_supported_platforms to jemalloc (#20317) (#20325) Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). https://github... — committed to solana-labs/solana by mergify[bot] 3 years ago
- adds unprefixed_malloc_on_supported_platforms to jemalloc (#20317) Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). https://github.com/so... — committed to identity-com/solana by behzadnouri 3 years ago
We do seemingly have another memory leak/growth (master & v1.14) that is in the early stages of investigation at the moment. That being said, I’m in favor of closing this issue due to its age. The releases and code are so different now that I think a new investigation would be worth a new issue (it is currently being discussed in Discord).
We could always reference this issue as a “prior work” in a new issue.
I’ve had a node running v1.9.9 against mainnet-beta for a couple of weeks; it showed a 4-5 GB/day ramp shortly after starting; however, memory has looked pretty stable the last 2+ weeks:

Hi, I haven’t been actively investigating the possible memory-leak bug which I thought I had found; it seems it was a false alarm… Also, I was testing the v1.8.x line.
Hi @behzadnouri, sorry for bothering, but it looks like the `unprefixed_malloc_on_supported_platforms` feature should be enabled in `tikv-jemalloc-sys`. Without it, jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). This seems wrong.
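To illustrate the point, here is a standalone sketch (not Solana’s actual allocator wiring; the crate version is an assumption): with `tikv-jemallocator` as the global allocator, only Rust allocations go through jemalloc by default, and it is the `unprefixed_malloc_on_supported_platforms` feature that also overrides the unprefixed `malloc`/`free` symbols so that bundled C/C++ code such as rocksdb allocates through jemalloc too.

```rust
// Cargo.toml (assumed):
//   tikv-jemallocator = { version = "0.5", features = ["unprefixed_malloc_on_supported_platforms"] }
use tikv_jemallocator::Jemalloc;

// Route all Rust allocations through jemalloc.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // This Vec is allocated via jemalloc; with the feature above enabled,
    // allocations made by linked C/C++ libraries would be as well.
    let buf: Vec<u8> = vec![0; 1 << 20];
    println!("allocated {} bytes through jemalloc", buf.len());
}
```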
Memory leak reports on v1.7: https://discord.com/channels/428295358100013066/439194979856809985/877222446061727764 https://discord.com/channels/428295358100013066/689412830075551748/877190383275212900

https://gist.githubusercontent.com/behzadnouri/6acaae1c9664f0a3445827d25f28305a/raw/9687afbf2a49591de4201984bd338f807eb2aff5/heaptrack-validator-2021-08-18-v1.7-34107 ☝️ heaptrack on a testnet validator running v1.7 with some recent cluster-slots patches (which I am backporting to v1.7 now).
@ryoqun sure, I will look into `cluster_slots.rs`. I think the code is new but the logic is the same as before; it definitely needs some more digging. Thanks.

@ryoqun, hmmm, weird. Is the node caught up with the cluster? I can only imagine those far-future slots if:
I think we can distinguish between the above by seeing how many nodes in the `ClusterSlots` have completed a slot > root + 10,000. If it’s a few, it might be some pollution; if it’s a lot AND we’re sure we’re near the tip, then probably something is wrong with the compression/decompression path.

For context, when thinking about whether we can do a blanket filter like `*slot > root + 10000`, the primary two places where `ClusterSlots` is used are:
- Propagation status for your own leader slots: https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L1545-L1549. Here it’s fine to ignore far-future slots, since you only care about your own leader slots and slots built on top of your leader slot, which should be in a reasonable range from your current root.
- Weighting repairs, to find validators who have actually completed that slot: https://github.com/solana-labs/solana/blob/master/core/src/cluster_slots.rs#L110. This currently magnifies the weight of nodes that have completed the slot by a factor of 2. I imagine this might be useful in catch-up scenarios where validators are trying to repair slots that are far in the future, for instance if a node is > 10,000 slots behind. To get around this, we may be able to leverage information based on votes in the cluster about which slots in the future are actually relevant. This is already done here: https://github.com/solana-labs/solana/blob/master/core/src/repair_weight.rs#L142-L148 to find the best orphans to repair. We could do something like: ignore `*slot > root + 10000 && slot > best_orphan` (see the sketch after this list).
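A self-contained sketch of that blanket filter, using hypothetical stand-in types rather than the actual `ClusterSlots` internals; the idea is simply to drop entries for slots far past the root unless repair weighting (the best orphan) still considers them relevant.

```rust
use std::collections::BTreeMap;

type Slot = u64;
// Placeholder for the per-slot node/stake bookkeeping, not the real structure.
type NodeStakes = BTreeMap<u64, u64>;

const FUTURE_SLOT_BOUND: Slot = 10_000;

/// Drop entries for slots implausibly far past the root, unless repair-weight
/// already cares about them; i.e. ignore slots where
/// slot > root + 10_000 && slot > best_orphan.
fn prune_far_future_slots(
    cluster_slots: &mut BTreeMap<Slot, NodeStakes>,
    root: Slot,
    best_orphan: Slot,
) {
    cluster_slots.retain(|slot, _| *slot <= root + FUTURE_SLOT_BOUND || *slot <= best_orphan);
}

fn main() {
    let mut cluster_slots: BTreeMap<Slot, NodeStakes> = BTreeMap::new();
    cluster_slots.insert(50, NodeStakes::new());        // near the root: kept
    cluster_slots.insert(1_000_000, NodeStakes::new()); // far future, past best orphan: pruned
    prune_far_future_slots(&mut cluster_slots, 42, 60);
    assert!(cluster_slots.contains_key(&50));
    assert!(!cluster_slots.contains_key(&1_000_000));
}
```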
I doubt that that is the case. You mention:

but the issue with https://github.com/solana-labs/solana/pull/14467 does not go away with a restart. If you restart a node, it quickly syncs up to the previous table it had in memory. Also, as you mentioned, those stack traces do not show relevant `crds` or `ClusterInfo` either. There is one with `ClusterInfo_handle_pull_requests` which is not good, but that seems unrelated to the crds table size thing.

Yes, it is `cluster_info_stats.table_size`.
cluster_info_stats.table_size.I think that might be right if we are not ok with the excess capacity. when I looked both Vec and HashMap I don’t think size-down capacity for
.resize,.retain.removeetc.Not that slow on tds with 1.4.20:
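A small standalone demonstration of that behavior: the standard-library `HashMap` (and `Vec`) keep their allocated capacity after `retain`/`remove`, so a table that was once large keeps holding memory until `shrink_to_fit` is called explicitly.

```rust
use std::collections::HashMap;

fn main() {
    // Fill a map with 100k entries of ~128 bytes each.
    let mut table: HashMap<u64, [u8; 128]> = (0..100_000).map(|k| (k, [0u8; 128])).collect();
    let grown = table.capacity();

    // Drop almost every entry; len shrinks but capacity does not.
    table.retain(|k, _| *k < 10);
    assert_eq!(table.len(), 10);
    assert!(table.capacity() >= grown / 2); // capacity stays near its high-water mark

    // The allocation is only released when explicitly requested.
    table.shrink_to_fit();
    println!("after shrink_to_fit: len={} capacity={}", table.len(), table.capacity());
}
```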
Not that slow on tds with 1.4.20:

Anon pages as well:

(accounts not on a tmpfs)