solana: Suspected memory leak in TpuClient and/or Connection Cache

Problem

While using the Solana TPU manager in lite-rpc, we noticed that memory usage grew without bound. Initially lite-rpc consumed roughly 500 MB of memory, but after running for around a day it consumed more than 3 GB. Moreover, after a long run, transactions sent over the TPU client were never confirmed, i.e. they never reached the leader. We worked around this by destroying and recreating the TPU client and connection cache every 5 minutes, after which memory usage was stable. We suspect an underlying issue in TpuClient and/or ConnectionCache.
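The recreate-every-5-minutes workaround can be sketched generically. This is a minimal illustration, not the lite-rpc code: `Recycled` and its `factory` closure are hypothetical names, and a counter stands in for the real TPU client and connection cache so that the example compiles without the Solana crates.

```rust
use std::time::{Duration, Instant};

/// Holds a value and rebuilds it once it is older than `max_age`.
/// Hypothetical helper; in the real workaround `T` would be the TPU client.
struct Recycled<T, F: FnMut() -> T> {
    factory: F,
    max_age: Duration,
    created: Instant,
    value: T,
}

impl<T, F: FnMut() -> T> Recycled<T, F> {
    fn new(mut factory: F, max_age: Duration) -> Self {
        let value = factory();
        Self { factory, max_age, created: Instant::now(), value }
    }

    /// Returns the current value, replacing it first if it has expired.
    fn get(&mut self) -> &T {
        if self.created.elapsed() >= self.max_age {
            // The old value is dropped when it is overwritten here.
            self.value = (self.factory)();
            self.created = Instant::now();
        }
        &self.value
    }
}

fn main() {
    // A generation counter stands in for "build a fresh client".
    let mut generation = 0u32;
    let mut client = Recycled::new(
        || {
            generation += 1;
            generation
        },
        Duration::from_millis(50), // the report used 5 minutes
    );
    println!("generation {}", client.get());
    std::thread::sleep(Duration::from_millis(60));
    println!("generation {}", client.get());
}
```

Note that this only bounds the growth by periodically dropping the old objects; it does not address whatever is accumulating inside them.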

To reproduce the issue, I wrote a simple program that sends 100 transactions per second using the TPU client: https://github.com/godmodegalactus/solana-tpu-client-test

The program also logs memory usage using the crate procinfo: https://docs.rs/procinfo/latest/procinfo/pid/struct.Statm.html

Testing parameters:

  • solana-sdk = 1.14.18
  • cluster = testnet
  • RPC = private testnet RPC without any rate limits

The process initially consumes about 60 MB of memory, grows to more than 400 MB, and crashes within an hour with the following error:

thread 'main' panicked at 'QuicLazyInitializedEndpoint::create_endpoint bind_in_range: Os { code: 24, kind: Uncategorized, message: "Too many open files" }', /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/solana-client-1.14.18/src/nonblocking/quic_client.rs:98:10
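OS error code 24 is EMFILE on Linux, meaning the process hit its open file descriptor limit, so descriptors (for example sockets) may be accumulating alongside the memory. One way to check would be to log the descriptor count next to the memory stats. The sketch below is Linux-specific and uses only the standard library; `open_fd_count` is a hypothetical helper, not part of the test program.

```rust
use std::fs;
use std::io;

/// Counts the entries in /proc/self/fd, i.e. the file descriptors
/// currently open in this process (Linux only).
fn open_fd_count() -> io::Result<usize> {
    // The directory handle used for the listing is itself counted.
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

fn main() {
    match open_fd_count() {
        Ok(n) => println!("open file descriptors: {}", n),
        Err(e) => eprintln!("could not read /proc/self/fd: {}", e),
    }
}
```

A count that climbs steadily toward the limit reported by `ulimit -n` would be consistent with the panic above.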

Memory logs:

First log:

Stats after 60 s 
 Statm {
    size: 311150,
    resident: 6262,
    share: 2330,
    text: 1326,
    data: 13899,
}

Last log:

Stats after 2940 s 
 Statm {
    size: 408815,
    resident: 110947,
    share: 2362,
    text: 1326,
    data: 122078,
}
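The Statm fields are page counts rather than bytes. The following sketch converts the resident and data values from the two logs above to MiB, assuming a 4 KiB page size (typical on x86-64 Linux, but not confirmed for the test machine). `StatmPages` and `pages_to_mib` are illustrative names, not part of procinfo.

```rust
/// Subset of the fields printed in the logs (values are page counts).
#[derive(Debug, Clone, Copy)]
struct StatmPages {
    resident: u64,
    data: u64,
}

/// Converts a page count to MiB for a given page size in bytes.
fn pages_to_mib(pages: u64, page_size: u64) -> f64 {
    (pages * page_size) as f64 / (1024.0 * 1024.0)
}

fn main() {
    // Assumed page size; check it on the real host (e.g. `getconf PAGESIZE`).
    const PAGE_SIZE: u64 = 4096;

    let first = StatmPages { resident: 6262, data: 13899 };
    let last = StatmPages { resident: 110947, data: 122078 };

    for (label, s) in [("60 s", first), ("2940 s", last)].iter() {
        println!(
            "after {}: resident {:.1} MiB, data {:.1} MiB",
            label,
            pages_to_mib(s.resident, PAGE_SIZE),
            pages_to_mib(s.data, PAGE_SIZE)
        );
    }
}
```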

This issue cannot be reproduced on a local cluster; it requires a cluster with a lot of nodes. Please follow the README in the repo to test.

Proposed Solution

Fix the memory issue in the Solana TPU client or connection cache.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

Apologies, was on vacation/digital detox, and this fell under my radar. Taking a look now.