grpc-node: Potential memory leak in resolver-dns
Problem Description
Previously, we had an issue where upgrading @grpc/grpc-js from 1.3.x to 1.5.x introduced a channelz memory leak (fixed in this issue for 1.5.10).
Upgrading to 1.5.10 locally seems fine, and I have noticed no issues. However, when we upgraded our staging/production environments, the memory leak appears to have come back, with the only change being the update from @grpc/grpc-js 1.3.x to 1.5.10.
Looking at Datadog’s continuous profiler, I can’t confirm this is the root cause, but the heap is definitely growing.
Again, we are running a production service with a single grpc-js server that creates multiple grpc-js clients. The clients are created and destroyed using lightning-pool.
Channelz is disabled when we initialize the server and clients with the channel option 'grpc.enable_channelz': 0 (for both the server and the clients).
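For reference, a minimal sketch of how we pass that option (using the base grpc.Client here; in our service it is a generated stub, and the address is illustrative):

```ts
import * as grpc from '@grpc/grpc-js';

const channelOptions: grpc.ChannelOptions = { 'grpc.enable_channelz': 0 };

// Server with channelz disabled
const server = new grpc.Server(channelOptions);

// Client with channelz disabled; grpc.Client stands in for our generated stub
const client = new grpc.Client(
  'localhost:50051',
  grpc.credentials.createInsecure(),
  channelOptions
);
```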
Reproduction Steps
The reproduction steps are the same as before, except that, I guess, this time the service is under staging/production load.
Create a single grpc-js server that calls grpc-js clients as needed from a pool resource, with channelz disabled. In our case, the server is running, and when requests come in, we acquire a client via the pool (the factory is created once, as a singleton) to make a request. These should be able to handle multiple concurrent requests. A sketch of this pattern follows.
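A minimal sketch of that pool pattern, assuming lightning-pool's createPool/acquire/release API (pool sizing and the address are illustrative; the real service uses a generated stub instead of the base grpc.Client):

```ts
import * as grpc from '@grpc/grpc-js';
import { createPool } from 'lightning-pool';

// The factory is created once, as a singleton.
const factory = {
  create: () =>
    new grpc.Client('backend:50051', grpc.credentials.createInsecure(), {
      'grpc.enable_channelz': 0,
    }),
  // Close the underlying channel when the pool evicts a client.
  destroy: (client: grpc.Client) => client.close(),
};

const pool = createPool(factory, { max: 10 });

async function handleRequest() {
  const client = await pool.acquire(); // borrow a client from the pool
  try {
    // ... make the RPC with the borrowed client ...
  } finally {
    pool.release(client); // return the client for reuse
  }
}
```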
Environment
- OS Name: macOS (locally testing) and running on AWS EKS clusters (production)
- Node Version: 14.16.0
- Package Name and Version: @grpc/grpc-js@1.5.10
Additional Context
Checking out the profiler’s Heap Live Size, it looks like heap size is growing for backoff-timeout.js, resolver-dns.js, load-balancer-child-handler.js, load-balancer-round-robin.js, and channel.ts. I let it run for about 2.5 hours, and I am comparing the heap profiles from the first 30 minutes and the last 30 minutes to see what has changed.
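For anyone who wants to make a similar before/after comparison without Datadog, a rough equivalent (a sketch, not what we ran) is to capture two snapshots with Node’s built-in v8 module and diff them in Chrome DevTools’ Memory tab:

```ts
import { writeHeapSnapshot } from 'v8';

// Capture one snapshot early in the run and one late; load both files in
// Chrome DevTools (Memory tab) and use the comparison view to see growth.
setTimeout(() => writeHeapSnapshot('early.heapsnapshot'), 30 * 60 * 1000); // ~30 min in
setTimeout(() => writeHeapSnapshot('late.heapsnapshot'), 150 * 60 * 1000); // ~2.5 h in
```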
When comparing with @grpc/grpc-js@1.3.x, these files don’t look like they are used at all.
I see that 1.6.x made some updates to some timers; I was wondering if that could be related.
Happy to provide more context or help as needed.
NOTE: To clarify the graph: the problem’s start and end times fall within the highlighted intervals. Everything else is from a different process and from rolling the package back.
(Detail view of the other red section from above)
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (12 by maintainers)
The requested tests have been added in #2105.
@sam-la-compass Can you check if the latest version of grpc-js fixes the original bug for you?