dogstatsd-ruby: Memory leak in 5.0.1

We are using a Datadog::Statsd object in a sidekiq worker. When the worker executes, we basically do this:

statsd = Datadog::Statsd.new("localhost", 8125)
statsd.increment(....) # specific params not included here

After upgrading from 4.8.3 to 5.0.1, we see memory usage on the instance climb linearly until it exhausts all memory and we get ThreadError: can't create Thread: Resource temporarily unavailable. We have definitively pinpointed this problem to the 5.0.1 upgrade: the only change in that deploy was the dogstatsd-ruby gem version.

You can see the memory usage problem in the graph below:

[Screenshot, 2021-04-21: instance memory usage over time]

Each little dip is a deploy where we changed just one gem version. The last one is where we upgraded dogstatsd-ruby from 4.8.3 to 5.0.1.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 24 (13 by maintainers)

Most upvoted comments

Hello everyone, people having this issue should consider using the single_thread mode recently added in v5.2.0 (see the release notes). Like the v4.x versions, this single-thread mode does not create a companion thread to do the flush, which avoids issues in processes that use fork.
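To see why one companion thread per instance matters, here is a toy model in plain Ruby. `ToyStatsd` and its `single_thread:` flag are hypothetical stand-ins, not dogstatsd-ruby's actual implementation; the only assumption is that the threaded mode starts one flush thread per instance and that this thread lives until `close` is called.

```ruby
# Hypothetical toy client: NOT dogstatsd-ruby's real code.
# Threaded mode starts one companion flush thread per instance;
# single-thread mode starts no thread at all.
class ToyStatsd
  def initialize(single_thread: false)
    @queue = Queue.new
    @single_thread = single_thread
    # Companion thread that drains the queue until it receives :stop.
    @flusher = Thread.new { loop { break if @queue.pop == :stop } } unless single_thread
  end

  def increment(_metric)
    # In single-thread mode a real client would send inline; here it is a no-op.
    @queue << :metric unless @single_thread
  end

  def close
    return unless @flusher
    @queue << :stop
    @flusher.join
    @flusher = nil
  end
end

# Simulate one job per iteration that creates a client and never closes it,
# then report how many extra threads were alive at that point.
def leaked_threads(single_thread:, jobs: 10)
  before = Thread.list.size
  clients = Array.new(jobs) do
    ToyStatsd.new(single_thread: single_thread).tap { |c| c.increment('job.done') }
  end
  after = Thread.list.size
  clients.each(&:close) # clean up so the demo itself does not leak
  after - before
end

puts leaked_threads(single_thread: false) # prints 10
puts leaked_threads(single_thread: true)  # prints 0
```

In a long-running worker that never closes its clients, the threaded count keeps growing with every job, which matches the "can't create Thread" symptom reported above.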

@remeh I will try this and let you know soon. Thanks!

UPDATE: Actually going to wait until 5.0.2. Sorry! Priorities… 😦

@remeh Hello, so yes, the error was the same: ThreadError: can't create Thread: Resource temporarily unavailable.

We ended up changing the code to use the single_thread: true mode, to call #close, and to avoid keeping a Statsd instance around longer than needed. That seems to have fixed the problem.
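The create-use-close pattern described here can be wrapped in a small helper so that `close` runs even when the job raises. `with_statsd` is a hypothetical helper name; the only client method it assumes is `#close`.

```ruby
# Hypothetical helper: build a client from a factory, yield it,
# and always call #close afterwards, even when the block raises.
def with_statsd(factory)
  client = factory.call
  yield client
ensure
  # client is still nil here if factory.call itself raised.
  client&.close
end
```

In a Sidekiq worker this might be called as `with_statsd(-> { Datadog::Statsd.new('localhost', 8125, single_thread: true) }) { |s| s.increment(...) }`, using the `single_thread:` option available from dogstatsd-ruby 5.2.0. Passing a factory rather than a ready-made client keeps construction inside the `ensure` scope, so nothing outlives the job.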

This issue should be addressed by the latest release of ddtrace (0.51.0), as StatsD threads are not initialized anymore by ddtrace.

If anyone is seeing this issue in their environment, please upgrade to ddtrace >= 0.51.0 and dogstatsd-ruby >= 5.2.0.

ddtrace 0.51.0 also adds safeguards that prevent the internal initialization of affected versions of dogstatsd-ruby (5.0.0 <= version < 5.2.0). With these affected versions, internal Statsd usage is disabled to prevent resource leaks; tracing continues as usual.
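A version-range check of this kind can be written with RubyGems' `Gem::Version` comparisons. This is a sketch, not ddtrace's actual safeguard code: `affected_dogstatsd_version?` is a hypothetical name, and it assumes the affected range is 5.0.0 <= version < 5.2.0, consistent with the ">= 5.2.0" recommendation in the previous paragraph.

```ruby
require 'rubygems' # provides Gem::Version; already loaded in most Ruby processes

# True for dogstatsd-ruby versions in the assumed leak-prone range 5.0.0 <= v < 5.2.0.
def affected_dogstatsd_version?(version)
  v = Gem::Version.new(version)
  v >= Gem::Version.new('5.0.0') && v < Gem::Version.new('5.2.0')
end

# Check the dogstatsd-ruby version actually loaded in this process, if any.
spec = Gem.loaded_specs['dogstatsd-ruby']
if spec && affected_dogstatsd_version?(spec.version.to_s)
  warn "dogstatsd-ruby #{spec.version} is in the affected range; upgrade to >= 5.2.0"
end

p affected_dogstatsd_version?('4.8.3') # prints false
p affected_dogstatsd_version?('5.0.1') # prints true
p affected_dogstatsd_version?('5.2.0') # prints false
```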

Hello @mobilutz, yes, it is still present (as mentioned in the CHANGELOG). We released 5.1.0 because flushing on close fixes missing metrics for some users. For the current issue, I’m working on adding a single-thread mode for users who can’t create and destroy the instance within the lifecycle of a forked process. That situation is common with job libraries and other libraries that rely heavily on forks, and it is most likely part of the thread leak issue. I’ll post in this issue once it’s available.
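The fork interaction mentioned here follows from how fork works in Ruby: the child process contains only the thread that called fork, so a companion thread that a client started in the parent is not running in the child. The minimal check below (Unix only, since it uses `fork`) does not reproduce the leak; it only demonstrates that fork behavior, using a plain sleeping thread as a stand-in for a sender thread.

```ruby
# Stand-in for a companion sender thread started in the parent.
background = Thread.new { sleep }

reader, writer = IO.pipe
pid = fork do
  reader.close
  # Only the thread that called fork exists in the child.
  writer.puts Thread.list.size
  writer.close
  exit!(0) # skip at_exit handlers inherited from the parent
end
writer.close
child_thread_count = reader.read.to_i
reader.close
Process.wait(pid)

parent_thread_count = Thread.list.size # main thread + background, at least
background.kill

puts "child sees #{child_thread_count} thread(s)" # prints "child sees 1 thread(s)"
```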

Sorry @marcotc, since this was spotted on a critical piece of infrastructure, we cannot deploy test releases in prod. I may have some time next week to set up a stripped-down test case and try it there.

@deepfryed, I’ve tested my application with Puma, but still no leak.

This is what we got when checking our production container.

Correct me if I’m wrong, but you still have an environment running the problematic gem versions currently. If so, and if this is feasible for you, would you be able to add some logging around the creation of the “sender” thread:

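# Debug aid: log a stack trace every time a Statsd sender thread is started.
module DatadogStatsdSenderDebug
  def start
    # caller is the backtrace of the code that started the sender
    if defined?(Rails.logger)
      Rails.logger.warn("Statsd thread created: #{caller}")
    else
      puts "Statsd thread created: #{caller}"
    end

    # Continue into the original Sender#start
    super
  end
end

require 'datadog/statsd'
# prepend places our #start ahead of Sender#start in the method lookup chain
Datadog::Statsd::Sender.prepend(DatadogStatsdSenderDebug)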

I’d recommend placing this before your Datadog.configure block. Feel free to modify the logging output mechanism. This logging will be verbose, so consider enabling it conditionally, only in the relevant environment for your team.
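A lighter-weight complement to the stack-trace logging above, needing no monkey-patching, is to sample the process's live thread count over time: a leak like the one in this issue shows up as a count that only grows. `thread_census` and `start_thread_monitor` are hypothetical helpers; grouping by `Thread#name` only helps for threads that have a name, and unnamed ones are bucketed as "(unnamed)".

```ruby
# Hypothetical diagnostic: count live threads, grouped by name when set.
def thread_census
  Thread.list.group_by { |t| t.name || '(unnamed)' }.transform_values(&:size)
end

# Log the census every `interval` seconds from a background thread.
# Note that the monitor thread itself is included in the count.
def start_thread_monitor(interval: 60, io: $stderr)
  Thread.new do
    loop do
      census = thread_census
      io.puts "threads=#{census.values.sum} #{census.inspect}"
      sleep interval
    end
  end
end
```

Starting the monitor once per process (for example from an initializer) and watching whether `threads=` keeps rising between jobs gives a quick yes/no answer before turning on the more verbose logging.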

@remeh I use the ddtrace gem, which uses dogstatsd-ruby internally.

The only config I have is:

# config/initializers/datadog.rb
if Rails.env.production?
  Datadog.configure do |c|
    c.use :sidekiq, service_name: "name_sidekiq", client_service_name: "name_sidekiq", tag_args: true
  end
end

Probably ddtrace does not call the close method correctly.