lifecycled: [Major Bug] Memory leak causing increased CPU/Memory usage over time on idle host
Environment: Two completely idle EC2 Amazon Linux 2 instances, both with Docker installed and no containers running. One instance runs lifecycled v3.0.1; the other, as a control, does not have lifecycled installed.
Behavior:
See attached metrics

htop snapshot from the affected instance, showing heavy CPU/memory usage by lifecycled

Command line used in the systemd unit:

```
/opt/lifecycled/lifecycled --cloudwatch-group=${LIFECYCLED_CLOUDWATCH_GROUP} --handler=${LIFECYCLED_HANDLER} --sns-topic=${LIFECYCLED_SNS_TOPIC} --json
```
The instance running lifecycled gradually consumes more memory and CPU over time. The control instance with only Docker installed remains idle/flat over the same period.
Expected behavior: CPU and memory usage on an idle instance should remain mostly flat over time.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 16 (11 by maintainers)
The new trace stuff is kind of amazing too: https://medium.com/@cep21/using-go-1-10-new-trace-features-to-debug-an-integration-test-1dc39e4e812d
I think I found one more thing in the cloudwatch code. Moving the NewTimer alone did not seem to fix the issue, but it does look like it slowed the increase down. I also added a patch to the cloudwatch code which seems to resolve the leak, but I'm not sure whether it has other side effects; it would be good to have that confirmed by someone more familiar with it. I'll send a PR for the cloudwatch change in a few minutes for review.
Here’s a screenshot of lifecycled with the patched SpotListener.
Here’s a screenshot of lifecycled with both the patched SpotListener and the patched cloudwatch code.