rancher: high CPU usage across all hosts during create/upgrade due to metadata service + DNS
Rancher Versions:
- Server: 1.5.5
- healthcheck: 0.2.3
- ipsec: 0.0.7
- network-services: 0.9.1 (metadata) and 0.6.6 (network manager)
- scheduler: 0.7.5
- kubernetes (if applicable): N/A
Docker Version: 17.03.0-ce
OS and where are the hosts located? (cloud, bare metal, etc): Ubuntu 16.04.2 LTS (4.4.0) and a few Ubuntu 14.04.5 LTS (3.13.0) running on AWS
Setup Details: (single node rancher vs. HA rancher, internal DB vs. external DB) HA Rancher with RDS
Environment Type: (Cattle/Kubernetes/Swarm/Mesos) Cattle
Steps to Reproduce:
- Start a cluster with a good number of machines so that your metadata service has plenty of data (the environment where I noticed this today was running 48 hosts).
- Launch a bunch of containers (maybe some Telegraf to shove CPU and network data into InfluxDB so you can see it in Grafana later).
- Launch some more containers, and maybe upgrade some of them.
Results:
High CPU usage is observed on ALL hosts in the cluster. Here is one host during an upgrade:
The DNS service also sends a lot of data during this time:
Digging into the metadata service logs, I see answers being downloaded and reloaded back-to-back:
time="2017-04-13T22:00:12Z" level=info msg="Download and reload in: 152.962397ms"
time="2017-04-13T22:00:12Z" level=info msg="Update requested for version: 672275"
time="2017-04-13T22:00:12Z" level=info msg="Downloaded in 17.306846ms"
time="2017-04-13T22:00:12Z" level=info msg="Generating and reloading answers"
time="2017-04-13T22:00:12Z" level=info msg="Update requested for version: 672274"
time="2017-04-13T22:00:12Z" level=info msg="Generating answers"
time="2017-04-13T22:00:12Z" level=info msg="Generated and reloaded answers"
time="2017-04-13T22:00:12Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672274?version=672276-14a7928f14789d3a00ab33efdfd9c22c"
time="2017-04-13T22:00:12Z" level=info msg="Download and reload in: 312.971699ms"
time="2017-04-13T22:00:12Z" level=info msg="Downloaded in 10.124486ms"
time="2017-04-13T22:00:12Z" level=info msg="Generating and reloading answers"
time="2017-04-13T22:00:12Z" level=info msg="Update requested for version: 672278"
time="2017-04-13T22:00:12Z" level=info msg="Generating answers"
time="2017-04-13T22:00:13Z" level=info msg="Generated and reloaded answers"
time="2017-04-13T22:00:13Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672274?version=672275-14a7928f14789d3a00ab33efdfd9c22c"
time="2017-04-13T22:00:13Z" level=info msg="Download and reload in: 330.665421ms"
time="2017-04-13T22:00:13Z" level=info msg="Update requested for version: 672278"
time="2017-04-13T22:00:13Z" level=info msg="Downloaded in 42.239164ms"
time="2017-04-13T22:00:13Z" level=info msg="Generating and reloading answers"
time="2017-04-13T22:00:13Z" level=info msg="Update requested for version: 672277"
time="2017-04-13T22:00:13Z" level=info msg="Generating answers"
time="2017-04-13T22:00:13Z" level=info msg="Generated and reloaded answers"
time="2017-04-13T22:00:13Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672278?version=672278-14a7928f14789d3a00ab33efdfd9c22c"
time="2017-04-13T22:00:13Z" level=info msg="Download and reload in: 298.367767ms"
time="2017-04-13T22:00:13Z" level=info msg="Downloaded in 19.420332ms"
time="2017-04-13T22:00:13Z" level=info msg="Generating and reloading answers"
time="2017-04-13T22:00:13Z" level=info msg="Generating answers"
time="2017-04-13T22:00:13Z" level=info msg="Generated and reloaded answers"
time="2017-04-13T22:00:13Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672277?version=672278-14a7928f14789d3a00ab33efdfd9c22c"
time="2017-04-13T22:00:13Z" level=info msg="Download and reload in: 327.065607ms"
time="2017-04-13T22:00:14Z" level=info msg="Update requested for version: 672279"
time="2017-04-13T22:00:14Z" level=info msg="Downloaded in 50.613182ms"
time="2017-04-13T22:00:14Z" level=info msg="Generating and reloading answers"
time="2017-04-13T22:00:14Z" level=info msg="Generating answers"
time="2017-04-13T22:00:14Z" level=info msg="Generated and reloaded answers"
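The "Applied" lines in this excerpt are not monotonic: version 672276 is applied at 22:00:12 and version 672275 is applied after it at 22:00:13, which may mean overlapping download/reload cycles are finishing out of order. Below is a minimal Python sketch for spotting this in a longer log. The function name `find_version_regressions` is hypothetical (it is not part of Rancher), and it assumes the `Applied ...?version=<number>-<hash>` line format shown above.

```python
import re

# Captures the number after "?version=" in the metadata service's
# "Applied ..." log lines, e.g. "...?version=672276-14a7928f...".
# Assumes the line format seen in the excerpt above.
APPLIED_RE = re.compile(r'msg="Applied .*\?version=(\d+)-')


def find_version_regressions(log_text: str) -> list[int]:
    """Return applied versions that are lower than one applied before them."""
    regressions = []
    highest = -1
    for line in log_text.splitlines():
        match = APPLIED_RE.search(line)
        if match is None:
            continue  # not an "Applied" line
        version = int(match.group(1))
        if version < highest:
            # An older answers version was applied after a newer one.
            regressions.append(version)
        else:
            highest = version
    return regressions


# The four "Applied" lines copied from the metadata log excerpt above.
SAMPLE = '''\
time="2017-04-13T22:00:12Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672274?version=672276-14a7928f14789d3a00ab33efdfd9c22c"
time="2017-04-13T22:00:13Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672274?version=672275-14a7928f14789d3a00ab33efdfd9c22c"
time="2017-04-13T22:00:13Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672278?version=672278-14a7928f14789d3a00ab33efdfd9c22c"
time="2017-04-13T22:00:13Z" level=info msg="Applied http://ha.rancher.mux/v1/configcontent/metadata-answers?client=v2&requestedVersion=672277?version=672278-14a7928f14789d3a00ab33efdfd9c22c"
'''

if __name__ == "__main__":
    print(find_version_regressions(SAMPLE))  # prints [672275]
```

Re-applying the same version (672278 twice) is not flagged. Only a strictly older version counts as a regression.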
At the same time, the DNS service is also reloading aggressively:
time="2017-04-13T22:00:12Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:12Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:12Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:12Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:12Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:12Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:13Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:13Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:13Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:13Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:14Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:14Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:14Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:14Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:15Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:15Z" level=info msg="Reloaded answers"
time="2017-04-13T22:00:15Z" level=info msg="Reloading answers"
time="2017-04-13T22:00:15Z" level=info msg="Reloaded answers"
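To put a number on "reloading aggressively", here is a minimal Python sketch that buckets log lines by their one-second timestamp and counts a given message. The helper `reloads_per_second` is hypothetical (not a Rancher tool) and assumes the `time="..." level=... msg="..."` line format shown above. The sample rebuilds the DNS excerpt, which works out to 9 reloads over 4 seconds.

```python
import re
from collections import Counter

# Captures the timestamp and message of a line such as:
# time="2017-04-13T22:00:12Z" level=info msg="Reloaded answers"
# The timestamps in these logs already have one-second resolution.
LINE_RE = re.compile(r'time="([^"]+)" level=\w+ msg="([^"]*)"')


def reloads_per_second(log_text: str, message: str = "Reloaded answers") -> Counter:
    """Count how many times `message` was logged in each one-second bucket."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(2) == message:
            counts[match.group(1)] += 1
    return counts


# Rebuilds the DNS excerpt above: one "Reloading"/"Reloaded" pair per entry.
SAMPLE = "\n".join(
    f'time="2017-04-13T22:00:{sec}Z" level=info msg="{msg}"'
    for sec in ["12", "12", "12", "13", "13", "14", "14", "15", "15"]
    for msg in ("Reloading answers", "Reloaded answers")
)

if __name__ == "__main__":
    for second, n in sorted(reloads_per_second(SAMPLE).items()):
        print(second, n)
```

Passing `message="Generated and reloaded answers"` runs the same count over the metadata service log.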
About this issue
- State: closed
- Created 7 years ago
- Reactions: 7
- Comments: 18
I’ve got the same issue. CPU usage is really high for this little microservice.
Hello, I am experiencing the same issue after upgrading to 1.6.12.
Also seeing the same thing with Rancher 1.6.14 and Docker 17.12.
@aemneina I think this issue ought to be reopened or tracked in a new issue.
We are having similar issues on Rancher 1.6.2 (3k containers, 9 hosts).