rancher: Agent in trouble, but does not exit - tar: Error is not recoverable: exiting now

I am starting an agent with (AFAIK) the correct server key. The agent stays up but does not appear in the server. The logs show the message below, repeated 165 times. Note that I installed rancher-agent on this host before and removed/purged it from the server (maybe the server is not cleanly resetting its state?).

It would be nice to have better diagnostics on what went wrong. In general, it would be good to have a simple way of checking (such as an exit status, or maybe a REST call to the agent) whether the agent has permanently failed to connect to the server, e.g. if you want to automate the installation of the rancher agent on a VM.

INFO: Starting agent for 17D99798E3E9D0EEA8ED
INFO: Access Key: 17D99798E3E9D0EEA8ED
INFO: Config URL: http://demo.domain.com/v1
INFO: Storage URL: http://demo.domain.com/v1
INFO: API URL: http://demo.domain.com/v1
INFO: IP: 130.211.91.22
INFO: Port:
INFO: Required Image: rancher/agent:v0.7.9
INFO: Current Image: rancher/agent:v0.7.9
INFO: Using image rancher/agent:v0.7.9
INFO: Downloading agent http://demo.domain.com/v1/configcontent/configscripts{"id":"2f146fc7-4a49-46fb-a5c9-d500f3fb75fe","type":"error","links":{},"actions":{},"status":401,"code":"Unauthorized","message":"Unauthorized","detail":null}gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
INFO: Starting agent for 17D99798E3E9D0EEA8ED
INFO: Access Key: 17D99798E3E9D0EEA8ED
INFO: Config URL: http://demo.domain.com/v1
INFO: Storage URL: http://demo.domain.com/v1
INFO: API URL: http://demo.domain.com/v1
INFO: IP: 130.211.91.22
INFO: Port:
INFO: Required Image: rancher/agent:v0.7.9
INFO: Current Image: rancher/agent:v0.7.9
INFO: Using image rancher/agent:v0.7.9
INFO: Downloading agent http://demo.domain.com/v1/configcontent/configscripts{"id":"3bfd75de-3aa3-42d1-bc32-18ec29d88f28","type":"error","links":{},"actions":{},"status":401,"code":"Unauthorized","message":"Unauthorized","detail":null}gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
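
Until the agent reports this itself, an install script could scan the agent's log output for the failure shown above. The following is only a minimal sketch: the patterns are copied from this log excerpt and may differ in other agent versions, and the `threshold` parameter and function names are hypothetical.

```python
import re

# Patterns copied from the log excerpt in this report (assumption: other
# agent versions may word these messages differently).
UNAUTHORIZED = re.compile(r'"status":\s*401\b')
TAR_FATAL = "tar: Error is not recoverable: exiting now"


def count_auth_failures(log_text: str) -> int:
    """Count download attempts that the server rejected with HTTP 401."""
    return len(UNAUTHORIZED.findall(log_text))


def exit_status(log_text: str, threshold: int = 3) -> int:
    """Return 1 if the log suggests a permanent registration failure, else 0.

    `threshold` is a hypothetical cut-off: a few repeated 401 responses plus
    the tar abort are treated as "will not recover on its own".
    """
    failed = count_auth_failures(log_text) >= threshold and TAR_FATAL in log_text
    return 1 if failed else 0
```

One possible use is to capture the output of `docker logs` for the agent container into a string, pass it to `exit_status`, and hand the result to `sys.exit` so that provisioning tools see a non-zero status.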


About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 4
  • Comments: 24 (7 by maintainers)

Most upvoted comments

+1, I get exactly the same problem, and thanks to @cloudnautique, when I deleted the /var/lib/rancher/state directory on my host, I was able to register again.

@npcode You should try removing /var/lib/rancher/state before adding an agent if you hit this error. Are you re-using an agent?

This is on the agent hosts. When an agent starts, it gathers info, creates a rancher-agent-state container, and then launches rancher-agent. That state container creates a bind mount to /var/lib/rancher/state on the host so that it can persist data.

The agent is trying to get the configuration information that it needs, such as python-agent, host-api, and other package versions/config. It looks like the token it's using is not able to authenticate to the server.
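
This also explains the gzip/tar messages in the log: the server answered the download with a JSON 401 error body instead of a gzip archive, and that body was fed to the unpack step. As an illustration only, a download could be checked before unpacking. The function name below is hypothetical, and the JSON field names are taken from the error body shown in the log.

```python
import json

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def classify_payload(body: bytes) -> str:
    """Classify a downloaded body as 'gzip', 'api-error: ...', or 'unknown'."""
    if body.startswith(GZIP_MAGIC):
        return "gzip"
    try:
        doc = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        return "unknown"
    # Field names ("type", "status", "code") as seen in the logged error body.
    if isinstance(doc, dict) and doc.get("type") == "error":
        return f"api-error: {doc.get('status')} {doc.get('code')}"
    return "unknown"
```

With a check like this, the failure would surface as an authentication error rather than as "not in gzip format".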

Any chance this is a new Rancher environment, or that the auth settings changed? Is this host being reused/provisioned? If so, have you tried stopping the agents, removing the rancher-agent-state container, and then re-registering?
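
The reset steps suggested in these comments could be scripted roughly as follows. This is a sketch under assumptions: the container names and the state path are the ones mentioned in this thread, the function names and the `dry_run` flag are hypothetical, and the default is a dry run so that nothing is deleted by accident.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Names taken from this thread; adjust them if your setup differs.
STATE_DIR = Path("/var/lib/rancher/state")
AGENT_CONTAINER = "rancher-agent"
STATE_CONTAINER = "rancher-agent-state"


def cleanup_commands() -> List[List[str]]:
    """Docker commands that stop the agent and remove the state container."""
    return [
        ["docker", "stop", AGENT_CONTAINER],
        ["docker", "rm", STATE_CONTAINER],
    ]


def reset_agent_state(state_dir: Path = STATE_DIR, dry_run: bool = True) -> List[str]:
    """List the reset steps and, unless dry_run is True, execute them."""
    steps = []
    for cmd in cleanup_commands():
        steps.append(" ".join(cmd))
        if not dry_run:
            # check=False: the container may already be stopped or removed.
            subprocess.run(cmd, check=False)
    steps.append(f"remove {state_dir}")
    if not dry_run and state_dir.exists():
        shutil.rmtree(state_dir)
    return steps
```

Calling `reset_agent_state()` only returns the planned steps; `reset_agent_state(dry_run=False)` would run them, after which the agent can be registered again with the command from the server UI.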