rancher: Improve error logs in health check containers when the health check is paused because ipsec is not in the "active" state.
Rancher server - Build from v1.6-development
Steps to reproduce the problem: Upgrade the ipsec service.
While the ipsec service is being upgraded, the health check is paused, resulting in the following error messages:
time="2017-12-14T18:01:28Z" level=error msg="Failed to report status 25e575c6-3bd8-480a-aa17-c3dc6ff7cf04_e79ef58d-370e-4f1e-9b51-7538c96a391d_1=DOWN 1/2: Bad response from [http://104.198.156.63:8080/v2-beta/serviceevents], statusCode [409]. Status [409 Conflict]. Body: [{\"id\":\"320fc91a-0152-4592-abe8-2c5be68c8220\",\"type\":\"error\",\"links\":{},\"actions\":{},\"status\":409,\"code\":\"Conflict\",\"message\":\"Conflict\",\"detail\":null,\"baseType\":\"error\"}]"
time="2017-12-14T18:01:28Z" level=error msg="Failed to report status 25e575c6-3bd8-480a-aa17-c3dc6ff7cf04_98dac6e4-12c6-4c41-bbe5-a51f346e7910_1=DOWN: Bad response from [http://104.198.156.63:8080/v2-beta/serviceevents], statusCode [409]. Status [409 Conflict]. Body: [{\"id\":\"d0a91a13-3648-4a53-a622-1b432cd4dcfc\",\"type\":\"error\",\"links\":{},\"actions\":{},\"status\":409,\"code\":\"Conflict\",\"message\":\"Conflict\",\"detail\":null,\"baseType\":\"error\"}]"
time="2017-12-14T18:01:28Z" level=info msg="25e575c6-3bd8-480a-aa17-c3dc6ff7cf04_5441afe7-4bb8-4820-9814-cda2667cbc15_1=INIT"
time="2017-12-14T18:01:28Z" level=info msg="25e575c6-3bd8-480a-aa17-c3dc6ff7cf04_0dc1f85a-4fda-4080-b851-6dfc259602ec_1=INIT"
time="2017-12-14T18:01:30Z" level=error msg="Failed to report status 25e575c6-3bd8-480a-aa17-c3dc6ff7cf04_98dac6e4-12c6-4c41-bbe5-a51f346e7910_1=DOWN 1/2: Bad response from [http://104.198.156.63:8080/v2-beta/serviceevents], statusCode [409]. Status [409 Conflict]. Body: [{\"id\":\"c26d3887-ea3f-408b-8254-68033004cfa0\",\"type\":\"error\",\"links\":{},\"actions\":{},\"status\":409,\"code\":\"Conflict\",\"message\":\"Conflict\",\"detail\":null,\"baseType\":\"error\"}]"
These error messages could be improved to reflect the paused state rather than surfacing a 409 Conflict error.
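For anyone chasing the same thing: the messages above come straight from the healthcheck containers' own logs, so something like the following on an affected host should show them (this assumes the containers carry "healthcheck" in their name, as they do on a stock 1.6 setup):
  docker ps --filter name=healthcheck --format '{{.Names}}'   # list healthcheck containers on this host
  docker logs -f <healthcheck-container-name>                 # follow one container's output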
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 4
- Comments: 20
I’ve faced similar issues.
Solution: Add Host, filling Step 4 with the docker server's IP.
In my case, our environment had been destroyed for some reason, so we were recreating all the components. I had started the agents without specifying the IP the agent should report for its host; when the second host was set up, the issue appeared. I then checked the agent logs and found that DETECTED_CATTLE_AGENT_IP was wrong, and the Rancher UI showed the same IP for every host. That likely leads Rancher to try to edit the same record in the DB, which causes the deadlock.
P.S. If you're facing the same issue, you can specify the IP explicitly as an immediate fix, then contact your network administrator to troubleshoot the network settings.
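In command form, the workaround is roughly this: take the registration command the UI shows under Add Host and add CATTLE_AGENT_IP explicitly so the IP is not auto-detected. The agent tag, server address and registration token below are placeholders, not values from this setup:
  sudo docker run --rm --privileged \
    -e CATTLE_AGENT_IP="<this-host-routable-ip>" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/lib/rancher:/var/lib/rancher \
    rancher/agent:v1.2.9 http://<rancher-server>:8080/v1/scripts/<registration-token>
Setting CATTLE_AGENT_IP should keep DETECTED_CATTLE_AGENT_IP from being guessed wrong when the host has several interfaces or sits behind NAT.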
Not quite sure if another repro matters, but I can easily reproduce the issue using one physical (Ubuntu 17.10) machine running v1.6.15 on the host and two docker-machine VM nodes. The following creates kvm VMs, but virtualbox works equally well.
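A minimal sketch of the VM-creation step (assuming the third-party kvm docker-machine driver is installed; swap in --driver virtualbox if you prefer):
  docker-machine create --driver kvm kvm-1712-1   # first node
  docker-machine create --driver kvm kvm-1712-2   # second node, added later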
Start from an empty environment and create the first custom host: run “docker-machine ssh kvm-1712-1” and paste the command line from Rancher.
The services go up and everything is green.
Now add the second node.
ipsec/healthcheck go Up/Down/Initializing/Unhealthy
ipsec-router logs show entries such as:
healthcheck logs go:
Updated system packages (apt, since I’m on Ubuntu) and updated to rancher-agent v1.2.8 and rancher-server v1.6.14; still have the problem.
I can afford to reboot the machine and restart docker, but unfortunately I can’t afford more than 3 nodes.
I’m using DHCP and ran into this issue as well (though my DHCP address has not changed).
Restarting the network stack in ROS resolved the issue I was having with the health check service.
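For reference, on RancherOS the network stack runs as a system service, so the restart amounts to something like the following (from memory, a stock ROS install is assumed):
  sudo system-docker restart network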
FWIW, I have the same setup as @mjaverto: the server is a DNS entry that resolves to a private IP, and the hosts are all private IPs in the same subnet. I have rebuilt the hosts from bare VMs several times and re-registered them to no avail; the server as well. More than one host will reproduce the issue, but exactly one host will run fine (it doesn’t matter which).