rancher: Some LB instances sometimes get stuck in "Initializing" state

Server version: v0.51.0. The setup had 8 hosts.

Started a global LB service. Out of the 8 LB instances, 2 are stuck in "Initializing" state.

The following errors are seen in the agent container logs for the instances stuck in "Initializing" state.

1/20/2016 10:54:22 AM[config.sh:118] info Downloading http://104.197.236.168:8080/v1//configcontent//services current=
1/20/2016 10:54:22 AM[scripts.sh:19] echo INFO: Downloading http://104.197.236.168:8080/v1//configcontent//services current=
1/20/2016 10:54:22 AMINFO: Downloading http://104.197.236.168:8080/v1//configcontent//services current=
1/20/2016 10:54:22 AM[config.sh:120] get 'http://104.197.236.168:8080/v1//configcontent//services?current='
1/20/2016 10:54:22 AM[scripts.sh:43] '[' 'http://104.197.236.168:8080/v1//configcontent//services?current=' = --no-auth ']'
1/20/2016 10:54:22 AM[scripts.sh:47] call_curl -L 'http://104.197.236.168:8080/v1//configcontent//services?current='
1/20/2016 10:54:22 AM[scripts.sh:29] local 'curl=curl -s --connect-timeout 20'
1/20/2016 10:54:22 AM[scripts.sh:30] '[' '' = false ']'
1/20/2016 10:54:22 AM[scripts.sh:32] '[' -n 'Basic QjdERDZDNTFEM0MxRDc0NEY4MzM6akdxUWdoYlc5aUVYUWlQcTJIVWVXZUtaYk50cFZKbm9CVUc2Vm1nRg==' ']'
1/20/2016 10:54:22 AM[scripts.sh:33] curl -s --connect-timeout 20 -H 'Authorization: Basic QjdERDZDNTFEM0MxRDc0NEY4MzM6akdxUWdoYlc5aUVYUWlQcTJIVWVXZUtaYk50cFZKbm9CVUc2Vm1nRg==' -L 'http://104.197.236.168:8080/v1//configcontent//services?current='
1/20/2016 10:54:42 AM[scripts.sh:1] cleanup
1/20/2016 10:54:42 AM[config.sh:9] EXIT=28
1/20/2016 10:54:42 AM[config.sh:11] '[' -e /var/lib/cattle/download.p82mGiY ']'
1/20/2016 10:54:42 AM[config.sh:12] rm -rf /var/lib/cattle/download.p82mGiY
1/20/2016 10:54:42 AM[config.sh:15] return 28
1/20/2016 10:54:42 AM
The system is going down NOW!
1/20/2016 10:54:42 AM
Sent SIGTERM to all processes
1/20/2016 10:54:43 AM
Sent SIGKILL to all processes
1/20/2016 10:54:43 AM
Requesting system reboot
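In the log above, config.sh propagates EXIT=28, which is curl's exit code for an operation timeout: the agent instance could not reach the Rancher server within the 20-second `--connect-timeout`. A minimal sketch for interpreting that code while debugging from inside the stuck container (the helper function and its messages are illustrative, not part of Rancher's scripts):

```shell
# Illustrative helper (not part of Rancher's scripts): translate the
# curl exit code that config.sh propagates (the EXIT=28 in the log).
explain_curl_exit() {
  case "$1" in
    0)  echo "success" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out (connect timeout hit)" ;;
    *)  echo "curl error $1 (see the EXIT CODES section of 'man curl')" ;;
  esac
}

# The agent's actual request, minus the auth header; run it from inside
# the stuck container to test reachability of the server:
#   curl -s --connect-timeout 20 -L \
#     'http://104.197.236.168:8080/v1//configcontent//services?current='
explain_curl_exit 28
```

An exit code of 28 here points at a network path problem between the agent instance and the server, not at the config content itself.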

rancher-server logs:

2016-01-20 19:15:08,409 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [services], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:08,409 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-scripts], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:08,410 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [monit], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:08,410 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-startup], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:08,411 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [haproxy], not in sync requested [3] != applied [-1] 
2016-01-20 19:15:08,411 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Requesting update of item(s) [agent-instance-startup, haproxy] on [agent:21] 
2016-01-20 19:15:08,412 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [agent-instance-startup], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:08,412 INFO  [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [haproxy], not in sync requested [3] != applied [-1] 
2016-01-20 19:15:09,441 INFO  [:] [] [] [] [cutorService-23] [i.c.p.a.s.s.impl.AgentServiceImpl   ] Timeout waiting for response to [delegate.request] id [04408794-c2a6-45e9-8ba8-063cef2d6324] 
2016-01-20 19:15:09,441 INFO  [:] [] [] [] [cutorService-23] [i.c.p.a.s.s.impl.AgentServiceImpl   ] Timeout waiting for response to [config.update] id [90749760-9981-4949-96d4-b11a0d1a4bed] 
2016-01-20 19:15:11,441 INFO  [:] [] [] [] [cutorService-22] [i.c.p.a.s.s.impl.AgentServiceImpl   ] Timeout waiting for response to [delegate.request] id [291b945e-91c9-4ccb-8f6e-6d5e5baa4c3a] 
2016-01-20 19:15:11,441 INFO  [:] [] [] [] [cutorService-22] [i.c.p.a.s.s.impl.AgentServiceImpl   ] Timeout waiting for response to [config.update] id [cadad389-c94c-483c-a5cf-0639f2bb21a6] 
2016-01-20 19:15:11,443 INFO  [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21] 
2016-01-20 19:15:11,443 INFO  [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21] 
2016-01-20 19:15:11,443 INFO  [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21] 
2016-01-20 19:15:11,443 INFO  [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21] 
2016-01-20 19:15:12,607 ERROR [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Failed null, exit code [1] output [nsenter: failed to execute /var/lib/cattle/events/config.update: No such file or directory
] 
2016-01-20 19:15:23,428 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Requesting update of item(s) [services, agent-instance-scripts, monit, agent-instance-startup, haproxy] on [agent:23] 
2016-01-20 19:15:23,430 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [services], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:23,430 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-scripts], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:23,430 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [monit], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:23,431 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-startup], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:23,431 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [haproxy], not in sync requested [3] != applied [-1] 
2016-01-20 19:15:23,432 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Requesting update of item(s) [agent-instance-startup, haproxy] on [agent:21] 
2016-01-20 19:15:23,433 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [agent-instance-startup], not in sync requested [1] != applied [-1] 
2016-01-20 19:15:23,433 INFO  [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [haproxy], not in sync requested [3] != applied [-1] 
2016-01-20 19:15:25,444 INFO  [:] [] [] [] [cutorService-12] [i.c.p.a.s.s.impl.AgentServiceImpl   ] Timeout waiting for response to [config.update] id [90749760-9981-4949-96d4-b11a0d1a4bed] 
2016-01-20 19:15:25,444 INFO  [:] [] [] [] [cutorService-12] [i.c.p.a.s.s.impl.AgentServiceImpl   ] Timeout waiting for response to [delegate.request] id [4f0895d7-5498-46e8-84a3-dfd11a8800fb] 
2016-01-20 19:15:28,615 ERROR [:] [] [] [] [cutorService-10] [.p.c.v.i.ConfigItemStatusManagerImpl] Failed null, exit code [1] output [nsenter: failed to execute /var/lib/cattle/events/config.update: No such file or directory
] 

mysql> select * from service_event where instance_id=34;
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| id  | name | account_id | kind         | uuid                                 | description | state   | created             | removed | remove_time | data | host_id | healthcheck_uuid                       | instance_id | healthcheck_instance_id | reported_health | external_timestamp |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| 124 | NULL |          5 | serviceEvent | 61017225-e65d-46ce-aeff-0b3e3d3fbdbc | NULL        | created | 2016-01-20 18:52:57 | NULL    | NULL        | {}   |       6 | e3355456-cd42-403d-9493-3ba8cb8b228d_1 |          34 |                      12 | DOWN            |         1453315977 |
| 149 | NULL |          5 | serviceEvent | 7b15d035-123e-4e5e-8dbe-27c0fbac38cf | NULL        | created | 2016-01-20 18:54:26 | NULL    | NULL        | {}   |       7 | 0784e862-3f8d-4bef-9529-f0954a981054_1 |          34 |                      12 | DOWN            |         1453316066 |
| 159 | NULL |          5 | serviceEvent | 737ff984-e00c-47df-8d2a-36109fc798ae | NULL        | created | 2016-01-20 18:54:40 | NULL    | NULL        | {}   |       8 | a32d5bcb-67af-4860-86f3-5d9298862fa5_1 |          34 |                      12 | DOWN            |         1453316080 |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
3 rows in set (0.00 sec)

mysql> select * from service_event where instance_id=36;  
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| id  | name | account_id | kind         | uuid                                 | description | state   | created             | removed | remove_time | data | host_id | healthcheck_uuid                       | instance_id | healthcheck_instance_id | reported_health | external_timestamp |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| 121 | NULL |          5 | serviceEvent | 08da7fab-134a-4faa-bd5d-1b5361c91637 | NULL        | created | 2016-01-20 18:52:55 | NULL    | NULL        | {}   |       4 | 4e16fedc-95a8-4bb5-add8-f3cd67821f6b_1 |          36 |                      15 | DOWN            |         1453315975 |
| 154 | NULL |          5 | serviceEvent | 9c3c5d2c-9f10-441c-9849-2cc1c0bd0209 | NULL        | created | 2016-01-20 18:54:28 | NULL    | NULL        | {}   |       7 | a525a7c7-1ea5-4428-a79f-7b93bbc188c8_1 |          36 |                      15 | DOWN            |         1453316068 |
| 161 | NULL |          5 | serviceEvent | 85f262c9-aae6-46bc-961e-6699d9ad5fae | NULL        | created | 2016-01-20 18:54:40 | NULL    | NULL        | {}   |       8 | 8c83e7c6-5216-4d5a-aafc-26a3d9e0126c_1 |          36 |                      15 | DOWN            |         1453316080 |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
3 rows in set (0.00 sec)


About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 15

Most upvoted comments

@demohi The ipc request has been updated.

As for your LB issues, do you have more than 1 host running? Are you able to ping from inside the network agent on one host to another host's network agent IP (10.42.x.x)? Exec into one of the network agents and try to ping. If that doesn't work, you have cross-host communication issues.

http://docs.rancher.com/rancher/faqs/#cross-host-communication
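The ping check described above can be sketched as follows. The address-range helper and the example IP are assumptions for illustration; substitute your actual Network Agent container name and a 10.42.x.x address taken from the other host:

```shell
# Quick sanity check (illustrative): is the address you are about to
# ping in Rancher's managed-network range (10.42.0.0/16)?
in_managed_range() {
  case "$1" in
    10.42.*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Then exec into the Network Agent on one host and ping a peer agent's
# managed IP on another host (container name and IP are placeholders):
#   docker exec -it <network-agent-container> ping -c 3 10.42.x.x
in_managed_range "10.42.105.3" && echo "looks like a managed-network IP"
```

If the ping fails, the overlay network between hosts is broken (commonly blocked UDP ports 500/4500), and the LB instances cannot finish initializing.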

+1

LB stuck in "Initializing" state; restarting does not work

Useful Info

  • Versions: Rancher v1.0.0, Cattle v0.159.2, UI v0.100.3
  • Access: Disabled
  • Route: environment.code
  • OS: Ubuntu 14.04.3 LTS (3.19.0-56-generic)

I created a stack from a docker-compose config file, and found that the ipc option is ignored in Rancher.

4/7/2016 11:44:22 AMINFO: Downloading agent http://10.10.228.22:8000/v1/configcontent/configscripts
4/7/2016 11:44:22 AMINFO: Updating configscripts
4/7/2016 11:44:22 AMINFO: Downloading http://10.10.228.22:8000/v1//configcontent//configscripts current=
4/7/2016 11:44:22 AMINFO: Running /var/lib/cattle/download/configscripts/configscripts-1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319/apply.sh
4/7/2016 11:44:22 AMINFO: Sending configscripts applied 1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319
4/7/2016 11:44:22 AMINFO: Updating agent-instance-startup
4/7/2016 11:44:22 AMINFO: Downloading http://10.10.228.22:8000/v1//configcontent//agent-instance-startup current=
4/7/2016 11:44:22 AMINFO: Running /var/lib/cattle/download/agent-instance-startup/agent-instance-startup-1-f070335eb50044157e1f00cae4fcf910fa7ca481d1658d2aa0c317dbe810d7d8/apply.sh
4/7/2016 11:44:22 AMINFO: Updating services
4/7/2016 11:44:22 AMINFO: Downloading http://10.10.228.22:8000/v1//configcontent//services current=
4/7/2016 11:44:22 AMINFO: Running /var/lib/cattle/download/services/services-1-061405f3edd960bfdfe1cfb8447be40eab5b4b608731608e224cc51c5dc30b91/apply.sh
4/7/2016 11:44:22 AMINFO: HOME -> ./
4/7/2016 11:44:22 AMINFO: HOME -> ./services
4/7/2016 11:44:22 AMINFO: Sending services applied 1-061405f3edd960bfdfe1cfb8447be40eab5b4b608731608e224cc51c5dc30b91
4/7/2016 11:44:22 AMINFO: Getting agent-instance-scripts
4/7/2016 11:44:22 AMINFO: Updating agent-instance-scripts
4/7/2016 11:44:22 AMINFO: Downloading http://10.10.228.22:8000/v1//configcontent//agent-instance-scripts current=
4/7/2016 11:44:22 AMINFO: Running /var/lib/cattle/download/agent-instance-scripts/agent-instance-scripts-1-4b5124bd74cd423f98d57550b481ec77ec3a7135c6a650886ab95c043305d642/apply.sh
4/7/2016 11:44:22 AMINFO: HOME -> ./
4/7/2016 11:44:22 AMINFO: HOME -> ./events/
4/7/2016 11:44:22 AMINFO: HOME -> ./events/config.update
4/7/2016 11:44:22 AMINFO: HOME -> ./events/ping
4/7/2016 11:44:22 AMINFO: Sending agent-instance-scripts applied 1-4b5124bd74cd423f98d57550b481ec77ec3a7135c6a650886ab95c043305d642
4/7/2016 11:44:22 AMINFO: Getting monit
4/7/2016 11:44:22 AMINFO: Updating monit
4/7/2016 11:44:22 AMINFO: Downloading http://10.10.228.22:8000/v1//configcontent//monit current=
4/7/2016 11:44:22 AMINFO: Running /var/lib/cattle/download/monit/monit-1-c4113ae48035df162ff89a5d37af1545f002ee54e044535e42395bda7a29a953/apply.sh
4/7/2016 11:44:22 AMINFO: ROOT -> ./
4/7/2016 11:44:22 AMINFO: ROOT -> ./etc/
4/7/2016 11:44:22 AMINFO: ROOT -> ./etc/logrotate.d/
4/7/2016 11:44:22 AMINFO: ROOT -> ./etc/logrotate.d/rancher-logs
4/7/2016 11:44:22 AMINFO: ROOT -> ./etc/monit/
4/7/2016 11:44:22 AMINFO: ROOT -> ./etc/monit/conf.d/
4/7/2016 11:44:22 AMINFO: ROOT -> ./etc/monit/conf.d/logrotate
4/7/2016 11:44:22 AMINFO: ROOT -> ./etc/monit/monitrc
4/7/2016 11:44:22 AMINFO: Sending monit applied 1-c4113ae48035df162ff89a5d37af1545f002ee54e044535e42395bda7a29a953
4/7/2016 11:44:22 AMINFO: Getting haproxy
4/7/2016 11:44:22 AMINFO: Updating haproxy
4/7/2016 11:44:22 AMINFO: Downloading http://10.10.228.22:8000/v1//configcontent//haproxy current=
4/7/2016 11:44:23 AMINFO: Running /var/lib/cattle/download/haproxy/haproxy-2-a5eac3965952846cbd39c610ae44d58f5b54450bd46bdfba35c57dda8edfaab0/apply.sh
4/7/2016 11:44:23 AMINFO: ROOT -> ./
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/default/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/default/haproxy
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/monit/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/monit/conf.d/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/monit/conf.d/haproxy
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/haproxy/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/haproxy/certs/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/haproxy/certs/default.pem
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/haproxy/certs/certs.pem
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/haproxy/haproxy.cfg
4/7/2016 11:44:23 AMINFO: Sending haproxy applied 2-a5eac3965952846cbd39c610ae44d58f5b54450bd46bdfba35c57dda8edfaab0
4/7/2016 11:44:23 AMINFO: HOME -> ./
4/7/2016 11:44:23 AMINFO: HOME -> ./etc/
4/7/2016 11:44:23 AMINFO: HOME -> ./etc/cattle/
4/7/2016 11:44:23 AMINFO: HOME -> ./etc/cattle/startup-env
4/7/2016 11:44:23 AMINFO: ROOT -> ./
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/init.d/
4/7/2016 11:44:23 AMINFO: ROOT -> ./etc/init.d/agent-instance-startup
4/7/2016 11:44:23 AMINFO: Sending agent-instance-startup applied 1-f070335eb50044157e1f00cae4fcf910fa7ca481d1658d2aa0c317dbe810d7d8
4/7/2016 11:44:23 AMmonit: generated unique Monit id 67047b2932294d7336768509be974c36 and stored to '/var/lib/monit/id'
4/7/2016 11:44:23 AMStarting monit daemon with http interface at [localhost:2812]