rancher: Some LB instances are intermittently stuck in the "Initializing" state.
Server version: v0.51.0. The setup had 8 hosts.
Started a global LB service. Out of the 8 LB instances, 2 are stuck in the "Initializing" state.
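For reference, the service was set up roughly like the following (a minimal rancher-compose sketch; the image, stack name, target service, and published port are illustrative, not taken from this setup):

# docker-compose.yml: a load balancer scheduled globally (one instance per host)
cat > docker-compose.yml <<'EOF'
lb:
  image: rancher/load-balancer-service
  ports:
    - 8080:80            # publish 8080 on every host, forward to port 80 of "web"
  links:
    - web:web
  labels:
    io.rancher.scheduler.global: 'true'   # label that makes the service global
web:
  image: nginx
EOF
rancher-compose -p lb-test up -d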
The following errors are seen in the container logs of the agents corresponding to the containers stuck in "Initializing" state.
1/20/2016 10:54:22 AM[config.sh:118] info Downloading http://104.197.236.168:8080/v1//configcontent//services current=
1/20/2016 10:54:22 AM[scripts.sh:19] echo INFO: Downloading http://104.197.236.168:8080/v1//configcontent//services current=
1/20/2016 10:54:22 AMINFO: Downloading http://104.197.236.168:8080/v1//configcontent//services current=
1/20/2016 10:54:22 AM[config.sh:120] get 'http://104.197.236.168:8080/v1//configcontent//services?current='
1/20/2016 10:54:22 AM[scripts.sh:43] '[' 'http://104.197.236.168:8080/v1//configcontent//services?current=' = --no-auth ']'
1/20/2016 10:54:22 AM[scripts.sh:47] call_curl -L 'http://104.197.236.168:8080/v1//configcontent//services?current='
1/20/2016 10:54:22 AM[scripts.sh:29] local 'curl=curl -s --connect-timeout 20'
1/20/2016 10:54:22 AM[scripts.sh:30] '[' '' = false ']'
1/20/2016 10:54:22 AM[scripts.sh:32] '[' -n 'Basic QjdERDZDNTFEM0MxRDc0NEY4MzM6akdxUWdoYlc5aUVYUWlQcTJIVWVXZUtaYk50cFZKbm9CVUc2Vm1nRg==' ']'
1/20/2016 10:54:22 AM[scripts.sh:33] curl -s --connect-timeout 20 -H 'Authorization: Basic QjdERDZDNTFEM0MxRDc0NEY4MzM6akdxUWdoYlc5aUVYUWlQcTJIVWVXZUtaYk50cFZKbm9CVUc2Vm1nRg==' -L 'http://104.197.236.168:8080/v1//configcontent//services?current='
1/20/2016 10:54:42 AM[scripts.sh:1] cleanup
1/20/2016 10:54:42 AM[config.sh:9] EXIT=28
1/20/2016 10:54:42 AM[config.sh:11] '[' -e /var/lib/cattle/download.p82mGiY ']'
1/20/2016 10:54:42 AM[config.sh:12] rm -rf /var/lib/cattle/download.p82mGiY
1/20/2016 10:54:42 AM[config.sh:15] return 28
1/20/2016 10:54:42 AM The system is going down NOW!
1/20/2016 10:54:42 AM Sent SIGTERM to all processes
1/20/2016 10:54:43 AM Sent SIGKILL to all processes
1/20/2016 10:54:43 AM Requesting system reboot
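Exit code 28 above is curl's "operation timed out" status: the agent could not download its config from the server within the 20-second connect timeout (10:54:22 to 10:54:42 in the log), after which the container's init shut it down. The failed download can be retried by hand from the affected host to confirm (a sketch; the container name is a placeholder):

# Exec into the stuck LB container and repeat the download it attempted.
docker exec -it <stuck-lb-container> bash
# curl exits with 28 when the connection to the server times out.
curl -s --connect-timeout 20 -o /dev/null 'http://104.197.236.168:8080/v1/'
echo $?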
rancher-server logs:
2016-01-20 19:15:08,409 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [services], not in sync requested [1] != applied [-1]
2016-01-20 19:15:08,409 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-scripts], not in sync requested [1] != applied [-1]
2016-01-20 19:15:08,410 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [monit], not in sync requested [1] != applied [-1]
2016-01-20 19:15:08,410 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-startup], not in sync requested [1] != applied [-1]
2016-01-20 19:15:08,411 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [haproxy], not in sync requested [3] != applied [-1]
2016-01-20 19:15:08,411 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Requesting update of item(s) [agent-instance-startup, haproxy] on [agent:21]
2016-01-20 19:15:08,412 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [agent-instance-startup], not in sync requested [1] != applied [-1]
2016-01-20 19:15:08,412 INFO [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [haproxy], not in sync requested [3] != applied [-1]
2016-01-20 19:15:09,441 INFO [:] [] [] [] [cutorService-23] [i.c.p.a.s.s.impl.AgentServiceImpl ] Timeout waiting for response to [delegate.request] id [04408794-c2a6-45e9-8ba8-063cef2d6324]
2016-01-20 19:15:09,441 INFO [:] [] [] [] [cutorService-23] [i.c.p.a.s.s.impl.AgentServiceImpl ] Timeout waiting for response to [config.update] id [90749760-9981-4949-96d4-b11a0d1a4bed]
2016-01-20 19:15:11,441 INFO [:] [] [] [] [cutorService-22] [i.c.p.a.s.s.impl.AgentServiceImpl ] Timeout waiting for response to [delegate.request] id [291b945e-91c9-4ccb-8f6e-6d5e5baa4c3a]
2016-01-20 19:15:11,441 INFO [:] [] [] [] [cutorService-22] [i.c.p.a.s.s.impl.AgentServiceImpl ] Timeout waiting for response to [config.update] id [cadad389-c94c-483c-a5cf-0639f2bb21a6]
2016-01-20 19:15:11,443 INFO [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21]
2016-01-20 19:15:11,443 INFO [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21]
2016-01-20 19:15:11,443 INFO [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21]
2016-01-20 19:15:11,443 INFO [:] [] [] [] [ecutorService-1] [.p.c.v.i.ConfigItemStatusManagerImpl] Timeout updating item(s) [agent-instance-startup, haproxy] on [agent:21]
2016-01-20 19:15:12,607 ERROR [:] [] [] [] [ecutorService-2] [.p.c.v.i.ConfigItemStatusManagerImpl] Failed null, exit code [1] output [nsenter: failed to execute /var/lib/cattle/events/config.update: No such file or directory
]
2016-01-20 19:15:23,428 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Requesting update of item(s) [services, agent-instance-scripts, monit, agent-instance-startup, haproxy] on [agent:23]
2016-01-20 19:15:23,430 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [services], not in sync requested [1] != applied [-1]
2016-01-20 19:15:23,430 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-scripts], not in sync requested [1] != applied [-1]
2016-01-20 19:15:23,430 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [monit], not in sync requested [1] != applied [-1]
2016-01-20 19:15:23,431 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [agent-instance-startup], not in sync requested [1] != applied [-1]
2016-01-20 19:15:23,431 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:23] on [haproxy], not in sync requested [3] != applied [-1]
2016-01-20 19:15:23,432 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Requesting update of item(s) [agent-instance-startup, haproxy] on [agent:21]
2016-01-20 19:15:23,433 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [agent-instance-startup], not in sync requested [1] != applied [-1]
2016-01-20 19:15:23,433 INFO [:] [] [] [] [ecutorService-7] [.p.c.v.i.ConfigItemStatusManagerImpl] Waiting on [agent:21] on [haproxy], not in sync requested [3] != applied [-1]
2016-01-20 19:15:25,444 INFO [:] [] [] [] [cutorService-12] [i.c.p.a.s.s.impl.AgentServiceImpl ] Timeout waiting for response to [config.update] id [90749760-9981-4949-96d4-b11a0d1a4bed]
2016-01-20 19:15:25,444 INFO [:] [] [] [] [cutorService-12] [i.c.p.a.s.s.impl.AgentServiceImpl ] Timeout waiting for response to [delegate.request] id [4f0895d7-5498-46e8-84a3-dfd11a8800fb]
2016-01-20 19:15:28,615 ERROR [:] [] [] [] [cutorService-10] [.p.c.v.i.ConfigItemStatusManagerImpl] Failed null, exit code [1] output [nsenter: failed to execute /var/lib/cattle/events/config.update: No such file or directory
]
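The nsenter failure is consistent with the download never completing: the server asks the agent instance to run /var/lib/cattle/events/config.update, but that script was never written because the config fetch timed out. A quick check from the affected host (a sketch; the container name is a placeholder):

# Verify whether the agent instance ever received its event handler scripts.
docker exec <network-agent-container> ls -l /var/lib/cattle/events/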
mysql> select * from service_event where instance_id=34;
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| id | name | account_id | kind | uuid | description | state | created | removed | remove_time | data | host_id | healthcheck_uuid | instance_id | healthcheck_instance_id | reported_health | external_timestamp |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| 124 | NULL | 5 | serviceEvent | 61017225-e65d-46ce-aeff-0b3e3d3fbdbc | NULL | created | 2016-01-20 18:52:57 | NULL | NULL | {} | 6 | e3355456-cd42-403d-9493-3ba8cb8b228d_1 | 34 | 12 | DOWN | 1453315977 |
| 149 | NULL | 5 | serviceEvent | 7b15d035-123e-4e5e-8dbe-27c0fbac38cf | NULL | created | 2016-01-20 18:54:26 | NULL | NULL | {} | 7 | 0784e862-3f8d-4bef-9529-f0954a981054_1 | 34 | 12 | DOWN | 1453316066 |
| 159 | NULL | 5 | serviceEvent | 737ff984-e00c-47df-8d2a-36109fc798ae | NULL | created | 2016-01-20 18:54:40 | NULL | NULL | {} | 8 | a32d5bcb-67af-4860-86f3-5d9298862fa5_1 | 34 | 12 | DOWN | 1453316080 |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
3 rows in set (0.00 sec)
mysql> select * from service_event where instance_id=36;
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| id | name | account_id | kind | uuid | description | state | created | removed | remove_time | data | host_id | healthcheck_uuid | instance_id | healthcheck_instance_id | reported_health | external_timestamp |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
| 121 | NULL | 5 | serviceEvent | 08da7fab-134a-4faa-bd5d-1b5361c91637 | NULL | created | 2016-01-20 18:52:55 | NULL | NULL | {} | 4 | 4e16fedc-95a8-4bb5-add8-f3cd67821f6b_1 | 36 | 15 | DOWN | 1453315975 |
| 154 | NULL | 5 | serviceEvent | 9c3c5d2c-9f10-441c-9849-2cc1c0bd0209 | NULL | created | 2016-01-20 18:54:28 | NULL | NULL | {} | 7 | a525a7c7-1ea5-4428-a79f-7b93bbc188c8_1 | 36 | 15 | DOWN | 1453316068 |
| 161 | NULL | 5 | serviceEvent | 85f262c9-aae6-46bc-961e-6699d9ad5fae | NULL | created | 2016-01-20 18:54:40 | NULL | NULL | {} | 8 | 8c83e7c6-5216-4d5a-aafc-26a3d9e0126c_1 | 36 | 15 | DOWN | 1453316080 |
+-----+------+------------+--------------+--------------------------------------+-------------+---------+---------------------+---------+-------------+------+---------+----------------------------------------+-------------+-------------------------+-----------------+--------------------+
3 rows in set (0.00 sec)
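The rows above show healthchecks from three different hosts all reporting the instance DOWN. The same check for both stuck instances can be run in one query (a sketch; the database name and credentials are placeholders for whatever the rancher-server MySQL uses):

mysql -u <user> -p <cattle-db> -e \
  'select instance_id, host_id, reported_health, created from service_event where instance_id in (34, 36);'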
About this issue
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 15
@demohi The ipc request has been updated. As for your LB issues, do you have more than one host running? Are you able to ping from inside the network agent on one host to another host's network agent's IP (10.42.x.x)? Exec into one of the network agents and try to ping. If it's not working, you have cross-host communication issues.
http://docs.rancher.com/rancher/faqs/#cross-host-communication
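For example (a sketch; the network agent container name varies per setup, so check docker ps on each host):

# On host A, find and enter the network agent container.
docker ps | grep -i agent
docker exec -it <network-agent> bash
# From inside, ping the 10.42.x.x address of host B's network agent.
ping -c 3 10.42.x.x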
+1
LB stuck in "Initializing" state; restarting does not work.
Rancher v1.0.0, Cattle: v0.159.2, UI: v0.100.3. I created a stack from a docker-compose config file and found that the ipc option is ignored in Rancher.
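A compose file as small as the following reproduces that behavior (a sketch; the image, stack name, and command are arbitrary):

cat > docker-compose.yml <<'EOF'
app:
  image: ubuntu:14.04
  command: sleep 86400
  ipc: host        # this is the option that gets ignored when the stack is created through Rancher
EOF
rancher-compose -p ipc-test up -d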