rancher: Cannot execute shell on CentOS 7.2
Rancher Version: v1.2.0
Docker Version: 1.12.3
OS and where are the hosts located? (cloud, bare metal, etc): CentOS 7.2, DigitalOcean (but also on-prem)
Setup Details: (single node rancher vs. HA rancher, internal DB vs. external DB) single node rancher, internal DB
Environment Type: (Cattle/Kubernetes/Swarm/Mesos) Cattle
Steps to Reproduce:
- Install Rancher by starting the rancher/server container, and join a rancher/agent container
- Wait for the infrastructure stack to become active, then execute a shell on the scheduler container
Results: Popup opens and closes immediately
Expected: Popup with a running shell
More info: Tried to debug this today. Haven’t found the root cause yet, but filing a ticket early. Tried to reproduce on Ubuntu 16.04; it’s working fine there. This is what I get on the command line:
[root@centos-02 ~]# docker ps | grep scheduler
a49f85151ae2 rancher/scheduler:v0.4.0 "/.r/r scheduler" 14 minutes ago Up 14 minutes r-scheduler-scheduler-1-5ed5a22e
[root@centos-02 ~]# docker exec -ti a49 bash
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: cannot exec a container that has run and stopped\n"
This was fixed in https://github.com/docker/docker/issues/27540 but still appears in this situation. As I dug a little deeper, I found out that only the containers with a Path of /.r/r seem to be affected. For a clean install these are:
22b7df37bf0c rancher/healthcheck:v0.1.0 "/.r/r /tini -- healt" 14 minutes ago Up 14 minutes r-healthcheck-healthcheck-1-3ccae477
a49f85151ae2 rancher/scheduler:v0.4.0 "/.r/r scheduler" 15 minutes ago Up 15 minutes r-scheduler-scheduler-1-5ed5a22e
701043eecb1c rancher/net:v0.7.5 "/.r/r start.sh" 15 minutes ago Up 13 minutes r-ipsec-ipsec-1-48f985f7
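To find the affected containers on another host, the filter on Path can be scripted. This is a minimal sketch, not something from the original report: the function names `filter_by_path` and `list_r_containers` are made up for illustration, and it assumes `docker inspect --format` exposes the container's `.Id` and `.Path` fields on the installed Docker version.

```shell
#!/bin/sh
# Print the IDs of containers whose Path matches the given value.
# Reads "<id> <path>" pairs on stdin, one pair per line.
filter_by_path() {
    wanted="$1"
    while read -r id path; do
        if [ "$path" = "$wanted" ]; then
            printf '%s\n' "$id"
        fi
    done
}

# Hypothetical wrapper: list running containers whose Path is /.r/r.
# It needs a reachable Docker daemon, so it is only defined here, not called.
list_r_containers() {
    ids=$(docker ps -q)
    [ -n "$ids" ] || return 0
    # Word splitting of $ids is intentional: one argument per container ID.
    docker inspect --format '{{.Id}} {{.Path}}' $ids | filter_by_path "/.r/r"
}
```

Calling `list_r_containers` on the host should print one full container ID per line, which can then be tried with `docker exec` one by one.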
If you try to diagnose this the same way as in the Docker ticket, you get the same results. For instance for the scheduler:
[root@centos-02 ~]# docker ps | grep scheduler
a49f85151ae2 rancher/scheduler:v0.4.0 "/.r/r scheduler" 17 minutes ago Up 17 minutes r-scheduler-scheduler-1-5ed5a22e
[root@centos-02 ~]# ps -ef | grep a49f
root 3126 9121 0 19:40 pts/0 00:00:00 grep --color=auto a49f
root 11291 9280 0 19:23 ? 00:00:00 docker-containerd-shim a49f85151ae25475d1d77945f03e5e1862f1d964b1d5f03b02ea61b31af03628 /var/run/docker/libcontainerd/a49f85151ae25475d1d77945f03e5e1862f1d964b1d5f03b02ea61b31af03628 docker-runc
[root@centos-02 ~]# docker-runc state a49f85151ae25475d1d77945f03e5e1862f1d964b1d5f03b02ea61b31af03628
{
"ociVersion": "1.0.0-rc2-dev",
"id": "a49f85151ae25475d1d77945f03e5e1862f1d964b1d5f03b02ea61b31af03628",
"pid": 0,
"status": "stopped",
"bundle": "/run/docker/libcontainerd/a49f85151ae25475d1d77945f03e5e1862f1d964b1d5f03b02ea61b31af03628",
"rootfs": "/var/lib/docker/devicemapper/mnt/e506c366628e380183932d047e7d4b9e60af64d59e3ab977cf13576df7c1ae17/rootfs",
"created": "2016-12-01T19:23:21.019392097Z"
}
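The mismatch shown above (Docker lists the container as up, while runc reports it as stopped) can be checked for several containers at once. The following is a rough sketch under assumptions: the function names are hypothetical, the JSON is parsed with `sed` rather than a real JSON parser, and it assumes `docker-runc state` can be called with the full container ID exactly as in the session above.

```shell
#!/bin/sh
# Extract the "status" value from `docker-runc state` JSON read on stdin.
# Crude line-based parsing; good enough for the flat output shown above.
runc_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Compare Docker's view ("true"/"false" from .State.Running) with runc's status.
# Prints "mismatch" when Docker reports running but runc reports stopped.
compare_states() {
    docker_running="$1"
    runc_state="$2"
    if [ "$docker_running" = "true" ] && [ "$runc_state" = "stopped" ]; then
        echo "mismatch"
    else
        echo "consistent"
    fi
}

# Hypothetical wrapper; needs docker and docker-runc on the host.
# It is only defined here, not called.
check_container() {
    full_id=$(docker inspect --format '{{.Id}}' "$1") || return 1
    running=$(docker inspect --format '{{.State.Running}}' "$1") || return 1
    state=$(docker-runc state "$full_id" | runc_status)
    echo "$1: $(compare_states "$running" "$state")"
}
```

For the scheduler container above, `check_container a49f85151ae2` would be expected to print a `mismatch` line on an affected host.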
I haven’t had the time to go deeper. I need a way to reproduce this on vanilla Docker to open a new issue there; maybe someone can help out.
About this issue
- State: closed
- Created 8 years ago
- Comments: 26 (9 by maintainers)
@superseb @tobowers For docker@1.3.0 we can update the kernel to 3.10.0-514.6.1.el7.x86_64 first; it works for me.

For the sake of completeness, I can successfully open a shell with Rancher v1.2.2 on CentOS Linux release 7.3.1611 (Core) with the stock kernel 3.10.0-514.2.2.el7.x86_64.

OK, a quick update on something I just thought of checking, as this only happens on CentOS 7. As there seems to be some mismatch in /proc, or in combination with runC lookups, I wondered about the kernel version. CentOS/RHEL always keep their base kernel version the same and backport updates. So I updated the 3.10 kernel to kernel-ml from elrepo, which provides 4.8.12-1.el7.elrepo.x86_64, rebooted, and this fixed the problem. Containers I cannot exec into on 3.10 I can exec into on 4.8.12-1.el7.elrepo.x86_64. I know this is not a real solution for production machines, but it might help in debugging to find the root cause, as I only just thought of this.
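To see where a host stands relative to the kernels mentioned in these comments, a version comparison can be scripted. This is only a sketch: the function names and the `REFERENCE_KERNEL` variable are made up for illustration, it relies on `sort -V` (version sort, as in GNU coreutils), and the reference kernel is just one data point reported in this thread, not a confirmed minimum version for the fix.

```shell
#!/bin/sh
# Succeed if version $1 sorts at or after version $2.
# Relies on `sort -V` (version sort).
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Kernel reported in this thread as working with Rancher v1.2.2 on CentOS 7.3.
# A reference point only, not a confirmed minimum.
REFERENCE_KERNEL="3.10.0-514.2.2.el7.x86_64"

# Report how a kernel release (default: the running one) compares to the reference.
report_kernel() {
    current="${1:-$(uname -r)}"
    if version_at_least "$current" "$REFERENCE_KERNEL"; then
        echo "$current: at or above $REFERENCE_KERNEL"
    else
        echo "$current: older than $REFERENCE_KERNEL"
    fi
}
```

Running `report_kernel` without arguments checks the running kernel. An "older than" result only suggests that a kernel update is worth trying before digging further.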