crc: CodeReady Containers becomes 'Not Reachable' due to etcd crashing and failing to restart
General information
- OS: Linux
- Hypervisor: KVM
- Did you run `crc setup` before starting it (Yes/No)? Yes
CRC version
# Put the output of `crc version`
CodeReady Containers version: 1.21.0+68a4cdd7
OpenShift version: 4.6.9 (embedded in executable)
CRC status
# Put the output of `crc status`
CRC VM: Running
OpenShift: Not Reachable (v4.6.9)
Disk Usage: 25.71GB of 74.6GB (Inside the CRC VM)
Cache Usage: 27.04GB
Cache Directory: /home/crcuser/.crc/cache
CRC config
# Put the output of `crc config view`
- consent-telemetry : no
- cpus : 12
- disk-size : 70
- enable-cluster-monitoring : true
- memory : 48000
Host Operating System
# Put the output of `cat /etc/os-release` in case of Linux
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
Steps to reproduce
- Start CRC, leave it running
- After 1-7 days (perhaps more… unclear)
- CRC OpenShift API stops responding and status shows ‘Not Reachable’
- Some container workloads (e.g. other pods, services, routes for applications) stay operational
- Stopping / restarting `crc` does not recover
Expected
The CRC OpenShift API responds, allowing `oc login` and other actions against the cluster:
oc login -u developer -p developer
Login successful.
You have one project on this server: "victim"
Using project "victim".
Actual
Unable to log in with the CLI or the web console:
oc login -u developer -p developer
error: dial tcp 192.168.130.11:6443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running.
Logs
You can start crc with `crc start --log-level debug` to collect logs.
Link to gist with logs
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 37 (16 by maintainers)
figured it out:
I will keep this instance running and monitor status. May take a little while to see any clear signs one way or another.
I experimented a bit and found out that with cobra, only the last parameter is accepted. This makes `node-ip` always empty in crc.
The kubelet code then does a lookup to get the IP: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/nodestatus/setters.go#L225
Then it is compared to the list of all interfaces to see if it's valid (`net.InterfaceAddrs()`). Can you give me the output of `ifconfig` and `route -n`? If it doesn't match, it picks the interface of the default gateway. I guess we fall in the last case.
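The following is a minimal sketch of the two behaviours described above, assuming spf13/cobra's standard pflag semantics; it is illustrative, not kubelet's or crc's actual code:

```go
package main

import (
	"fmt"
	"net"

	"github.com/spf13/cobra"
)

// addrIsLocal mimics the validation step: check a candidate IP against
// net.InterfaceAddrs(), the same call referenced in the comment above.
func addrIsLocal(candidate net.IP) bool {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return false
	}
	for _, a := range addrs {
		if ipnet, ok := a.(*net.IPNet); ok && ipnet.IP.Equal(candidate) {
			return true
		}
	}
	return false
}

func main() {
	var nodeIP string
	cmd := &cobra.Command{
		Use: "node-ip-demo",
		Run: func(cmd *cobra.Command, args []string) {
			// With a plain string flag, each occurrence overwrites the
			// previous one, so only the last --node-ip value survives.
			fmt.Printf("node-ip = %q\n", nodeIP)
			fmt.Println("192.168.130.11 local?", addrIsLocal(net.ParseIP("192.168.130.11")))
		},
	}
	cmd.Flags().StringVar(&nodeIP, "node-ip", "", "IP address of the node")

	// Simulate the duplicated flag: the second, empty occurrence wins,
	// leaving node-ip empty.
	cmd.SetArgs([]string{"--node-ip=192.168.130.11", "--node-ip="})
	_ = cmd.Execute()
}
```

Running this prints `node-ip = ""`, because the empty second occurrence overwrites the first; `addrIsLocal` shows the fallback validation against `net.InterfaceAddrs()`.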
Hi @timroster, the release is almost out. Can you try with http://mirror.openshift.com/pub/openshift-v4/clients/crc/1.22.0/ ? Thanks.
I think I may have found the issue. We define the same parameter, `node-ip`, twice in /etc/systemd/system/kubelet.service.
I don’t know where this KUBELET_NODE_IP env variable is coming from, but it is definitely suspect!
The second parameter was introduced in OpenShift 4.6. https://github.com/openshift/machine-config-operator/commit/0b1b2d5b10751e41af79d2d75705ca03589a1f7e
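For illustration, a hypothetical excerpt of what such a duplicated flag in the unit file could look like; the ExecStart path and the EnvironmentFile line are assumptions, and only the duplicated `--node-ip` and the KUBELET_NODE_IP variable come from the thread:

```ini
# Hypothetical shape of /etc/systemd/system/kubelet.service
# (assumed, not the actual CRC unit file)
[Service]
# Assumed source of the KUBELET_NODE_IP variable
EnvironmentFile=-/etc/kubernetes/kubelet-env
ExecStart=/usr/bin/kubelet \
    --node-ip=192.168.130.11 \
    --node-ip=${KUBELET_NODE_IP}
# When KUBELET_NODE_IP is unset, the second --node-ip expands to an empty
# string; since cobra keeps only the last occurrence, kubelet sees no node
# IP at all and falls back to its own lookup.
```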