train-ticket: deploy pod "nacos-0" Init:CrashLoopBackOff
Summary
During make deploy, pod "nacos-0" is stuck in Init:CrashLoopBackOff.
Expected behaviour
Pod nacos-0 should initialize and reach the Running state.
Current behaviour
Pod "nacos-0" enters Init:CrashLoopBackOff and never finishes initializing.
Steps to reproduce
Run make deploy:
[root@ip-172-31-27-85 train-ticket]# make deploy
args num: 2
Parse DeployArgs
Start deployment Step <1/3>------------------------------------
Start to deploy mysql cluster for nacos.
NAME: nacosdb
LAST DEPLOYED: Sat Jan 28 06:21:26 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The cluster is comprised of 3 pods: 1 leader and 2 followers. Each instance is accessible within the cluster through:
<pod-name>.nacosdb-mysql
To connect to your database:
1. Get mysql user `nacos`'s password:
kubectl get secret -n default nacosdb-mysql -o jsonpath="{.data.mysql-password}" | base64 --decode; echo
2. Run an Ubuntu pod that you can use as a client:
kubectl run ubuntu -n default --image=ubuntu:focal -it --rm --restart='Never' -- bash -il
3. Install the mysql client:
apt-get update && apt-get install mysql-client -y
4. To connect to leader service in the Ubuntu pod:
mysql -h nacosdb-mysql-leader -u nacos -p
5. To connect to follower service (read-only) in the Ubuntu pod:
mysql -h nacosdb-mysql-follower -u nacos -p
Waiting for mysql cluster of nacos to be ready ......
Waiting for 3 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 3 new pods have been updated...
Start to deploy nacos.
NAME: nacos
LAST DEPLOYED: Sat Jan 28 06:30:13 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Waiting for nacos to be ready ......
Waiting for 3 pods to be ready...
Your environment
OS (e.g. cat /etc/os-release):
Ubuntu 22.04
Kubernetes version (use kubectl version):
[root@ip-172-31-27-85 ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-08T10:15:02Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-08T10:08:09Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Additional context
[root@ip-172-31-27-85 ~]# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nacos-0 0/1 Init:CrashLoopBackOff 4 (60s ago) 2m53s 100.85.89.134 ip-172-31-21-164.cn-north-1.compute.internal <none> <none>
nacosdb-mysql-0 3/3 Running 0 11m 100.86.8.69 ip-172-31-26-198.cn-north-1.compute.internal <none> <none>
nacosdb-mysql-1 3/3 Running 0 8m49s 100.85.89.133 ip-172-31-21-164.cn-north-1.compute.internal <none> <none>
nacosdb-mysql-2 3/3 Running 0 5m42s 100.89.137.138 ip-172-31-18-50.cn-north-1.compute.internal <none> <none>
[root@ip-172-31-27-85 ~]# kubectl describe po nacos-0
Name: nacos-0
Namespace: default
Priority: 0
Service Account: default
Node: ip-172-31-21-164.cn-north-1.compute.internal/172.31.21.164
Start Time: Sat, 28 Jan 2023 06:30:13 +0000
Labels: app=nacos
controller-revision-hash=nacos-95879c94d
statefulset.kubernetes.io/pod-name=nacos-0
Annotations: cni.projectcalico.org/containerID: 244a57d5d7b7eaa12bb99dc0845034b5d67fb2f88960f6f3e6e13a23f8c546e7
cni.projectcalico.org/podIP: 100.85.89.134/32
cni.projectcalico.org/podIPs: 100.85.89.134/32
Status: Pending
IP: 100.85.89.134
IPs:
IP: 100.85.89.134
Controlled By: StatefulSet/nacos
Init Containers:
initmysql:
Container ID: containerd://313c55fb9d23014b15e403b181f5ca352ba28901843002a6b415f797c85b89b5
Image: codewisdom/mysqlclient:0.1
Image ID: docker.io/codewisdom/mysqlclient@sha256:9201e8dfe5eb4e845259730a6046c7b905566119d760ed7d5aef535ace972216
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 28 Jan 2023 06:32:06 +0000
Finished: Sat, 28 Jan 2023 06:32:06 +0000
Ready: False
Restart Count: 4
Environment Variables from:
nacos-mysql Secret Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c6xxs (ro)
Containers:
k8snacos:
Container ID:
Image: nacos/nacos-server:2.0.1
Image ID:
Ports: 8848/TCP, 7848/TCP, 9848/TCP, 9849/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 500m
memory: 1Gi
Environment Variables from:
nacos-mysql Secret Optional: false
Environment:
NACOS_REPLICAS: 3
NACOS_SERVER_PORT: 8848
NACOS_APPLICATION_PORT: 8848
PREFER_HOST_MODE: hostname
MODE: cluster
NACOS_SERVERS: nacos-0.nacos-headless.default.svc.cluster.local:8848 nacos-1.nacos-headless.default.svc.cluster.local:8848 nacos-2.nacos-headless.default.svc.cluster.local:8848
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c6xxs (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-c6xxs:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m2s default-scheduler Successfully assigned default/nacos-0 to ip-172-31-21-164.cn-north-1.compute.internal
Normal Pulling 3m1s kubelet Pulling image "codewisdom/mysqlclient:0.1"
Normal Pulled 2m48s kubelet Successfully pulled image "codewisdom/mysqlclient:0.1" in 13.206324736s
Normal Created 69s (x5 over 2m48s) kubelet Created container initmysql
Normal Started 69s (x5 over 2m48s) kubelet Started container initmysql
Normal Pulled 69s (x4 over 2m47s) kubelet Container image "codewisdom/mysqlclient:0.1" already present on machine
Warning BackOff 69s (x9 over 2m46s) kubelet Back-off restarting failed container
[root@ip-172-31-27-85 ~]# kubectl logs nacos-0
Defaulted container "k8snacos" out of: k8snacos, initmysql (init)
Error from server (BadRequest): container "k8snacos" in pod "nacos-0" is waiting to start: PodInitializing
[root@ip-172-31-27-85 ~]# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nacos-0 0/1 Init:CrashLoopBackOff 5 (2m24s ago) 5m39s 100.85.89.134 ip-172-31-21-164.cn-north-1.compute.internal <none> <none>
nacosdb-mysql-0 3/3 Running 0 14m 100.86.8.69 ip-172-31-26-198.cn-north-1.compute.internal <none> <none>
nacosdb-mysql-1 3/3 Running 0 11m 100.85.89.133 ip-172-31-21-164.cn-north-1.compute.internal <none> <none>
nacosdb-mysql-2 3/3 Running 0 8m28s 100.89.137.138 ip-172-31-18-50.cn-north-1.compute.internal <none> <none>
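Note that the kubectl logs call above defaulted to the k8snacos container, which is still PodInitializing; it is the crashing init container's own output (not captured in this report) that would show the actual error. The standard way to fetch it:

kubectl logs nacos-0 -c initmysql             # output of the current init-container attempt
kubectl logs nacos-0 -c initmysql --previous  # output of the last crashed run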
Is there any new update on this open case?
@yinfangchen
Yes, this is what I do as well. I don't really need mysql replication for my purposes, so I didn't look carefully at the slave statuses, but I am seeing the same logs you mention even when creating the "breakpoints", so I'm not sure that truly solves it. Maybe there is a race, so it sometimes appears solved?
Anyway, I'm not planning on investigating further since I don't need replication, but if you'd like to, one thought is to look more closely at how xenon sets up the 'root'@'127.0.0.1' user and make sure everything is set up the same way for the 'root'@'::1' user. I think this happens here (although it may be different in the version of xenon used in train-ticket), which is where I got the sql statements in the bash script (out of curiosity I did try adding `FLUSH PRIVILEGES;` and `RESET SLAVE ALL;` to the end of the script, but it didn't help). Let me know if you find a solution - I'm curious 😃
@yinfangchen When using the "all_in_one" option for mysql, the shell script needs to be run twice: once after the nacosdb-mysql cluster starts (i.e. after `kubectl rollout status statefulset/$nacosDBRelease-mysql -n $namespace` succeeds), and once after the tsdb-mysql cluster starts (i.e. after `gen_secret_for_services $tsUser $tsPassword $tsDB "${tsMysqlName}-mysql-leader"` succeeds).
I ran into this problem on one machine but not another. Patching the label and setting read-only was not enough for me; several of the pods still failed to start with `com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure`. However, I found a different (hacky) fix that allows all pods to start.
On the failing machine, it looks like xenon fails to elect a leader in the mysql cluster, due to `[ERROR] mysql[localhost:3306].ping.error[Error 1045: Access denied for user 'root'@'::1' (using password: NO)].downs:11,downslimits:3`. I believe the root cause is that the mysql host (localhost) resolves to `::1` on the failing machine but `127.0.0.1` on the succeeding machine, and the root user is created on `127.0.0.1` but not on `::1`. The failing machine logs `[ERROR] mysql[localhost:3306].ping.error[dial tcp [::1]:3306: connect: connection refused].downs:0,downslimits:3` when it is trying to start up, whereas the succeeding machine has `[ERROR] mysql[localhost:3306].ping.error[dial tcp 127.0.0.1:3306: connect: connection refused].downs:0,downslimits:3`.
This seems to be a known issue in xenon: https://github.com/radondb/radondb-mysql-kubernetes/issues/289, but the fix there of setting `mysqlOpts.rootHost` doesn't work since train-ticket uses the helm version of xenon, not the operator. Instead, after starting the nacosdb-mysql cluster I manually added the `root@::1` user to mysql and restarted xenon (a sketch of that workaround is below). I do similarly for the tsdb-mysql pods before starting the services. After this, xenon elects a leader and all pods start successfully.
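A rough sketch of that workaround, for reference. The commenter's actual commands were not preserved; the container names `mysql` and `xenon`, the empty root password, and the three-replica layout are assumptions based on RadonDB MySQL helm chart defaults, so adjust them to your deployment.

# Create the root user on the IPv6 loopback in each mysql replica, then restart
# the xenon sidecar so its health ping succeeds as 'root'@'::1'.
for i in 0 1 2; do
  kubectl exec -n default nacosdb-mysql-$i -c mysql -- \
    mysql -uroot -h127.0.0.1 -e "CREATE USER IF NOT EXISTS 'root'@'::1'; \
      GRANT ALL ON *.* TO 'root'@'::1' WITH GRANT OPTION; FLUSH PRIVILEGES;"
  # Assumes xenon exits on SIGTERM so the container restarts; if not, deleting
  # the pod also works, since the new user is persisted on the volume.
  kubectl exec -n default nacosdb-mysql-$i -c xenon -- kill 1
done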
Hi, @Deep-Yellow. It seems this problem is the same as #246 and #263. When I run `kubectl describe pod nacos-0`, I get this result:
My openEBS status is as follows (run `kubectl get pods -n openebs`):
I tried checking the logs for more information. Here are the results:
I guess this problem is related to the pvc or pv, so I checked them. They seem to work fine.
My env: CentOS 7.9, k8s 1.26.2, RuntimeName: containerd, RuntimeVersion: v1.6.14, only one node.
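A check along those lines (the commenter's exact commands and output were not preserved) would typically be:

kubectl get pvc -n default   # PersistentVolumeClaims used by the mysql/nacos pods
kubectl get pv               # the PersistentVolumes bound to them
kubectl get pods -n openebs  # the openEBS provisioner pods mentioned above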