train-ticket: deploy pod "nacos-0" Init:CrashLoopBackOff

Summary

During make deploy, the pod nacos-0 gets stuck in Init:CrashLoopBackOff.

Expected behaviour

The pod nacos-0 should finish initializing and run.

Current behaviour

The pod nacos-0 stays in Init:CrashLoopBackOff; its init container keeps failing and restarting.

Steps to reproduce

make deploy

[root@ip-172-31-27-85 train-ticket]# make deploy
args num: 2
Parse DeployArgs
Start deployment Step <1/3>------------------------------------
Start to deploy mysql cluster for nacos.
NAME: nacosdb
LAST DEPLOYED: Sat Jan 28 06:21:26 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The cluster is comprised of 3 pods: 1 leader and 2 followers. Each instance is accessible within the cluster through:

    <pod-name>.nacosdb-mysql

To connect to your database:

1. Get mysql user `nacos`'s password:

    kubectl get secret -n default nacosdb-mysql -o jsonpath="{.data.mysql-password}" | base64 --decode; echo

2. Run an Ubuntu pod that you can use as a client:

    kubectl run ubuntu -n default --image=ubuntu:focal -it --rm --restart='Never' -- bash -il

3. Install the mysql client:

    apt-get update && apt-get install mysql-client -y

4. To connect to leader service in the Ubuntu pod:

    mysql -h nacosdb-mysql-leader -u nacos -p

5. To connect to follower service (read-only) in the Ubuntu pod:

    mysql -h nacosdb-mysql-follower -u nacos -p
Waiting for mysql cluster of nacos to be ready ......
Waiting for 3 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 3 new pods have been updated...
Start to deploy nacos.
NAME: nacos
LAST DEPLOYED: Sat Jan 28 06:30:13 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Waiting for nacos to be ready ......
Waiting for 3 pods to be ready...

Your environment

OS (e.g. cat /etc/os-release): Ubuntu 22.04

Kubernetes version (use kubectl version):

[root@ip-172-31-27-85 ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-08T10:15:02Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-08T10:08:09Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Additional context

[root@ip-172-31-27-85 ~]# kubectl get po -o wide
NAME              READY   STATUS                  RESTARTS      AGE     IP               NODE                                           NOMINATED NODE   READINESS GATES
nacos-0           0/1     Init:CrashLoopBackOff   4 (60s ago)   2m53s   100.85.89.134    ip-172-31-21-164.cn-north-1.compute.internal   <none>           <none>
nacosdb-mysql-0   3/3     Running                 0             11m     100.86.8.69      ip-172-31-26-198.cn-north-1.compute.internal   <none>           <none>
nacosdb-mysql-1   3/3     Running                 0             8m49s   100.85.89.133    ip-172-31-21-164.cn-north-1.compute.internal   <none>           <none>
nacosdb-mysql-2   3/3     Running                 0             5m42s   100.89.137.138   ip-172-31-18-50.cn-north-1.compute.internal    <none>           <none>
[root@ip-172-31-27-85 ~]# kubectl describe po nacos-0
Name:             nacos-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-172-31-21-164.cn-north-1.compute.internal/172.31.21.164
Start Time:       Sat, 28 Jan 2023 06:30:13 +0000
Labels:           app=nacos
                  controller-revision-hash=nacos-95879c94d
                  statefulset.kubernetes.io/pod-name=nacos-0
Annotations:      cni.projectcalico.org/containerID: 244a57d5d7b7eaa12bb99dc0845034b5d67fb2f88960f6f3e6e13a23f8c546e7
                  cni.projectcalico.org/podIP: 100.85.89.134/32
                  cni.projectcalico.org/podIPs: 100.85.89.134/32
Status:           Pending
IP:               100.85.89.134
IPs:
  IP:           100.85.89.134
Controlled By:  StatefulSet/nacos
Init Containers:
  initmysql:
    Container ID:   containerd://313c55fb9d23014b15e403b181f5ca352ba28901843002a6b415f797c85b89b5
    Image:          codewisdom/mysqlclient:0.1
    Image ID:       docker.io/codewisdom/mysqlclient@sha256:9201e8dfe5eb4e845259730a6046c7b905566119d760ed7d5aef535ace972216
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sat, 28 Jan 2023 06:32:06 +0000
      Finished:     Sat, 28 Jan 2023 06:32:06 +0000
    Ready:          False
    Restart Count:  4
    Environment Variables from:
      nacos-mysql  Secret  Optional: false
    Environment:   <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c6xxs (ro)
Containers:
  k8snacos:
    Container ID:   
    Image:          nacos/nacos-server:2.0.1
    Image ID:       
    Ports:          8848/TCP, 7848/TCP, 9848/TCP, 9849/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     500m
      memory:  1Gi
    Environment Variables from:
      nacos-mysql  Secret  Optional: false
    Environment:
      NACOS_REPLICAS:          3
      NACOS_SERVER_PORT:       8848
      NACOS_APPLICATION_PORT:  8848
      PREFER_HOST_MODE:        hostname
      MODE:                    cluster
      NACOS_SERVERS:           nacos-0.nacos-headless.default.svc.cluster.local:8848 nacos-1.nacos-headless.default.svc.cluster.local:8848 nacos-2.nacos-headless.default.svc.cluster.local:8848
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c6xxs (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-c6xxs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  3m2s                 default-scheduler  Successfully assigned default/nacos-0 to ip-172-31-21-164.cn-north-1.compute.internal
  Normal   Pulling    3m1s                 kubelet            Pulling image "codewisdom/mysqlclient:0.1"
  Normal   Pulled     2m48s                kubelet            Successfully pulled image "codewisdom/mysqlclient:0.1" in 13.206324736s
  Normal   Created    69s (x5 over 2m48s)  kubelet            Created container initmysql
  Normal   Started    69s (x5 over 2m48s)  kubelet            Started container initmysql
  Normal   Pulled     69s (x4 over 2m47s)  kubelet            Container image "codewisdom/mysqlclient:0.1" already present on machine
  Warning  BackOff    69s (x9 over 2m46s)  kubelet            Back-off restarting failed container
[root@ip-172-31-27-85 ~]# kubectl logs nacos-0
Defaulted container "k8snacos" out of: k8snacos, initmysql (init)
Error from server (BadRequest): container "k8snacos" in pod "nacos-0" is waiting to start: PodInitializing
[root@ip-172-31-27-85 ~]# kubectl get po -o wide
NAME              READY   STATUS                  RESTARTS        AGE     IP               NODE                                           NOMINATED NODE   READINESS GATES
nacos-0           0/1     Init:CrashLoopBackOff   5 (2m24s ago)   5m39s   100.85.89.134    ip-172-31-21-164.cn-north-1.compute.internal   <none>           <none>
nacosdb-mysql-0   3/3     Running                 0               14m     100.86.8.69      ip-172-31-26-198.cn-north-1.compute.internal   <none>           <none>
nacosdb-mysql-1   3/3     Running                 0               11m     100.85.89.133    ip-172-31-21-164.cn-north-1.compute.internal   <none>           <none>
nacosdb-mysql-2   3/3     Running                 0               8m28s   100.89.137.138   ip-172-31-18-50.cn-north-1.compute.internal    <none>           <none>
[root@ip-172-31-27-85 ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-08T10:15:02Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-08T10:08:09Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
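
The kubectl logs call above only shows the default k8snacos container, which is still waiting in PodInitializing; the failing init container's logs have to be requested explicitly. A minimal check, reusing the initmysql container name from the describe output above:

kubectl logs nacos-0 -c initmysql
# if the container was just restarted, the previous attempt's output is often more useful
kubectl logs nacos-0 -c initmysql --previous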

Most upvoted comments

Is there any new update on this open issue?

@yinfangchen

creating “breakpoints”

Yes, this is what I do as well. I don’t really need mysql replication for my purposes so I didn’t look carefully at the slave statuses, but I am seeing the same logs as you mention even when creating the “breakpoints”, so I’m not sure that truly solves it. Maybe there is a race so it sometimes appears solved?

Anyway, I'm not planning on investigating further since I don't need replication, but if you'd like to, one thought is to look more closely at how xenon sets up the 'root'@'127.0.0.1' user and make sure everything is set up the same way for the 'root'@'::1' user. I think this happens here (although it may be different in the version of xenon used in train-ticket), which is where I got the SQL statements in the bash script. Out of curiosity I did try adding FLUSH PRIVILEGES; and RESET SLAVE ALL; to the end of the script, but it didn't help. Let me know if you find a solution - I'm curious 😃

@yinfangchen When using the “all_in_one” option for mysql, the shell script needs to be run twice: once after the nacosdb-mysql cluster starts (i.e. after kubectl rollout status statefulset/$nacosDBRelease-mysql -n $namespace succeeds), and once after the tsdb-mysql cluster starts (i.e. after gen_secret_for_services $tsUser $tsPassword $tsDB "${tsMysqlName}-mysql-leader" succeeds); see the sketch below.
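
A rough sketch of that ordering, assuming the root@::1 loop from the next comment has been saved as a standalone script named fix-root-ipv6.sh that takes the pod-name prefix to patch (the script name, argument, and tsdb-mysql statefulset name are illustrative, not taken from the repo):

# 1) after the nacos database cluster is ready
kubectl rollout status statefulset/nacosdb-mysql -n default
./fix-root-ipv6.sh nacosdb-mysql

# 2) after the train-ticket database cluster is ready (all_in_one mode)
kubectl rollout status statefulset/tsdb-mysql -n default
./fix-root-ipv6.sh tsdb-mysql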

I ran into this problem on one machine but not another. Patching the label and setting read-only was not enough for me – several of the pods still failed to start with com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure. However, I found a different (hacky) fix that allows all pods to start.

On the failing machine, it looks like xenon fails to elect a leader in the mysql cluster, due to [ERROR] mysql[localhost:3306].ping.error[Error 1045: Access denied for user 'root'@'::1' (using password: NO)].downs:11,downslimits:3 I believe the root cause is the mysql host (localhost) resolves to ::1 on the failing machine but 127.0.0.1 on the succeeding machine, and the root user is created on 127.0.0.1 but not ::1. The failing machine logs [ERROR] mysql[localhost:3306].ping.error[dial tcp [::1]:3306: connect: connection refused].downs:0,downslimits:3 when it is trying to start up, whereas the succeeding machine has [ERROR] mysql[localhost:3306].ping.error[dial tcp 127.0.0.1:3306: connect: connection refused].downs:0,downslimits:3.
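
A quick way to check whether you are hitting the same condition, reusing the nacosdb-mysql-0 pod name from the listings above (the exec targets the pod's default container, as in the loop below):

# list the hosts the root account is defined for; if '::1' is missing,
# xenon's ping against localhost fails with the Error 1045 shown above
kubectl exec nacosdb-mysql-0 -- mysql -uroot -e "SELECT user, host FROM mysql.user WHERE user = 'root';"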

This seems to be a known issue in xenon: https://github.com/radondb/radondb-mysql-kubernetes/issues/289, but the fix there of setting mysqlOpts.rootHost doesn’t work since train-ticket uses the helm version of xenon, not the operator. Instead, after starting the nacosdb-mysql cluster I manually added the root@::1 user to mysql and restarted xenon. That is:

# add a root@::1 account in every nacosdb-mysql pod, then restart xenon so it retries leader election
for pod in $(kubectl get pods --no-headers -o custom-columns=":metadata.name" | grep nacosdb-mysql); do
  # create the IPv6-loopback root user that xenon's localhost ping needs
  kubectl exec $pod -- mysql -uroot -e "CREATE USER 'root'@'::1' IDENTIFIED WITH mysql_native_password BY '' ; GRANT ALL ON *.* TO 'root'@'::1' WITH GRANT OPTION ;"
  # restart the xenon sidecar so it picks up the new account
  kubectl exec $pod -c xenon -- /sbin/reboot
done

I do similarly for the tsdb-mysql pods before starting the services. After this, xenon elects a leader and all pods start successfully.

Hi @Deep-Yellow. It seems like this problem is the same as #246 and #263. When I run kubectl describe pod nacos-0, I get this result:

....
....
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m46s                  default-scheduler  Successfully assigned default/nacos-0 to master102
  Normal   Pulled     5m9s (x5 over 6m46s)   kubelet            Container image "codewisdom/mysqlclient:0.1" already present on machine
  Normal   Created    5m9s (x5 over 6m46s)   kubelet            Created container initmysql
  Normal   Started    5m9s (x5 over 6m46s)   kubelet            Started container initmysql
  Warning  BackOff    100s (x24 over 6m42s)  kubelet            Back-off restarting failed container initmysql in pod nacos-0_default(e77616fc-7854-4c8d-bbc4-35852061e6c5)

And my openEBS status is as follows (run kubectl get pods -n openebs):

# kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS   AGE
openebs-localpv-provisioner-697c988cc5-6t5vp   1/1     Running   0          3h32m
openebs-ndm-cluster-exporter-87f764699-zl2gg   1/1     Running   0          3h32m
openebs-ndm-kkj6z                              1/1     Running   0          3h32m
openebs-ndm-node-exporter-n5rd8                1/1     Running   0          3h32m
openebs-ndm-operator-5b984f4966-xhx4m          1/1     Running   0          3h32m

I tried to check the logs for more information. Here are the results:

# kubectl logs nacos-0
Defaulted container "k8snacos" out of: k8snacos, initmysql (init)
Error from server (BadRequest): container "k8snacos" in pod "nacos-0" is waiting to start: PodInitializing

# kubectl logs nacos-0 -c initmysql
ERROR 2002 (HY000): Can't connect to MySQL server on 'nacosdb-mysql-leader' (115)

I guess this problem is related to the PVC or PV, so I checked them:

# kubectl get pvc
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
data-nacosdb-mysql-0   Bound    pvc-e37d3b1c-2f5b-4acd-b62c-131be591c0e9   1Gi        RWO            openebs-hostpath   82m
data-nacosdb-mysql-1   Bound    pvc-a3c64b87-b307-4e31-9a4c-7496146e06d4   1Gi        RWO            openebs-hostpath   81m
data-nacosdb-mysql-2   Bound    pvc-5b4948cd-96a0-4d5b-b7e9-0a76e5493c82   1Gi        RWO            openebs-hostpath   80m

# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS       REASON   AGE
pvc-5b4948cd-96a0-4d5b-b7e9-0a76e5493c82   1Gi        RWO            Delete           Bound    default/data-nacosdb-mysql-2   openebs-hostpath            80m
pvc-a3c64b87-b307-4e31-9a4c-7496146e06d4   1Gi        RWO            Delete           Bound    default/data-nacosdb-mysql-1   openebs-hostpath            81m
pvc-e37d3b1c-2f5b-4acd-b62c-131be591c0e9   1Gi        RWO            Delete           Bound    default/data-nacosdb-mysql-0   openebs-hostpath            82m

They seem to work fine.
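
Since the PVCs and PVs look healthy, another thing worth checking (not covered above, just a suggestion) is whether the nacosdb-mysql-leader service that initmysql is dialing actually has a ready endpoint; if xenon never elects a leader, that service may select no pod, which would be consistent with the ERROR 2002 above:

# no endpoints here usually means xenon has not elected a leader yet
kubectl get svc nacosdb-mysql-leader
kubectl get endpoints nacosdb-mysql-leader
# the xenon sidecar's logs show the leader-election / ping errors discussed earlier
kubectl logs nacosdb-mysql-0 -c xenon --tail=20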


My env: CentOS 7.9, k8s 1.26.2, container runtime containerd v1.6.14, only one node.