amazon-vpc-cni-k8s: New pods failing to start with `FailedCreatePodSandBox` warning for CNI versions 1.7.x with Cilium

What happened:

New pods started failing to come up after upgrading the EKS CNI from v1.6.0 to v1.7.0. I was able to upgrade to v1.6.3 without any issues; the errors only started when I upgraded to v1.7.0. I also tried other versions (v1.7.2 and v1.7.5) but I am seeing the same issue.

Here is the error I am seeing.

 Warning  FailedCreatePodSandBox  28s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7e3423d27fc6f36276de03aa7f41ef6b6f02121f800b65b64b8073c6a207b696" network for pod "spinnaker-get-resource-type-3fc73e4e3611d9f4-ps4b7": networkPlugin cni failed to set up pod "spinnaker-get-resource-type-3fc73e4e3611d9f4-ps4b7_default" network: invalid character '{' after top-level value
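That "invalid character '{' after top-level value" message is what Go's encoding/json reports when extra data follows a complete JSON value; in a CNI chain it typically means something other than a single JSON result ended up on the plugin's stdout. A minimal, hypothetical Go sketch (not the plugin's actual code) that reproduces the same message:

// Reproduces the error text only: encoding/json rejects input that has
// anything after one complete JSON value.
package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // A valid result object with a stray second JSON object (e.g. a log line)
    // concatenated after it, as the runtime would see on the plugin's stdout.
    payload := []byte(`{"cniVersion":"0.3.1","ips":[]}` + `{"level":"debug","msg":"..."}`)

    var result map[string]interface{}
    if err := json.Unmarshal(payload, &result); err != nil {
        fmt.Println(err) // invalid character '{' after top-level value
    }
}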

Here is the cni log

Anything else we need to know?:

  • We have Cilium running in chaining mode (v1.8.4)

Environment:

  • Kubernetes version: v1.17.9-eks-4c6976
  • CNI Version: tried v1.7.0, v1.7.2, and v1.7.5; the same issue occurs on all of them
  • Kernel: 5.4.58-27.104.amzn2.x86_64

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 22 (10 by maintainers)

Most upvoted comments

Hi,

We have found the root cause. For now, please add pluginLogFile and pluginLogLevel to the aws-cni entry in 05-cilium.conflist, as shown below. We will fix this issue in the next release.

cat /etc/cni/net.d/05-cilium.conflist
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "Debug"
    },
    {
      "name": "cilium",
      "type": "cilium-cni",
      "enable-debug": false
    }
  ]
}
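
After editing the file, a quick sanity check that it still parses and that the aws-cni entry now carries the two logging keys can save a round of debugging. A minimal Go sketch under that assumption (the path is the one from the conflist above; this is not tooling that ships with the plugin):

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

func main() {
    // Read the chained conflist and confirm it is still valid JSON.
    raw, err := os.ReadFile("/etc/cni/net.d/05-cilium.conflist")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    var conf struct {
        Plugins []map[string]interface{} `json:"plugins"`
    }
    if err := json.Unmarshal(raw, &conf); err != nil {
        fmt.Fprintln(os.Stderr, "conflist is not valid JSON:", err)
        os.Exit(1)
    }
    // Print the logging keys on the aws-cni entry so a typo is easy to spot.
    for _, p := range conf.Plugins {
        if p["type"] == "aws-cni" {
            fmt.Println("pluginLogFile: ", p["pluginLogFile"])
            fmt.Println("pluginLogLevel:", p["pluginLogLevel"])
        }
    }
}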

I was able to reproduce it, and below is the output after fixing the conflist:

dev-dsk-varavaj-2b-72f02457 % kubectl describe daemonset aws-node -n kube-system | grep 1.7.5
    Image:      602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.7.5
    Image:      602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.7.5

NAME                       READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
my-nginx-86b7cfc89-jvzvw   1/1     Running   0          18h   192.168.10.206   ip-192-168-0-43.us-west-2.compute.internal     <none>           <none>
my-nginx-86b7cfc89-p4q2t   1/1     Running   0          18m   192.168.67.156   ip-192-168-81-109.us-west-2.compute.internal   <none>           <none>

NAME                               READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
aws-node-95jtw                     1/1     Running   0          23m   192.168.0.43     ip-192-168-0-43.us-west-2.compute.internal     <none>           <none>
aws-node-cnrkq                     1/1     Running   0          24m   192.168.81.109   ip-192-168-81-109.us-west-2.compute.internal   <none>           <none>
aws-node-j64z5                     1/1     Running   0          23m   192.168.51.208   ip-192-168-51-208.us-west-2.compute.internal   <none>           <none>
cilium-5gr4s                       1/1     Running   0          18h   192.168.51.208   ip-192-168-51-208.us-west-2.compute.internal   <none>           <none>
cilium-d4nff                       1/1     Running   0          18h   192.168.0.43     ip-192-168-0-43.us-west-2.compute.internal     <none>           <none>
cilium-node-init-kwsj6             1/1     Running   0          18h   192.168.0.43     ip-192-168-0-43.us-west-2.compute.internal     <none>           <none>
cilium-node-init-pv4jw             1/1     Running   0          18h   192.168.51.208   ip-192-168-51-208.us-west-2.compute.internal   <none>           <none>
cilium-node-init-pxdfv             1/1     Running   0          18h   192.168.81.109   ip-192-168-81-109.us-west-2.compute.internal   <none>           <none>
cilium-operator-6554b44b9d-f88zj   1/1     Running   0          18h   192.168.51.208   ip-192-168-51-208.us-west-2.compute.internal   <none>           <none>
cilium-operator-6554b44b9d-j8tlb   1/1     Running   0          18h   192.168.0.43     ip-192-168-0-43.us-west-2.compute.internal     <none>           <none>
cilium-qg6tf                       1/1     Running   0          18h   192.168.81.109   ip-192-168-81-109.us-west-2.compute.internal   <none>           <none>
coredns-5c97f79574-9nnkk           1/1     Running   0          18h   192.168.68.203   ip-192-168-81-109.us-west-2.compute.internal   <none>           <none>
coredns-5c97f79574-jnsm2           1/1     Running   0          18h   100.64.95.97     ip-192-168-51-208.us-west-2.compute.internal   <none>           <none>
kube-proxy-bmv86                   1/1     Running   0          18h   192.168.81.109   ip-192-168-81-109.us-west-2.compute.internal   <none>           <none>
kube-proxy-j7c8f                   1/1     Running   0          18h   192.168.0.43     ip-192-168-0-43.us-west-2.compute.internal     <none>           <none>
kube-proxy-ss98z                   1/1     Running   0          18h   192.168.51.208   ip-192-168-51-208.us-west-2.compute.internal   <none>           <none>

Thank you!

Thanks for confirming, @Aggouri. We are actively looking into the issue and will update ASAP.

@jayanthvn I followed this Doc

Thanks, @YesemKebede. We will look into it ASAP.