kubernetes: Couldn't find network status for {namespace}/{pod_name} through plugin: invalid network status for

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): NO

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): “invalid network status for” “Couldn’t find network status for”


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version): 1.6.0

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): NAME="Ubuntu" VERSION="14.04.5 LTS, Trusty Tahr" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 14.04.5 LTS" VERSION_ID="14.04"
  • Kernel (e.g. uname -a): Linux HOSTNAME_REDACTED 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

What happened: After upgrading a cluster from 1.5.3 to 1.6.0, I see errors like the following in kubelet's log (/var/log/upstart/kubelet.log):

W0403 15:38:19.584738 25905 docker_sandbox.go:263] Couldn't find network status for default/prometheus-node-exporter-2n784 through plugin: invalid network status for

What you expected to happen:

  • For this error to not be generated if it’s a false positive
  • For the underlying bug to be fixed if this is a real error but it’s a bug in kubelet/kubernetes (and not some environmental trigger)
  • For the error message to be more useful in the case the error is legitimate

How to reproduce it (as minimally and precisely as possible): Upgrade a cluster from 1.5.3 to 1.6.0 and watch kubelet’s logs.

Anything else we need to know: I am using a hyperkube image that I built from the Kubernetes source and a compiled kubelet binary downloaded from the Kubernetes project. This is the same way I deployed 1.5.3 and earlier versions.

I am using flanneld as an overlay network and am not configuring any CNI / --network-plugin options for kubelet.

I tracked this error message down to the following code.

pkg/kubelet/dockershim/docker_sandbox.go

229 // getIPFromPlugin interrogates the network plugin for an IP.
230 func (ds *dockerService) getIPFromPlugin(sandbox *dockertypes.ContainerJSON) (string, error) {
231     metadata, err := parseSandboxName(sandbox.Name)
232     if err != nil {
233         return "", err
234     }
235     msg := fmt.Sprintf("Couldn't find network status for %s/%s through plugin", metadata.Namespace, metadata.Name)
236     cID := kubecontainer.BuildContainerID(runtimeName, sandbox.ID)
237     networkStatus, err := ds.network.GetPodNetworkStatus(metadata.Namespace, metadata.Name, cID)
238     if err != nil {
239         // This might be a sandbox that somehow ended up without a default
240         // interface (eth0). We can't distinguish this from a more serious
241         // error, so callers should probably treat it as non-fatal.
242         return "", err
243     }
244     if networkStatus == nil {
245         return "", fmt.Errorf("%v: invalid network status for", msg)
246     }
247     return networkStatus.IP.String(), nil
248 }
249 
250 // getIP returns the ip given the output of `docker inspect` on a pod sandbox,
251 // first interrogating any registered plugins, then simply trusting the ip
252 // in the sandbox itself. We look for an ipv4 address before ipv6.
253 func (ds *dockerService) getIP(sandbox *dockertypes.ContainerJSON) (string, error) {
254     if sandbox.NetworkSettings == nil {
255         return "", nil
256     }
257     if sharesHostNetwork(sandbox) {
258         // For sandboxes using host network, the shim is not responsible for
259         // reporting the IP.
260         return "", nil
261     }
262     if IP, err := ds.getIPFromPlugin(sandbox); err != nil {
263         glog.Warningf("%v", err)
264     } else if IP != "" {
265         return IP, nil
266     }
267     // TODO: trusting the docker ip is not a great idea. However docker uses
268     // eth0 by default and so does CNI, so if we find a docker IP here, we
269     // conclude that the plugin must have failed setup, or forgotten its ip.
270     // This is not a sensible assumption for plugins across the board, but if
271     // a plugin doesn't want this behavior, it can throw an error.
272     if sandbox.NetworkSettings.IPAddress != "" {
273         return sandbox.NetworkSettings.IPAddress, nil
274     }
275     return sandbox.NetworkSettings.GlobalIPv6Address, nil
276 }
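For reference, here is a small, self-contained paraphrase of the branch that produces the message (this is my own sketch, not kubelet code; the type and function names are made up). It also shows that the dangling "for" at the end of the log line is not truncation: the format string at line 245 literally ends with "invalid network status for", and getIP then logs that error verbatim at line 263.

    package main

    import (
    	"errors"
    	"fmt"
    	"net"
    )

    // PodNetworkStatus mirrors the shape of the kubelet type used above;
    // only the IP field matters for this sketch.
    type PodNetworkStatus struct {
    	IP net.IP
    }

    // getIPFromStatus paraphrases getIPFromPlugin above: a nil status with a
    // nil error is what produces the "invalid network status for" message
    // (the trailing "for" is literal in the format string, not truncation).
    func getIPFromStatus(namespace, name string, status *PodNetworkStatus, err error) (string, error) {
    	msg := fmt.Sprintf("Couldn't find network status for %s/%s through plugin", namespace, name)
    	if err != nil {
    		return "", err
    	}
    	if status == nil {
    		return "", fmt.Errorf("%v: invalid network status for", msg)
    	}
    	return status.IP.String(), nil
    }

    func main() {
    	// Case 1: the plugin returned (nil, nil) -- the branch hit in this issue.
    	if _, err := getIPFromStatus("default", "prometheus-node-exporter-2n784", nil, nil); err != nil {
    		fmt.Println("warning:", err)
    	}

    	// Case 2: the plugin returned a real status; no warning would be logged.
    	ip, _ := getIPFromStatus("default", "some-pod", &PodNetworkStatus{IP: net.ParseIP("10.2.3.4")}, nil)
    	fmt.Println("pod IP:", ip)

    	// Case 3: the plugin returned an error; it is passed through unchanged.
    	_, err := getIPFromStatus("default", "other-pod", nil, errors.New("plugin not ready"))
    	fmt.Println("plugin error:", err)
    }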

pkg/kubelet/network/kubenet/kubenet_linux.go

541 // TODO: Use the addToNetwork function to obtain the IP of the Pod. That will assume idempotent ADD call to the plugin.
542 // Also fix the runtime's call to Status function to be done only in the case that the IP is lost, no need to do periodic calls
543 func (plugin *kubenetNetworkPlugin) GetPodNetworkStatus(namespace string, name string, id kubecontainer.ContainerID) (*network.PodNetworkStatus, error) {
544     plugin.mu.Lock()
545     defer plugin.mu.Unlock()
546     // Assuming the ip of pod does not change. Try to retrieve ip from kubenet map first.
547     if podIP, ok := plugin.podIPs[id]; ok {
548         return &network.PodNetworkStatus{IP: net.ParseIP(podIP)}, nil
549     }
550 
551     netnsPath, err := plugin.host.GetNetNS(id.ID)
552     if err != nil {
553         return nil, fmt.Errorf("Kubenet failed to retrieve network namespace path: %v", err)
554     }
555     ip, err := network.GetPodIP(plugin.execer, plugin.nsenterPath, netnsPath, network.DefaultInterfaceName)
556     if err != nil {
557         return nil, err
558     }
559 
560     plugin.podIPs[id] = ip.String()
561     return &network.PodNetworkStatus{IP: ip}, nil
562 }
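One thing that stands out to me: kubenet never returns a nil status together with a nil error; it either finds an IP (from its cache or by entering the pod's netns) or returns an error. So the networkStatus == nil branch in docker_sandbox.go must be coming from whatever plugin kubelet falls back to when, as in my setup, no --network-plugin is configured. My guess, which I have not verified against the 1.6.0 source, is that the default no-op plugin simply has nothing to report. Here is a pared-down sketch of that contrast, with made-up type names and only the one method dockershim calls:

    package main

    import (
    	"fmt"
    	"net"
    )

    // PodNetworkStatus stands in for the kubelet type of the same name.
    type PodNetworkStatus struct {
    	IP net.IP
    }

    // statusProvider is a pared-down stand-in for the network plugin interface;
    // the real one has more methods, this keeps only the call dockershim makes.
    type statusProvider interface {
    	GetPodNetworkStatus(namespace, name, containerID string) (*PodNetworkStatus, error)
    }

    // cachedPlugin mimics kubenet's behavior above: serve from a cache keyed by
    // container ID, otherwise fail with an error (instead of entering the netns).
    type cachedPlugin struct {
    	podIPs map[string]string
    }

    func (p *cachedPlugin) GetPodNetworkStatus(namespace, name, containerID string) (*PodNetworkStatus, error) {
    	if ip, ok := p.podIPs[containerID]; ok {
    		return &PodNetworkStatus{IP: net.ParseIP(ip)}, nil
    	}
    	return nil, fmt.Errorf("no IP recorded for %s/%s (%s)", namespace, name, containerID)
    }

    // noopPlugin is my guess at what runs when no --network-plugin is configured:
    // it has nothing to report, so it returns a nil status and a nil error --
    // exactly the combination that makes getIPFromPlugin log the warning.
    type noopPlugin struct{}

    func (noopPlugin) GetPodNetworkStatus(namespace, name, containerID string) (*PodNetworkStatus, error) {
    	return nil, nil
    }

    func describe(p statusProvider) {
    	status, err := p.GetPodNetworkStatus("default", "prometheus-node-exporter-2n784", "abc123")
    	switch {
    	case err != nil:
    		fmt.Println("plugin error:", err)
    	case status == nil:
    		fmt.Println("nil status, nil error -> 'invalid network status for' warning")
    	default:
    		fmt.Println("pod IP:", status.IP)
    	}
    }

    func main() {
    	describe(&cachedPlugin{podIPs: map[string]string{"abc123": "10.2.3.4"}})
    	describe(noopPlugin{})
    }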

As suggested by the code, I’m only getting these errors for containers that don’t use hostNetwork. If I run docker inspect on the containers that are mentioned in the error messages, the values in the NetworkSettings section are empty, but I’m not sure that’s relevant.

        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": null,
            "SandboxKey": "",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {}
        }
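Given the getIP code above, an empty NetworkSettings like this means the docker fallback has nothing to offer either: once the plugin query has produced no status, both IPAddress and GlobalIPv6Address are "", so the sandbox ends up with an empty IP. A small sketch of that fallback chain (my own paraphrase, keeping only the fields getIP consults):

    package main

    import "fmt"

    // networkSettings keeps only the fields the fallback in getIP above consults.
    type networkSettings struct {
    	IPAddress         string
    	GlobalIPv6Address string
    }

    // fallbackIP paraphrases the tail of getIP: prefer the docker-reported IPv4
    // address, then the global IPv6 address; with the empty NetworkSettings shown
    // above, both are "" and the pod is left without an IP.
    func fallbackIP(s *networkSettings) string {
    	if s == nil {
    		return ""
    	}
    	if s.IPAddress != "" {
    		return s.IPAddress
    	}
    	return s.GlobalIPv6Address
    }

    func main() {
    	empty := &networkSettings{} // matches the docker inspect output above
    	fmt.Printf("fallback IP: %q\n", fallbackIP(empty))

    	populated := &networkSettings{IPAddress: "172.17.0.5"}
    	fmt.Printf("fallback IP: %q\n", fallbackIP(populated))
    }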

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 27 (11 by maintainers)

Most upvoted comments

We have basically the same setup as @vdavidoff:

  • Flannel on CoreOS. Flannel gets a subnet from etcd and writes the CIDR to a properties file
  • dockerd starts with --bip set to the CIDR from the flannel properties file (rough sketch of this handoff below)
  • No CNI plugin or kubenet or anything else configured for kubelet
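For what it's worth, here is a rough Go sketch of the handoff described above. It assumes the conventional /run/flannel/subnet.env file and FLANNEL_SUBNET key that flanneld writes in our setup (adjust both for yours); dockerd is then started with that CIDR as --bip, and kubelet itself gets no --network-plugin flag.

    package main

    import (
    	"bufio"
    	"fmt"
    	"log"
    	"os"
    	"strings"
    )

    // flannelSubnet pulls FLANNEL_SUBNET out of the properties file flanneld
    // writes (conventionally /run/flannel/subnet.env); the exact path and key
    // depend on how flanneld is configured.
    func flannelSubnet(path string) (string, error) {
    	f, err := os.Open(path)
    	if err != nil {
    		return "", err
    	}
    	defer f.Close()

    	scanner := bufio.NewScanner(f)
    	for scanner.Scan() {
    		line := strings.TrimSpace(scanner.Text())
    		if strings.HasPrefix(line, "FLANNEL_SUBNET=") {
    			return strings.TrimPrefix(line, "FLANNEL_SUBNET="), nil
    		}
    	}
    	if err := scanner.Err(); err != nil {
    		return "", err
    	}
    	return "", fmt.Errorf("FLANNEL_SUBNET not found in %s", path)
    }

    func main() {
    	subnet, err := flannelSubnet("/run/flannel/subnet.env")
    	if err != nil {
    		log.Fatal(err)
    	}
    	// dockerd is then started with this CIDR as its bridge network, e.g.
    	// dockerd --bip=<subnet>; kubelet gets no --network-plugin flag at all.
    	fmt.Printf("--bip=%s\n", subnet)
    }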

We saw the same logs in kubelet logs:

W0429 03:21:28.658370    3252 docker_sandbox.go:263] Couldn't find network status for production/mypod-544180576-3d9hz through plugin: invalid network status for

However, we also saw a massive number of pods being killed by kubelet:

# kubectl describe pod <podname>
  14m	14m	1	kubelet, ip-10-72-21-43.us-west-2.compute.internal	spec.containers{mysvc}	Normal	Killing		Killing container with id docker://9ba866931c13c11a45afe1555d5f0f08a352780f3458973c7888dccc2192f610:Need to kill Pod

This caused disruption and a brief service outage. Please advise.

Here are my kubelet logs, showing the same error @jmccarty3 reports: https://gist.github.com/stensonb/3f6db27eb0ba031463d63cad8360f780

It looks like the container with id dadd21ea-2b61-11e7-87ae-e63e85527c63 demonstrates the issue; search for that ID in the gist.

Also, this line:

April 27th 2017, 10:32:05.000	W0427 17:32:05.507065    2267 docker_sandbox.go:263] Couldn't find network status for techops-prod/cyanite-493774351-tb456 through plugin: invalid network status for

Is anyone else also seeing messages like

Sep 28 07:57:54 ip-10-50-97-190 start-kubelet.sh[1748]: E0928 07:57:54.064121    1748 remote_runtime.go:163] ListPodSandbox with filter "nil" from runtime service failed: rpc error: code = 4 desc = context deadline exceeded
Sep 28 07:57:54 ip-10-50-97-190 start-kubelet.sh[1748]: E0928 07:57:54.064194    1748 kuberuntime_sandbox.go:185] ListPodSandbox failed: rpc error: code = 4 desc = context deadline exceeded
Sep 28 07:57:54 ip-10-50-97-190 start-kubelet.sh[1748]: E0928 07:57:54.064209    1748 generic.go:198] GenericPLEG: Unable to retrieve pods: rpc error: code = 4 desc = context deadline exceeded

when this happens? Looks like some bug in dockershim.

For me it seems to have something to do with flannel v0.7.0 and Kubernetes v1.6.x. My kubelet starts to act weird when the rkt garbage collection starts, reproducible either manually with sudo rkt gc or automatically, like this:

kubelet[16200]: I0524 08:24:29.761684 16200 qos_container_manager_linux.go:285] [ContainerManager]: Updated QoS cgroup configuration
...
systemd[1]: Starting Garbage Collection for rkt...
rkt[14743]: gc: moving pod "17942a27-cb9f-4152-ab95-fc3bd128639b" to garbage
rkt[14743]: gc: pod "17942a27-cb9f-4152-ab95-fc3bd128639b" not removed: still within grace period (24h0m0s)
systemd[1]: Started Garbage Collection for rkt.
systemd-timesyncd[623]: Network configuration changed, trying to establish connection.
systemd-timesyncd[623]: Synchronized to time server 78.47.226.8:123 (2.coreos.pool.ntp.org).
kubelet[16200]: I0524 08:25:28.998438 16200 container_manager_linux.go:397] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
kubelet[16200]: I0524 08:25:29.762683 16200 qos_container_manager_linux.go:302] [ContainerManager]: Failed to update QoS cgroup configuration
...

There are indications that flannel v0.7.1 solves some Kubernetes-related issues (see https://github.com/coreos/flannel/pull/690), but since it is not currently shipped with the CoreOS version I use, I just downgraded to v1.5.x again.