linkerd2: Pod deletion stuck in TERMINATING

Bug Report

What is the issue?

We have a StatefulSet (Kafka; in hindsight it probably should not have been meshed, as I'm new to Linkerd) in a bad state: deleting one of its pods hangs forever in Terminating. The only workarounds were kubectl delete pod foo --grace-period=0 --force, or SSHing into the node and running docker stop on the linkerd-proxy container.
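Before reaching for --force or docker stop, it can help to confirm which container is keeping the pod in Terminating. The following is a minimal Python sketch, not a Linkerd or kubectl feature. It assumes kubectl is on the PATH and that the pod JSON follows the standard Pod schema (metadata.deletionTimestamp, status.containerStatuses[].state). The helper names stuck_containers and check_pod are hypothetical. The API server sets deletionTimestamp to roughly the time by which the pod should be gone, so containers still running after that point are likely the ones blocking deletion.

```python
import json
import subprocess
from datetime import datetime, timezone


def parse_k8s_time(ts: str) -> datetime:
    # Kubernetes serializes timestamps as RFC 3339 in UTC, e.g. "2019-11-05T18:02:11Z".
    # Replacing "Z" keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def stuck_containers(pod: dict, now: datetime) -> list:
    """Hypothetical helper: names of containers still running past the deletion deadline.

    A pod shown as Terminating has metadata.deletionTimestamp set. If that time
    has already passed and some containers still report a "running" state,
    those containers are the likely cause of the hang.
    """
    deadline = pod.get("metadata", {}).get("deletionTimestamp")
    if deadline is None or parse_k8s_time(deadline) > now:
        # Not being deleted, or still within the grace period.
        return []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [c["name"] for c in statuses if "running" in c.get("state", {})]


def check_pod(name: str, namespace: str = "default") -> list:
    # Hypothetical wrapper: requires kubectl on PATH and access to the cluster.
    out = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return stuck_containers(json.loads(out), datetime.now(timezone.utc))
```

For example, if the application container had exited but the proxy had not, check_pod on the stuck pod would be expected to return ["linkerd-proxy"], which would match the symptom described above (docker stop on linkerd-proxy unblocking the deletion). The pod and namespace names you pass in are your own.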

How can it be reproduced?

Unfortunately, I don't know.

Logs, error output, etc

The linkerd-proxy logs are here: https://gist.github.com/Enrico2/6ad11da43fa53bd7bee1034449bfe53a, although they appear to stop fairly early.

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: 1.13.11-gke.14
  • Cluster Environment: GKE
  • Host OS: linux
  • Linkerd version: 2.6.0

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 32 (16 by maintainers)

Most upvoted comments

I see that @grampelberg has tagged this as a bug; we will look into it. Thanks!