calico: Calico Operator Installation got stuck
We had a Calico installation without the operator, at version 3.19.1, and are now trying to move to the operator-based installation. It went smoothly on our other clusters, but on one Kubernetes cluster the installation got stuck. There are no errors or logs anywhere that I can find. I followed this documentation: https://projectcalico.docs.tigera.io/maintenance/operator-migration
Expected Behavior
Calico resources are migrated from the kube-system namespace used by the Calico manifests to the new calico-system namespace.
Current Behavior
Typha failed to scale, and the calico-node pods were not moved to the calico-system namespace. The good news is that the calico-node pods are still running in the kube-system namespace. There was a calico-typha deployment in the kube-system namespace before the installation, and I suspect that might be the issue; that deployment was removed after the installation got into this stuck state.
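Not part of the original report, but a quick way to confirm that picture with ordinary kubectl queries is to compare what is still running in kube-system with what the operator has created in calico-system:

# Pre-operator install: calico-node should still be a DaemonSet in kube-system.
kubectl get daemonset,deployment -n kube-system | grep calico

# Operator-managed resources land in calico-system once the migration proceeds.
kubectl get daemonset,deployment -n calico-system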
Possible Solution
Maybe retriggering the installation would work, but how?
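One low-impact way to make the operator re-run its reconciliation (an assumption on my part; it may not clear the underlying condition) is to restart the operator deployment created by tigera-operator.yaml:

# Restart the operator; it re-reconciles the existing Installation resource on startup.
kubectl rollout restart deployment tigera-operator -n tigera-operator

# Follow the operator logs to see what it decides to do.
kubectl logs -n tigera-operator deployment/tigera-operator -f

Deleting and recreating the Installation resource would also retrigger it, but the operator may then tear down and rebuild the calico-system resources, so the restart is the gentler option.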
Steps to Reproduce (for bugs)
- Install the Tigera Calico operator and custom resource definitions.
kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
- Trigger the operator to start a migration by creating an Installation resource. The operator will auto-detect your existing Calico settings and fill out the spec section.
kubectl create -f - <<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec: {}
EOF
- Monitor the migration status with the following command:
# kubectl describe tigerastatus calico
Name:         calico
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  operator.tigera.io/v1
Kind:         TigeraStatus
Metadata:
  Creation Timestamp:  2022-07-19T12:54:49Z
  Generation:          1
  Managed Fields:
    API Version:  operator.tigera.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
      f:status:
        .:
        f:conditions:
    Manager:      operator
    Operation:    Update
    Time:         2022-07-19T12:54:54Z
  Resource Version:  841265704
  UID:               7550ded0-0137-4f67-ac9d-6d2409ef0104
Spec:
Status:
  Conditions:
    Last Transition Time:  2022-07-19T12:54:54Z
    Message:               not enough linux nodes to schedule typha pods on, require 1 and have 0
    Reason:                Failed to scale typha
    Status:                True
    Type:                  Degraded
Events:  <none>
- However, some typha pods are created (see the node label check sketched after this listing):
# kubectl get pods -n calico-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-557cd74586-nnrm2   1/1     Running   0          19h
calico-typha-7648f46566-r27jh              1/1     Running   0          19h
calico-typha-7648f46566-w6trf              1/1     Running   0          19h
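The "not enough linux nodes to schedule typha pods on" message suggests the operator is not counting any node as eligible for its typha pods yet. My understanding (an assumption, not confirmed in the docs) is that during migration the operator excludes nodes still labelled projectcalico.org/operator-node-migration=pre-operator, so it is worth checking how that label is spread across the nodes:

# Show the migration label and OS label for every node.
kubectl get nodes -L projectcalico.org/operator-node-migration -L kubernetes.io/os

If every node still carries the pre-operator label, the operator has nowhere to schedule its own pods, which would match the Degraded condition above.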
Your Environment
- Calico version
# calicoctl version
Client Version: v3.23.2
Git commit: a52cb86db
Cluster Version: v3.19.1
Cluster Type: k8s,bgp,kdd,typha
- Orchestrator version (e.g. kubernetes, mesos, rkt):
# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.12", GitCommit:"696a9fdd2a58340e61e0d815c5769d266fca0802", GitTreeState:"clean", BuildDate:"2022-04-13T19:07:00Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.12", GitCommit:"696a9fdd2a58340e61e0d815c5769d266fca0802", GitTreeState:"clean", BuildDate:"2022-04-13T19:01:10Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
- Operating System and version:
Ubuntu 20.04.4 LTS
Linux s941 5.4.0-110-generic #124-Ubuntu SMP Thu Apr 14 19:46:19 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 17 (8 by maintainers)
So, we just removed the projectcalico.org/operator-node-migration: pre-operator label out of curiosity from a single node, and that led to the operator actually starting the migration and moving the calico-node pods from the kube-system DaemonSet to the operator-managed calico-system DaemonSet.
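For reference, this is roughly the command that removes that label from one node; the node name is a placeholder, and forcing the label off bypasses the operator's normal per-node handling, so it seems safest to do it one node at a time as described above:

# Remove the migration label from a single node (the trailing '-' deletes the label).
kubectl label node <node-name> projectcalico.org/operator-node-migration-

# Confirm that calico-node pods start appearing in calico-system for that node.
kubectl get pods -n calico-system -o wide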