strimzi-kafka-operator: [Bug] Unexpected removal of all Kafka resources when upgrading using Helm 3
Describe the bug I’m using Strimzi operator v0.19.0 and tried to upgrade to 0.20.0. When I ran the helm upgrade procedure, all my resources (users, topics, clusters) were removed. I tried to reproduce the problem with a freshly installed cluster, and the situation occurred again.
To Reproduce Steps to reproduce the behavior:
1. helm install strimzi-kafka strimzi/strimzi-kafka-operator --namespace kafka --set watchNamespaces="{kafka,test-kafka}" --version=0.19.0
2. create a Kafka cluster, users, and topics from manifests (apiVersion: v1beta1)
3. helm upgrade strimzi-kafka strimzi/strimzi-kafka-operator --namespace kafka --set watchNamespaces="{kafka,test-kafka}"
kubectl get crd| grep kafka| wc -l
0
After the steps above, my cluster and users/topics were removed. The operator pod tries to start and crashes with the following error:
2020-10-26 14:35:47 WARN WatchConnectionManager:198 - Exec Failure: HTTP 404, Status: 404 - 404 page not found
java.net.ProtocolException: Expected HTTP 101 response but was '404 Not Found'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229) [com.squareup.okhttp3.okhttp-3.12.6.jar:?]
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196) [com.squareup.okhttp3.okhttp-3.12.6.jar:?]
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) [com.squareup.okhttp3.okhttp-3.12.6.jar:?]
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [com.squareup.okhttp3.okhttp-3.12.6.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Expected behavior The operator should be updated without removing resources.
Environment (please complete the following information):
- Strimzi version: 0.19.0
- Installation method: Helm chart
- Kubernetes cluster: v1.18.8
- Infrastructure: Rancher2 on Amazon EC2 instances
About this issue
- State: closed
- Created 4 years ago
- Reactions: 4
- Comments: 33 (10 by maintainers)
Commits related to this issue
- Update Helm upgrade docs - closes #3877 Signed-off-by: Jakub Scholz <www@scholzj.com> — committed to scholzj/strimzi-kafka-operator by scholzj 3 years ago
- [DOC] Update Helm upgrade docs (#4929) * Update Helm upgrade docs - closes #3877 Signed-off-by: Jakub Scholz <www@scholzj.com> * Apply suggestions from code review Signed-off-by: Jakub Schol... — committed to strimzi/strimzi-kafka-operator by scholzj 3 years ago
A quick workaround we found with our team:
There is also the option to edit the data in the Helm release secret instead of deleting it. Then remove the CRD data inside the templates and manifest sections and upload the secret again. The upgrade to 0.20.0 will then leave the CRDs alone…
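For reference, a minimal sketch of that edit, assuming a Helm 3 release secret named sh.helm.release.v1.strimzi-kafka.v1 in the kafka namespace (the revision suffix will differ per installation), GNU base64, and the usual Helm 3 storage format of a gzip-compressed, base64-encoded payload inside the secret:

# Decode the release payload (Kubernetes base64 + Helm base64 + gzip) into editable JSON
kubectl get secret sh.helm.release.v1.strimzi-kafka.v1 -n kafka \
  -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip > release.json

# After removing the CustomResourceDefinition entries from the templates and
# manifest sections of release.json, re-encode it and upload the secret again
kubectl patch secret sh.helm.release.v1.strimzi-kafka.v1 -n kafka --type merge \
  -p "{\"stringData\":{\"release\":\"$(gzip -c release.json | base64 -w0)\"}}"

With the CRDs gone from the stored release, helm upgrade should no longer consider them part of the release and should leave them in place.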
Yes, that’s the problem of removing the CRDs from the YAML manifest. Helm no longer controls what to do with them.
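If the CRDs were already deleted by the upgrade, one possible recovery path is to re-create them by hand (a sketch, assuming the 0.20.0 chart ships its CRDs in a crds/ directory, which Helm 3 applies on install but does not manage afterwards):

# Fetch and unpack the 0.20.0 chart locally
helm pull strimzi/strimzi-kafka-operator --version 0.20.0 --untar

# Re-apply the Strimzi CRDs manually; the custom resources themselves still
# have to be restored from backups or re-applied from their original manifests
kubectl apply -f strimzi-kafka-operator/crds/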
I’ve just hit the same issue on a k3s single-node deployment. The CRDs seem to be managed by the Helm chart, so I guess there’s something off there. Below are the upgrade logs from the helm-operator:
Afterward, there are no CRDs to be found, but it’s strange that Helm doesn’t throw any errors and immediately starts removing all Kafka components. So I think this is more for the Helm chart maintainers than anything else, and these upgrades have to be thoroughly tested, as no one wants to inadvertently kill their entire Kafka cluster when upgrading the operator.
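One way to check up front whether a plain upgrade will take the CRDs with it (a sketch, assuming a release named strimzi-kafka in the kafka namespace): if the installed release’s rendered manifest still contains CustomResourceDefinition objects while the new chart version no longer templates them, Helm will delete them during the upgrade.

# Count the CRDs still tracked by the installed release; a non-zero count means
# Helm "owns" them and will remove them once they disappear from the chart templates
helm get manifest strimzi-kafka -n kafka | grep -c 'kind: CustomResourceDefinition'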
I just encountered the same situation. I’m running AWS EKS with Kubernetes 1.18, and when running the upgrade using Helm, all the Strimzi CRDs were removed and not installed back.
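For anyone planning to attempt this upgrade, backing up the affected objects first is cheap insurance (a sketch; only the resource kinds mentioned in this thread are listed, and the app=strimzi label on the CRDs is an assumption about the Strimzi install manifests):

# Back up the Strimzi custom resources before touching the chart
kubectl get kafkas,kafkatopics,kafkausers --all-namespaces -o yaml > strimzi-resources-backup.yaml

# Back up the CRDs themselves as well (label selector assumed)
kubectl get crds -l app=strimzi -o yaml > strimzi-crds-backup.yaml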