istio: Upgrading control plane from 1.2.2 to 1.2.5 causing down time
Bug description
When I upgrade or downgrade the control plane between versions 1.2.2 and 1.2.5, my applications that use the sidecar go into an unready state and I see downtime in my services.
My requests follow this path:
Load generator (outside cluster) -> Load Balancer (outside cluster) -> Istio Ingressgateway (inside cluster) -> Application (just simple nginx docker image)
I have about 20 instances of istio-ingressgateway and 60 instances of nginx, and I generate a load of about 15k rps, which this setup normally handles without breaking a sweat.
What I observe when I run `netstat -ltpn` inside the sidecar proxy is that a new envoy process comes up and the old one goes away. This probably causes the application to become unhealthy, because the new envoy process isn’t listening on port 15090. After a while it does start listening on 15090 and 15001, and the errors go away once all instances are back.
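To make this observation repeatable, a small helper can report which of the expected sidecar ports are missing from `netstat -ltpn` output. This is only a sketch: the function name `missing_ports` is hypothetical, and the pod name in the usage comment is a placeholder.

```shell
#!/bin/sh
# missing_ports PORT...
# Reads `netstat -ltpn` output on stdin and prints each PORT that has no
# socket in the LISTEN state. Prints nothing when all ports are listening.
missing_ports() {
  # Field 4 is the local address (e.g. 0.0.0.0:15090 or :::15020);
  # the port is whatever follows the last colon.
  listening=$(awk '$6 == "LISTEN" { n = split($4, a, ":"); print a[n] }')
  for port in "$@"; do
    echo "$listening" | grep -qx "$port" || echo "$port"
  done
}

# Usage (placeholder pod name; assumes netstat is available in the
# istio-proxy container, as in the report above):
#   kubectl exec "$POD" -c istio-proxy -- netstat -ltpn | missing_ports 15090 15001
```

Running this in a loop against one pod during the upgrade would show how long 15090 and 15001 stay unavailable after the new envoy process starts.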
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure [ ] Docs [X] Installation [X] Networking [ ] Performance and Scalability [ ] Policies and Telemetry [ ] Security [X] Test and Release [ ] User Experience [ ] Developer Infrastructure
Expected behavior
No effect on my traffic when doing a control plane upgrade of Istio.
Steps to reproduce the bug
We consider istio-ingressgateway to be part of the data plane as well and don’t want to make any changes to it, so we upgrade everything except the gateway. CNI is running version 1.2.5.
I try to change versions using these commands -
helm template install/kubernetes/helm/istio-init --name istio-init --namespace istio-system | kubectl apply -f -
mkdir tmp
mv install/kubernetes/helm/istio/charts/gateways/templates/* tmp/
helm template install/kubernetes/helm/istio/ --namespace istio-system --name istio --values custom.yaml | kubectl -n istio-system apply -f -
mv tmp/* install/kubernetes/helm/istio/charts/gateways/templates/
rm -r tmp/
This temporarily moves the gateway templates out of the chart so that everything else is upgraded, then puts them back.
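If the helm or kubectl step fails partway through, the commands above leave the chart without its gateway templates. One way to guard against that, sketched here on the assumption that the chart layout matches the paths used above, is to wrap the move and restore in a function that always puts the templates back. The function names `with_gateways_hidden` and `render_and_apply` are hypothetical.

```shell
#!/bin/sh
# with_gateways_hidden CHART_DIR CMD [ARGS...]
# Moves the gateway templates out of CHART_DIR, runs CMD, then moves them
# back whether or not CMD succeeded. Returns CMD's exit status.
with_gateways_hidden() {
  chart_dir=$1; shift
  tpl_dir="$chart_dir/charts/gateways/templates"
  stash=$(mktemp -d) || return 1
  mv "$tpl_dir"/* "$stash"/ || { rmdir "$stash"; return 1; }
  cmd_rc=0
  "$@" || cmd_rc=$?
  # Restore the templates before reporting the command's result.
  mv "$stash"/* "$tpl_dir"/ && rmdir "$stash"
  return "$cmd_rc"
}

# Usage with the same render-and-apply step as above:
#   render_and_apply() {
#     helm template install/kubernetes/helm/istio/ --namespace istio-system \
#       --name istio --values custom.yaml | kubectl -n istio-system apply -f -
#   }
#   with_gateways_hidden install/kubernetes/helm/istio render_and_apply
```

This does not change what gets applied; it only makes the temporary chart modification safe to interrupt.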
Version (include the output of `istioctl version --remote` and `kubectl version`)
Istio - 1.2.2 to 1.2.5
Kubernetes - 1.15.0
How was Istio installed?
Using helm template with this custom.yaml for values:
gateways:
  istio-ingressgateway:
    type: NodePort
    autoscaleMin: 20
    autoscaleMax: 20
    ports:
    - port: 80
      targetPort: 80
      name: http2
      nodePort: 60000
    - port: 443
      name: https
      nodePort: 60001
    - port: 31400
      name: tcp
      nodePort: 61400
    resources:
      requests:
        cpu: 2
        memory: 512Mi
      limits:
        cpu: 2
        memory: 512Mi
kiali:
  enabled: true
  dashboard:
    grafanaURL: http://grafana:3000
    jaegerURL: http://tracing:80
  resources:
    requests:
      cpu: 4
      memory: 4096Mi
    limits:
      cpu: 4
      memory: 4096Mi
  createDemoSecret: true
  prometheusAddr: prometheus.internal.com
mixer:
  policy:
    enabled: false
  telemetry:
    autoscaleMin: 30
    autoscaleMax: 100
grafana:
  enabled: true
pilot:
  traceSampling: 100.0
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi
tracing:
  enabled: false
istio_cni:
  enabled: true
global:
  policyCheckFailOpen: true
  proxy:
    logLevel: "error"
    resources:
      requests:
        cpu: 50m
        memory: 180Mi
      limits:
        cpu: 2
        memory: 512Mi
  defaultResources:
    requests:
      cpu: 1
      memory: 2048Mi
    limits:
      cpu: 2
      memory: 2048Mi
Environment where bug was observed (cloud vendor, OS, etc)
On-prem k8s cluster running on bare metal
Additionally, please consider attaching a cluster state archive by attaching the dump file to this issue.
About this issue
- State: closed
- Created 5 years ago
- Comments: 19 (15 by maintainers)
Commits related to this issue
- Remove cleanup-secrets job This was intended to delete Istio secrets after you did `helm remove`. Instead, it deletes secrets during every upgrade, causing outages. Fixes https://github.com/istio/is... — committed to howardjohn/istio by howardjohn 5 years ago
- Remove cleanup-secrets job (#17122) This was intended to delete Istio secrets after you did `helm remove`. Instead, it deletes secrets during every upgrade, causing outages. Fixes https://github.com... — committed to istio/istio by howardjohn 5 years ago
- Remove cleanup-secrets job This was intended to delete Istio secrets after you did `helm remove`. Instead, it deletes secrets during every upgrade, causing outages. Fixes https://github.com/istio/is... — committed to istio-testing/istio by howardjohn 5 years ago
- Remove cleanup-secrets job (#17150) This was intended to delete Istio secrets after you did `helm remove`. Instead, it deletes secrets during every upgrade, causing outages. Fixes https://github.com... — committed to istio/istio by istio-testing 5 years ago
- Remove cleanup-secrets job This was intended to delete Istio secrets after you did `helm remove`. Instead, it deletes secrets during every upgrade, causing outages. Fixes https://github.com/istio/is... — committed to istio-testing/istio by howardjohn 5 years ago
- Remove cleanup-secrets job (#17189) This was intended to delete Istio secrets after you did `helm remove`. Instead, it deletes secrets during every upgrade, causing outages. Fixes https://github.... — committed to istio/istio by istio-testing 5 years ago
Confirmed this fixed the ACK ERRORS about certs not found on upgrades as well
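Since the commits above attribute the outage to the cleanup-secrets job running on every upgrade, one precaution on chart versions without the fix could be to check the rendered manifests for that job before applying them. A minimal sketch, assuming the job's resource name contains the string `cleanup-secrets` (taken from the commit titles, not verified against the chart); the function name `has_cleanup_job` is hypothetical.

```shell
#!/bin/sh
# has_cleanup_job
# Reads rendered manifests on stdin. Exits 0 and prints the matching lines
# if any "name:" line mentions cleanup-secrets; exits non-zero otherwise.
has_cleanup_job() {
  grep -E '^[[:space:]]*name:.*cleanup-secrets'
}

# Usage with the render command from the report:
#   helm template install/kubernetes/helm/istio/ --namespace istio-system \
#     --name istio --values custom.yaml | has_cleanup_job \
#     && echo "cleanup-secrets job present: upgrade may delete Istio secrets"
```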