istio: istio-pilot crash due to memory leak under high load

Bug description

We installed Istio via the default Helm config and tried to simulate a high-load environment matching our production KPIs.

Each pod uses 8 routes, and each pod is created in a new namespace.

With the default configs, pilot crashed after approximately 800 pods were created. To work around this we used the Helm config to raise the memory request to 4096Mi and autoscaleMax to 15; with these settings we were able to create 1250 pods before istio-pilot failed with the following OOM error:

Error: The node was low on resource: memory. Container discovery was using 12687204Ki, which exceeds its request of 4Gi. (Before we changed the config we got the same error with lower memory numbers.)
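For scale, converting the eviction figure from Ki to Gi (a quick shell check of the numbers above, not part of the original report) shows the discovery container was roughly 3x over its request:

```shell
# Values taken from the eviction message above; 1 Gi = 1048576 Ki.
usage_ki=12687204
request_gi=4
usage_gi=$(( usage_ki / 1048576 ))   # integer division -> 12
echo "discovery used ~${usage_gi}Gi against a ${request_gi}Gi request"
```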

This clearly looks like a memory leak: pilot reached the 60GB memory limit within 10-15 minutes while we were creating one new pod per second. We also tried to reduce the load to the following rate: 1 pod every 12 seconds -> 5 per minute -> 300 per hour -> 3000 in 10 hours. Again, after ~1,300 pods were created, all active istio-pilot connections crashed.
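The pacing described above can be sketched as a small shell loop. The namespace/pod names are hypothetical, route creation is omitted for brevity, and KUBECTL defaults to a dry-run echo so nothing is created until you set KUBECTL=kubectl:

```shell
# create_load N INTERVAL: create N pods, one per fresh namespace, pausing
# INTERVAL seconds between creations (1 for the fast run, 12 for the slow one).
# KUBECTL defaults to a dry-run echo; set KUBECTL=kubectl to run for real.
create_load() {
  n=$1
  interval=$2
  i=1
  while [ "$i" -le "$n" ]; do
    ns="load-test-$i"
    ${KUBECTL:-echo kubectl} create namespace "$ns"
    ${KUBECTL:-echo kubectl} -n "$ns" run "pod-$i" --image=nginx --restart=Never
    i=$((i + 1))
    sleep "$interval"
  done
}

# Dry-run example: three pods, no pause between them.
create_load 3 0
```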

Here are some snapshots. (IMO the graph is not accurate, as it shows pilot using more than 100GB of memory…)

pilot https://snapshot.raintank.io/dashboard/snapshot/BHHzAQQU6eKCSwenvAKWry4hh2QSbS0M?orgId=2

https://snapshot.raintank.io/dashboard/snapshot/sZqEgXIRCJ46lfvkreVp68RbZS0ZL50v?orgId=2&from=1560152601034&to=1560153201034

performance https://snapshot.raintank.io/dashboard/snapshot/OQau3FszAH1tcR6MBOu0n7PM33e0uiLE?orgId=2

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[x] Installation
[x] Networking
[x] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience

Expected behavior
Istio pods should be functional.

Steps to reproduce the bug

Version (include the output of istioctl version --remote and kubectl version)
We are using K8s version 1.13.6 and Istio 1.1.7.

How was Istio installed?
First run:

helm template install/kubernetes/helm/istio --name istio --namespace istio-system | kubectl apply -f -

Second run: scaling istio-pilot & ingress-gateway

helm template install/kubernetes/helm/istio --name istio --namespace istio-system \
  --set pilot.autoscaleMax=15 --set pilot.resources.requests.memory=4096Mi \
  --set gateways.istio-ingressgateway.autoscaleMax=15 --set global.proxy.resources.limits.memory=256Mi \
  --set tracing.enabled=true --set servicegraph.enabled=true \
  --set grafana.enabled=true | kubectl apply -f -
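A hypothetical way to confirm the memory override actually landed on the pilot deployment (the deployment name and container index are assumptions about this chart version; KUBECTL defaults to a dry-run echo so the sketch runs without cluster access):

```shell
# Print the memory request recorded on the pilot deployment after the upgrade.
# Set KUBECTL=kubectl to query a real cluster instead of echoing the command.
check_pilot_memory() {
  ${KUBECTL:-echo kubectl} -n istio-system get deploy istio-pilot \
    -o "jsonpath={.spec.template.spec.containers[0].resources.requests.memory}"
}
check_pilot_memory   # expect 4096Mi once the new values are applied
```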

Environment where the bug was observed (cloud vendor, OS, etc)
K8s

In addition, we have not enabled sidecar injection (the namespace injection label) for the pods yet.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 22 (18 by maintainers)

Most upvoted comments

We verified it with the same test as before. With v1.3-rc0 the error occurred; with v1.3-rc1 the memory consumption looks good (as expected by @howardjohn).

image