istio: istio pilot crash on load

(This is used to report product bugs, please visit https://discuss.istio.io for questions on using Istio)

Bug description

We are using Istio 1.1.7 (the latest release), installed with the default Helm configuration. We are running a load test before using Istio in a production environment.

In the test we aimed to create 4500 pods over 30 minutes, each pod with 3 routes. However, Istio Pilot crashed after 2750 pods (as did istio-ingressgateway and istio-policy). Using Prometheus, we saw that Pilot used 53GB of memory while the pods were being created,…
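For reference, the load figures above work out as follows. This is only a back-of-the-envelope sketch: it assumes the pods were created at a constant rate over the 30 minutes, which the report does not state, and the variable names are purely illustrative.

```python
# Figures taken from the report.
TARGET_PODS = 4500
RAMP_MINUTES = 30
ROUTES_PER_POD = 3
PODS_AT_CRASH = 2750

# Assumption: pods are created at a constant rate during the ramp-up.
pods_per_minute = TARGET_PODS / RAMP_MINUTES
minutes_to_crash = PODS_AT_CRASH / pods_per_minute

# Total routes that existed when Pilot went down.
routes_at_crash = PODS_AT_CRASH * ROUTES_PER_POD

print(f"creation rate: {pods_per_minute:.0f} pods/minute")
print(f"crash after roughly {minutes_to_crash:.1f} minutes")
print(f"routes at crash: {routes_at_crash}")
```

Under that assumption, Pilot went down a little past the halfway point of the ramp-up, with 8250 routes configured.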

  1. What could be the reason for the high memory consumption (53GB)?
  2. Should we change any of the values here to keep Pilot from crashing under heavy load? https://istio.io/docs/reference/config/installation-options/#pilot-options

The error message shown when Istio Pilot went down was: The node was low on resource: memory. Container discovery was using 31084218Ki, which exceeds its request of 2Gi.
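To put the two figures in the message above on the same scale, here is a minimal sketch that converts them to bytes. It assumes both values are Kubernetes binary quantities (Ki and Gi suffixes); `quantity_to_bytes` is a hypothetical helper written for this illustration, not part of any Kubernetes or Istio tooling.

```python
# Binary (power-of-two) suffixes used by Kubernetes memory quantities.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '2Gi' or '31084218Ki' to bytes.

    Only integer values with binary suffixes are handled in this sketch;
    a plain integer is treated as a byte count.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


usage = quantity_to_bytes("31084218Ki")   # memory in use at eviction time
request = quantity_to_bytes("2Gi")        # memory request of the container

print(f"usage:   {usage / 1024**3:.1f} GiB")
print(f"request: {request / 1024**3:.1f} GiB")
print(f"usage is {usage / request:.1f}x the request")
```

At the time of eviction the discovery container was therefore using roughly 29.6 GiB, close to 15 times its 2 GiB request.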

The pod status was CrashLoopBackOff.

Any suggestions or hints on how we can overcome this issue?

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[x] Performance and Scalability
[x] Policies and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience

Expected behavior

Istio remains functional under high load.

Steps to reproduce the bug

Version (include the output of istioctl version --remote and kubectl version)

1.1.7

How was Istio installed?

Helm

Environment where bug was observed (cloud vendor, OS, etc)

K8S 1.13.6

Additionally, please consider attaching a cluster state archive by attaching the dump file to this issue.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 24 (15 by maintainers)

Most upvoted comments

@hzxuzhonghu we have some regularly scheduled tests running at http://grafana.v12.qualistio.org/ (Prometheus seems to be having issues there at the moment, though; we need to investigate). That one has 450 services and 1100 pods.

http://grafana.v11.qualistio.org also has 450 services and 1700 pods. Pilot is only using 2.5GB in this case, although it is using sidecar.

@RaynDol it would be useful if you could export the Grafana Pilot dashboard for the time period of the test (top right of the Grafana UI -> share -> snapshot -> publish to raintank).