istio: istio pilot crash on load
(This is used to report product bugs, please visit https://discuss.istio.io for questions on using Istio)
Bug description
We are using the latest Istio release, 1.1.7, installed with the default Helm configuration.
We are load testing Istio before using it in our production environment.
The test aims to create 4500 pods over 30 minutes, each pod with 3 routes. However, istio-pilot crashed after 2750 pods (as did istio-ingressgateway and istio-policy).
Prometheus showed that Pilot was using 53GB of memory while the pods were being created,…
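For anyone wanting to pull the same memory series out of Prometheus, a range query against its HTTP API (`/api/v1/query_range`) is one option. The sketch below only builds the request URL. The Prometheus address, the timestamps, and the metric selector are assumptions and not taken from this report: `container_memory_working_set_bytes` is the usual cAdvisor metric, and the `container_name` label matches Kubernetes releases of this era, but your setup may differ.

```python
from urllib.parse import urlencode

# Hypothetical in-cluster Prometheus address; adjust for your installation.
PROMETHEUS_URL = "http://prometheus.istio-system:9090"


def pilot_memory_query_url(start, end, step="30s", base_url=PROMETHEUS_URL):
    """Build a query_range URL for the memory of Pilot's 'discovery' container.

    start and end are Unix timestamps in seconds.
    """
    # Assumed selector: cAdvisor working-set metric, filtered to the
    # 'discovery' container in the istio-system namespace.
    query = (
        "container_memory_working_set_bytes"
        '{namespace="istio-system",container_name="discovery"}'
    )
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"


# Hypothetical 30-minute window (1800 seconds), matching the test duration.
url = pilot_memory_query_url(start=1560000000, end=1560001800)
print(url)
```

Fetching that URL returns JSON with one time series per matching container, which can then be compared against the pod count over the same window.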
- What could be the reason for the high memory consumption (53GB)?
- Should we change any of the values here to keep Pilot from crashing under heavy load? https://istio.io/docs/reference/config/installation-options/#pilot-options
The message reported when istio-pilot went down was:
The node was low on resource: memory. Container discovery was using 31084218Ki, which exceeds its request of 2Gi.
The pod status was CrashLoopBackOff.
Any suggestions or hints on how we can overcome this issue?
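To put the eviction message in perspective, the two quantities it quotes can be converted to bytes and compared. This is a minimal sketch that handles only integer values with binary suffixes (Ki, Mi, Gi, Ti), not the full Kubernetes quantity syntax; `quantity_to_bytes` is a helper written for this illustration.

```python
# Binary (power-of-two) suffixes used by Kubernetes resource quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Convert an integer quantity such as '2Gi' to bytes.

    Only binary suffixes are handled; a bare number is treated as bytes.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


usage = quantity_to_bytes("31084218Ki")  # value from the eviction message
request = quantity_to_bytes("2Gi")       # Pilot's memory request

usage_gib = usage / 2**30
ratio = usage / request
print(f"usage: {usage_gib:.1f} GiB, {ratio:.1f}x the request")
# prints "usage: 29.6 GiB, 14.8x the request"
```

So at the moment of eviction the discovery container was holding roughly 29.6 GiB, close to 15 times what it had requested.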
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[x] Performance and Scalability
[x] Policies and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience
Expected behavior
Istio remains functional under high load.
Steps to reproduce the bug
Version (include the output of istioctl version --remote and kubectl version)
1.1.7
How was Istio installed?
Helm
Environment where bug was observed (cloud vendor, OS, etc)
K8S 1.13.6
Additionally, please consider attaching a cluster state archive by attaching
the dump file to this issue.
About this issue
- State: closed
- Created 5 years ago
- Comments: 24 (15 by maintainers)
@hzxuzhonghu we have some regularly scheduled tests running at http://grafana.v12.qualistio.org/ (Prometheus seems to be having issues at the moment, though; we need to investigate). That one has 450 services and 1100 pods.
http://grafana.v11.qualistio.org also has 450 services and 1700 pods. Pilot is only using 2.5GB in that case, although it is using the Sidecar resource.
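If "sidecar" here refers to Istio's Sidecar resource, which limits the set of services each proxy is configured for, a rough back-of-the-envelope model shows why it could matter. The sketch assumes, purely for illustration, that the configuration Pilot has to generate grows with the number of proxies multiplied by the services each proxy can see. It is not a measurement of Pilot's memory, and the figure of 10 visible services per proxy is hypothetical.

```python
def proxy_service_pairs(proxies: int, visible_services: int) -> int:
    """Toy model: one configuration entry per (proxy, visible service) pair."""
    return proxies * visible_services


# Figures from the test environment above: 1700 pods, 450 services.
unscoped = proxy_service_pairs(proxies=1700, visible_services=450)

# Hypothetical scoping: each proxy only sees 10 services.
scoped = proxy_service_pairs(proxies=1700, visible_services=10)

print(unscoped, scoped, unscoped // scoped)
# prints "765000 17000 45"
```

Under these assumptions, scoping shrinks the number of proxy-service pairs by a factor of 45, which is one reason a scoped test environment may not be directly comparable to a mesh running with default visibility.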
@RaynDol it would be useful if you could export the Grafana Pilot dashboard for that time period (top right of the Grafana UI -> Share -> Snapshot -> Publish to raintank).