kubernetes: Failing/Flaking Test: E2E: [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
Test: https://k8s-testgrid.appspot.com/sig-release-master-blocking#gci-gke-serial&show-stale-tests=
The HPA tests for gce-serial have been flaking rather regularly; they have failed about 2/3 of the time over the last week and are often the cause of a failed test run. Can we de-flake these?
Over multiple test runs, the problem seems to be the replica count unexpectedly jumping up to 4:
```
Oct 3 11:39:13.153: INFO: ConsumeCPU URL: {https 35.232.126.216 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-qjvjw/services/rc-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=100 }
Oct 3 11:39:22.808: INFO: expecting there to be 3 replicas (are: 3)
Oct 3 11:39:32.778: INFO: expecting there to be 3 replicas (are: 4)
Oct 3 11:39:32.778: INFO: Unexpected error occurred: number of replicas changed unexpectedly
```
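For reference, that ConsumeCPU call goes through the API server's service proxy. A rough equivalent sketched with a recent client-go is below; the namespace, service name, and parameters are copied from the log, but the helper itself is hypothetical, not the e2e framework's actual code:

```go
package e2e

import (
	"context"

	"k8s.io/client-go/kubernetes"
)

// consumeCPU drives load through the resource consumer's ConsumeCPU
// endpoint via the API server's service proxy, mirroring the logged
// request above. Sketch only; not the e2e framework's real helper.
func consumeCPU(ctx context.Context, client kubernetes.Interface) error {
	_, err := client.CoreV1().RESTClient().Post().
		Namespace("e2e-tests-horizontal-pod-autoscaling-qjvjw").
		Resource("services").
		Name("rc-ctrl").
		SubResource("proxy").
		Suffix("ConsumeCPU").
		Param("durationSec", "30").
		Param("millicores", "250").
		Param("requestSizeMillicores", "100").
		DoRaw(ctx)
	return err
}
```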
Is the limit not getting set correctly here? Is this an actual bug?
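For context, the HPA controller computes the desired replica count as `ceil(currentReplicas * currentMetricValue / targetMetricValue)`, and it only rescales when usage deviates from the target by more than a tolerance (10% by default). Once a spike clears that tolerance, `ceil` rounds up aggressively, which is one plausible way 3 replicas become 4. A simplified sketch of that arithmetic (the metric numbers are illustrative, not the test's actual configuration):

```go
package main

import (
	"fmt"
	"math"
)

const tolerance = 0.1 // the controller's default rescale tolerance

// desiredReplicas mirrors the core HPA scaling formula:
// desired = ceil(currentReplicas * currentMetric / targetMetric),
// applied only when usage deviates from target by more than the tolerance.
func desiredReplicas(current int32, currentMetric, targetMetric float64) int32 {
	ratio := currentMetric / targetMetric
	if math.Abs(ratio-1.0) <= tolerance {
		return current // within tolerance: keep the current count
	}
	return int32(math.Ceil(float64(current) * ratio))
}

func main() {
	// Slight oscillation around the target stays within tolerance...
	fmt.Println(desiredReplicas(3, 260, 250)) // 3
	// ...but a spike past it rounds up, which would produce the
	// 3 -> 4 blip seen in the log above.
	fmt.Println(desiredReplicas(3, 290, 250)) // ceil(3 * 1.16) = 4
}
```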
/sig autoscaling
/priority important-soon
/kind failing-test
/kind flake
About this issue
- State: closed
- Created 6 years ago
- Comments: 39 (31 by maintainers)
The PR merged. Now let’s wait and see if it solves the problem.
On Friday I verified that the CPU usage generated by the resource consumer stays very close to the target value but oscillates slightly. To fix this I:
I’m checking if this helps the test.
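For reference, the stability-verification phase of the test boils down to polling the ReplicationController over an observation window and failing on any deviation, so even a single 3 -> 4 blip trips it. A minimal sketch against a recent client-go; the helper name, window, and interval are hypothetical, not the e2e framework's actual implementation:

```go
package e2e

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForStableReplicas fails if the ReplicationController's replica count
// deviates from want at any point during the observation window.
func waitForStableReplicas(ctx context.Context, c kubernetes.Interface,
	ns, name string, want int32, window, interval time.Duration) error {
	deadline := time.Now().Add(window)
	for time.Now().Before(deadline) {
		rc, err := c.CoreV1().ReplicationControllers(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if rc.Status.Replicas != want {
			return fmt.Errorf("number of replicas changed unexpectedly: want %d, got %d",
				want, rc.Status.Replicas)
		}
		time.Sleep(interval)
	}
	return nil
}
```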