kubernetes: verify-openapi-spec.sh frequently hanging in CI

Many PRs have been failing due to the pull-kubernetes-verify job timing out.

It looks like both PR and CI jobs are affected:

http://k8s-testgrid.appspot.com/presubmits-kubernetes-blocking#pull-kubernetes-verify&width=5&graph-metrics=test-duration-minutes

http://k8s-testgrid.appspot.com/sig-release-master-blocking#verify&width=20&graph-metrics=test-duration-minutes

The failures aren’t consistent, but there appears to be a clear uptick starting around 13:00 PDT on Monday, April 30.

There aren’t any notable test-infra changes from that time.

For suspicious kubernetes changes, nothing immediately leaps out. The only two PRs I find suspicious:

I’ve been debugging the verify scripts with set -x, and the last lines printed before the hang are:

W0502 20:36:35.746] + echo '/go/src/k8s.io/kubernetes/api/openapi-spec up to date.'
W0502 20:36:35.746] + cp -a /go/src/k8s.io/kubernetes/_tmp/openapi-spec /go/src/k8s.io/kubernetes/api/openapi-spec/..
W0502 20:36:35.746] + rm -rf /go/src/k8s.io/kubernetes/_tmp
W0502 20:36:35.746] + echo 0
W0502 20:36:35.746] + tr -d '\n'

which corresponds to https://github.com/kubernetes/kubernetes/blob/0d43bdec2b8598ff542a1afdee876d417b4e7668/third_party/forked/shell2junit/sh2ju.sh#L48-L49 and https://github.com/kubernetes/kubernetes/blob/0d43bdec2b8598ff542a1afdee876d417b4e7668/third_party/forked/shell2junit/sh2ju.sh#L110-L112

so there might be some weird pipe/buffering nonsense going on causing everything to get stuck. (I’m still not sure what would cause this to start failing, though.)
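One way a pipe can get stuck like this, as a minimal sketch rather than a confirmed diagnosis: if a script leaves a background process running, and that process inherited the write end of the pipe, the reader (here `tr -d '\n'`) does not see EOF until the background process exits. The function names `leaky` and `detached` are hypothetical, and `sleep 2` is only a stand-in for a long-lived server.

```bash
#!/usr/bin/env bash
# Sketch: a background child that inherits a pipe's write end delays EOF for the reader.

# Hypothetical stand-in for a script that leaves a server running in the background.
leaky() {
  sleep 2 &                    # inherits stdout, i.e. the write end of the pipe
  echo "up to date"
}

# Same script, but the background child's output is detached from the pipe.
detached() {
  sleep 2 >/dev/null 2>&1 &
  echo "up to date"
}

start=$SECONDS
leaky | tr -d '\n' >/dev/null      # tr sees EOF only once the background sleep exits
leaky_elapsed=$((SECONDS - start))

start=$SECONDS
detached | tr -d '\n' >/dev/null   # tr sees EOF as soon as echo finishes
detached_elapsed=$((SECONDS - start))

echo "leaky: ${leaky_elapsed}s, detached: ${detached_elapsed}s"
```

With a process that never exits in place of `sleep 2`, the first pipeline would block indefinitely, which would look like the hang in the log above.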

cc @cblecker @BenTheElder

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

we can do both, but if circumstances beyond our control mean etcd is dead, the apiserver should still be well-behaved

doesn’t wait for the apiserver to actually exit,

Would a better fix be to update the script to wait for the apiserver to exit?
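A rough sketch of what waiting for the process to exit could look like in a shell script. This is not the actual code in hack/update-openapi-spec.sh; `sleep 30` is a hypothetical stand-in for the apiserver, and `server_pid` and `server_state` are illustrative names.

```bash
#!/usr/bin/env bash
# Sketch: stop a background server and block until it has actually exited.

sleep 30 >/dev/null 2>&1 &                # hypothetical stand-in for the apiserver
server_pid=$!

kill "${server_pid}"                      # send SIGTERM
wait "${server_pid}" 2>/dev/null || true  # reap it; a non-zero status is expected after a signal

# kill -0 sends no signal; it only checks whether the process still exists.
if kill -0 "${server_pid}" 2>/dev/null; then
  server_state="running"
else
  server_state="exited"
fi
echo "server ${server_state}"
```

Note that `wait` only returns once the process is gone, so if the server ignores SIGTERM (for example because it is stuck on a dead dependency), the script itself would block here. A real script would probably want a timeout followed by SIGKILL.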

If I take d39eac929f3babfc19e372a89af71d8fa3cbdcf1 but pass --endpoint-reconciler-type=master-count to the apiserver in hack/update-openapi-spec.sh, the process is killed after the script runs.