bazel: Bazel CI: RBE builds are broken after grpc java upgrade
https://buildkite.com/bazel/bazel-auto-sheriff-face-with-cowboy-hat/builds/306
ERROR: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/cfad747ece6c2992c5b867a14a43555e/external/org_golang_x_crypto/curve25519/BUILD.bazel:3:11: GoCompilePkg external/org_golang_x_crypto/curve25519/curve25519.a failed (Exit 34): com.google.devtools.build.lib.remote.BulkTransferException
at com.google.devtools.build.lib.remote.RemoteCache.waitForBulkTransfer(RemoteCache.java:225)
at com.google.devtools.build.lib.remote.RemoteCache.download(RemoteCache.java:331)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.downloadAndFinalizeSpawnResult(RemoteSpawnRunner.java:486)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:306)
at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:240)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:134)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:102)
at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:65)
at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:331)
at com.google.devtools.build.lib.actions.Action.execute(Action.java:127)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:859)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1019)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:978)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:129)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:81)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:469)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:845)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:314)
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:438)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:398)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Suppressed: java.io.IOException: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
HTTP/2 error code: ENHANCE_YOUR_CALM
Received Goaway
too_many_pings
Verified by building with Bazel@d4cd4e7ab18ebeae4152dafc113367289ffebb12 and its previous commit: https://buildkite.com/bazel/culprit-finder/builds/581 https://buildkite.com/bazel/culprit-finder/builds/582
Culprit: d4cd4e7ab18ebeae4152dafc113367289ffebb12
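The trace above shows the actual cause attached as a suppressed exception on the BulkTransferException, not as its direct cause. Below is a minimal sketch of how tooling could spot this failure mode by walking both causes and suppressed entries. `TooManyPingsDetector` and `isTooManyPings` are hypothetical names, not part of Bazel or grpc-java. The substring match on the message is an assumption based only on the log text in this issue, and plain `RuntimeException`s stand in for the real exception types.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Bazel or grpc-java. */
final class TooManyPingsDetector {
  private TooManyPingsDetector() {}

  /**
   * Returns true if any throwable reachable through causes or suppressed
   * entries carries a message mentioning the ping-related GOAWAY.
   */
  static boolean isTooManyPings(Throwable root) {
    if (root == null) {
      return false;
    }
    // Identity-based set guards against cycles in the cause/suppressed graph.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    Deque<Throwable> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      Throwable current = pending.pop();
      if (!seen.add(current)) {
        continue;
      }
      String message = current.getMessage();
      // Assumption: the markers appear verbatim in the message, as in the log above.
      if (message != null
          && (message.contains("too_many_pings") || message.contains("ENHANCE_YOUR_CALM"))) {
        return true;
      }
      if (current.getCause() != null) {
        pending.push(current.getCause());
      }
      for (Throwable suppressed : current.getSuppressed()) {
        pending.push(suppressed);
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Stand-ins: RuntimeException plays the role of StatusRuntimeException
    // and of BulkTransferException; the shape mirrors the trace above.
    RuntimeException status = new RuntimeException(
        "RESOURCE_EXHAUSTED: Bandwidth exhausted\n"
            + "HTTP/2 error code: ENHANCE_YOUR_CALM\n"
            + "Received Goaway\n"
            + "too_many_pings");
    RuntimeException bulk = new RuntimeException("bulk transfer failed");
    bulk.addSuppressed(new IOException(status));

    System.out.println(isTooManyPings(bulk));
    System.out.println(isTooManyPings(new RuntimeException("unrelated failure")));
  }
}
```

Checking only `getCause()` would miss this case, since the IOException is reachable solely through `getSuppressed()`.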
About this issue
- State: closed
- Created 4 years ago
- Comments: 23 (23 by maintainers)
Commits related to this issue
- Try disabling grpc server auto flow control Was turned on by default during 1.26.0->1.31.1 grpc-java bump It seems that it may be causing errors in RBE: io.grpc.StatusRuntimeException: RESOURCE_EXHA... — committed to dmivankov/bazel by dmivankov 4 years ago
- Bump grpc to 1.32.x to fix a too_many_pings regression grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with io.grpc.StatusRuntimeException: RESOU... — committed to dmivankov/bazel by dmivankov 4 years ago
- [1/3] Bump grpc to 1.32.x to fix a too_many_pings regression Part 1: add v1.32.x version to third_party/grpc Note: partly switches to v1.32.x too as not all bits are versioned and some of unver... — committed to dmivankov/bazel by dmivankov 4 years ago
- [2/3] Bump grpc to 1.32.x to fix a too_many_pings regression Part 2: switch to v1.32.x grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with io.g... — committed to dmivankov/bazel by dmivankov 4 years ago
- [3/3] Bump grpc to 1.32.x to fix a too_many_pings regression Part 3: remove 1.31.1 from third_party/grpc grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing... — committed to dmivankov/bazel by dmivankov 4 years ago
- Try disabling grpc server auto flow control Was turned on by default during 1.26.0->1.31.1 grpc-java bump It seems that it may be causing errors in RBE: io.grpc.StatusRuntimeException: RESOURCE_EXHA... — committed to bazelbuild/bazel by dmivankov 4 years ago
- [1/3] Bump grpc to 1.32.x to fix a too_many_pings regression Part 1: add v1.32.x version to third_party/grpc Note: partly switches to v1.32.x too as not all bits are versioned and some of unver... — committed to bazelbuild/bazel by dmivankov 4 years ago
- [2/3] Bump grpc to 1.32.x to fix a too_many_pings regression Part 2: switch to v1.32.x grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with io.g... — committed to bazelbuild/bazel by dmivankov 4 years ago
Great, then I can make the PRs: add 1.32.x, switch to 1.32.x and bring auto flow control back, then drop 1.31.1.
Yes, auto flow control enables pinging. https://github.com/grpc/grpc-java/blob/v1.26.0/netty/src/main/java/io/grpc/netty/AbstractNettyHandler.java#L141 is where auto flow control pinging gets enabled in v1.26.0. v1.31.1 does the same, but it enables auto flow control by default for both client and server.
Given that auto flow control is a new feature and there’s some indication that it caused the regression, I’d rather try disabling it first (https://github.com/bazelbuild/bazel/pull/12266) as the more solid option.
v1.32.2 has fixes in that area, but it takes more PRs to bump again. Unless there’s an easy way to check whether it really helps before merging, it’s probably a good idea to try a faster fix first.
I will prepare v1.32.2 though.