bazel: Bazel CI: RBE builds are broken after grpc-java upgrade

https://buildkite.com/bazel/bazel-auto-sheriff-face-with-cowboy-hat/builds/306

ERROR: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/cfad747ece6c2992c5b867a14a43555e/external/org_golang_x_crypto/curve25519/BUILD.bazel:3:11: GoCompilePkg external/org_golang_x_crypto/curve25519/curve25519.a failed (Exit 34): com.google.devtools.build.lib.remote.BulkTransferException
	at com.google.devtools.build.lib.remote.RemoteCache.waitForBulkTransfer(RemoteCache.java:225)
	at com.google.devtools.build.lib.remote.RemoteCache.download(RemoteCache.java:331)
	at com.google.devtools.build.lib.remote.RemoteSpawnRunner.downloadAndFinalizeSpawnResult(RemoteSpawnRunner.java:486)
	at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:306)
	at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:240)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:134)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:102)
	at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
	at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:65)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:331)
	at com.google.devtools.build.lib.actions.Action.execute(Action.java:127)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:859)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1019)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:978)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:129)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:81)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:469)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:845)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:314)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:438)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:398)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
	Suppressed: java.io.IOException: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
HTTP/2 error code: ENHANCE_YOUR_CALM
Received Goaway
too_many_pings

Verified by building with Bazel@d4cd4e7ab18ebeae4152dafc113367289ffebb12 and its previous commit: https://buildkite.com/bazel/culprit-finder/builds/581 https://buildkite.com/bazel/culprit-finder/builds/582

Culprit: d4cd4e7ab18ebeae4152dafc113367289ffebb12

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (23 by maintainers)

Most upvoted comments

Great, then I can make the PRs: add 1.32.x, switch to 1.32.x & bring auto flow control back, drop 1.31.1

Yes, auto flow control enables pinging. This is where auto flow control pinging gets enabled in v1.26.0: https://github.com/grpc/grpc-java/blob/v1.26.0/netty/src/main/java/io/grpc/netty/AbstractNettyHandler.java#L141 (the same code is in v1.31.1, but v1.31.1 enables auto flow control by default for both client and server).
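
One plausible reading of the `too_many_pings` GOAWAY in the report is that the peer counts pings that arrive too close together and closes the connection after a few such strikes. The sketch below is a toy model of that kind of policy, not grpc-java's or the RBE server's actual code. The class name `PingStrikeModel`, the 1000 ms minimum interval, and the limit of 2 strikes are all assumptions picked for illustration.

```java
/** Toy model of a peer that closes a connection after too many closely spaced pings. */
class PingStrikeModel {
    // Hypothetical thresholds for illustration only; real servers configure their own.
    static final long MIN_PING_INTERVAL_MS = 1000;
    static final int MAX_PING_STRIKES = 2;

    private boolean seenPing = false;
    private long lastPingMs = 0;
    private int strikes = 0;

    /** Records a ping at nowMs; returns true once the strike limit is exceeded. */
    boolean onPing(long nowMs) {
        // A ping counts as a strike if it follows the previous ping too quickly.
        boolean tooSoon = seenPing && (nowMs - lastPingMs) < MIN_PING_INTERVAL_MS;
        seenPing = true;
        lastPingMs = nowMs;
        if (tooSoon) {
            strikes++;
        }
        return strikes > MAX_PING_STRIKES;
    }

    /** Returns the index of the first ping that would trigger a GOAWAY, or -1 if none does. */
    static int firstRejectedPing(long[] pingTimesMs) {
        PingStrikeModel model = new PingStrikeModel();
        for (int i = 0; i < pingTimesMs.length; i++) {
            if (model.onPing(pingTimesMs[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] burst = {0, 10, 20, 30};          // pings a few milliseconds apart
        long[] spaced = {0, 1000, 2000, 3000};   // pings one full interval apart
        System.out.println("burst: " + firstRejectedPing(burst));
        System.out.println("spaced: " + firstRejectedPing(spaced));
    }
}
```

In this model a burst of pings sent a few milliseconds apart exceeds the strike limit, while pings spaced a full interval apart never do. That is consistent with the suspicion in this thread that pings sent for flow control tuning could trip such a limit, though the model proves nothing about the real server.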

Given that auto flow control is a new feature and there’s some indication that it caused the regression, I’d rather try disabling it first (https://github.com/bazelbuild/bazel/pull/12266) as a more solid option.

v1.32.2 has fixes in that area, but it takes more PRs to bump again. Unless there’s an easy way to check whether it really helps before merging, it’s probably a good idea to try a faster fix first.

I will prepare v1.32.2 though.