go: x/build: frequent "communication error to buildlet" failures on `plan9-arm`
#!watchflakes
post <- builder == "plan9-arm" && `communication error to buildlet`
plan9-arm at 349cc83389f71c459b7820b0deecdf81221ba46c
…
communication error to buildlet (promoted to terminal error): network error promoted to terminal error: runTests: dist test failed: all buildlets had network errors or timeouts, yet tests remain
greplogs --dashboard -md -l -e '\Aplan9-arm.*(\n.*)*communication error to buildlet' --since=2022-01-01
2022-05-02T14:54:05-349cc83/plan9-arm
2022-04-27T14:23:28-f0c0e0f/plan9-arm
2022-04-26T02:28:58-17d7983/plan9-arm
2022-04-11T16:31:53-0179331/plan9-arm
2022-04-07T23:06:24-c451a02/plan9-arm
2022-04-05T14:15:59-62bceae/plan9-arm
2022-03-31T05:34:15-2b8178c/plan9-arm
2022-03-31T00:27:01-0a6ddcc/plan9-arm
2022-03-31T00:26:58-0775730/plan9-arm
2022-03-30T01:12:57-8fefeab/plan9-arm
2022-03-21T19:10:16-efbff6e/plan9-arm
2022-03-07T18:17:40-dcb6547/plan9-arm
2022-03-03T21:19:37-87a345c/plan9-arm
2022-03-01T19:32:51-44e92e1/plan9-arm
2022-02-25T00:25:34-b8b3196/plan9-arm
2022-02-01T18:15:07-125c5a3/plan9-arm
2022-01-27T21:25:18-ad345c2/plan9-arm
2022-01-19T16:33:11-985d97e/plan9-arm
2022-01-10T22:49:07-4ceb5a9/plan9-arm
@millerresearch, can something be done to prevent this builder from getting wedged?
(Compare #49756.)
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (13 by maintainers)
There was another failure mode: one of the Raspberry Pi builders had only 1GB of RAM and no swap configured. I’ve added some swap space, so it should be more stable now.
Whenever I do a manual retry using the `retrybuilds` command after a communication error failure, the next attempt always succeeds. My strong hunch is that something in the underlying platform is stalling non-deterministically, not something within the Go code.
I will set up a process on my local builders to monitor progress on the log output file. If nothing is emitted for, say, 15 minutes, it will send an alert so I can go in with the debugger and try to find out what’s stalled.