go: x/tools/gopls: regtest flakes due to hanging go commands
#!watchflakes
post <- goos != "netbsd" && (pkg ~ `^golang.org/x/tools/gopls` || (repo == "tools" && pkg == "")) &&
`detected hanging go command`
greplogs --dashboard -md -l -e ‘runCmdContext’ --since=2022-07-14
2022-08-15T17:42:12-987de34-1f833e4/darwin-amd64-12_0 2022-08-14T00:06:23-35f806b-59865f1/netbsd-amd64-9_0 2022-08-12T20:40:05-bebd890-2f6783c/netbsd-386-9_0 2022-08-12T18:15:28-bebd890-b6f87b0/netbsd-amd64-9_0 2022-08-12T12:39:26-88d981e-f67c766/netbsd-amd64-9_0 2022-08-12T00:04:29-c4ec74a-a5cd894/netbsd-386-9_0 2022-08-11T19:05:54-c4ec74a-62654df/netbsd-amd64-9_0 2022-08-11T17:53:50-37a81b6-a526ec1/netbsd-amd64-9_0 2022-08-11T16:19:14-37a81b6-2340d37/netbsd-amd64-9_0 2022-08-11T12:53:59-b2156b5-3c200d6/netbsd-386-9_0 2022-08-10T22:22:48-b2156b5-6b80b62/netbsd-amd64-9_0 2022-08-10T17:41:25-0ad49fd-f19f6c7/netbsd-amd64-9_0 2022-08-10T15:08:24-3950865-c81dfdd/netbsd-amd64-9_0 2022-08-10T02:14:09-6fa767d-5531838/plan9-386-0intro 2022-08-09T14:33:24-92d58ea-0981d9f/openbsd-386-70 2022-08-09T14:12:01-92d58ea-662a729/netbsd-amd64-9_0 2022-08-09T13:39:27-92d58ea-9e8020b/netbsd-386-9_0 2022-08-09T11:28:56-92d58ea-0f8dffd/netbsd-amd64-9_0 2022-08-08T18:10:56-fff6d6d-4bcc138/netbsd-amd64-9_0 2022-08-08T15:33:45-06d96ee-0581d69/netbsd-amd64-9_0 2022-08-08T15:07:46-06d96ee-cd54ef1/netbsd-amd64-9_0 2022-08-08T14:12:21-06d96ee-e761556/netbsd-amd64-9_0 2022-08-08T06:16:59-06d96ee-0f6ee42/darwin-amd64-11_0 2022-08-08T06:16:59-06d96ee-0f6ee42/netbsd-386-9_0 2022-08-06T15:20:00-06d96ee-0c4db1e/plan9-386-0intro 2022-08-05T19:51:08-06d96ee-4fb7e22/plan9-386-0intro 2022-08-04T20:05:03-81c7dc4-39728f4/netbsd-386-9_0 2022-08-04T20:05:03-81c7dc4-39728f4/netbsd-amd64-9_0 2022-08-04T20:04:16-3519aa2-39728f4/netbsd-386-9_0 2022-08-04T19:57:25-763f65c-39728f4/netbsd-386-9_0 2022-08-04T18:51:46-99fd76f-39728f4/openbsd-386-70 2022-08-04T17:05:18-3e0a503-fb1bfd4/netbsd-amd64-9_0 2022-08-04T15:50:11-3e0a503-fcdd099/netbsd-386-9_0 2022-08-04T15:50:11-3e0a503-44ff9bf/netbsd-amd64-9_0 2022-08-04T15:31:49-87f47bb-44ff9bf/plan9-386-0intro 2022-08-04T14:58:59-87f47bb-4345620/netbsd-386-9_0 2022-08-04T10:32:51-3e0a503-a10afb1/linux-386-buster 2022-08-03T21:02:27-8b9a1fb-f28fa95/plan9-386-0intro 2022-08-03T21:02:27-8b9a1fb-4345620/netbsd-386-9_0 2022-08-03T18:07:40-d08f5dc-fcdd099/netbsd-386-9_0 2022-08-03T13:50:38-ddb90ec-c6a2dad/dragonfly-amd64-622 2022-08-03T13:50:38-ddb90ec-c6a2dad/plan9-386-0intro 2022-08-03T12:09:24-ddb90ec-29b9a32/plan9-386-0intro 2022-08-02T18:52:36-0d04f65-29b9a32/plan9-386-0intro 2022-08-02T18:19:01-d025cce-be59153/netbsd-amd64-9_0 2022-08-02T18:16:22-10cb435-d723df7/netbsd-amd64-9_0 2022-08-02T18:07:14-4d0b383-d723df7/netbsd-386-9_0 2022-08-02T18:07:14-4d0b383-d723df7/netbsd-amd64-9_0 2022-08-02T17:23:42-4d0b383-1b7e71e/darwin-amd64-nocgo 2022-08-02T16:05:48-4d0b383-f2a9f3e/netbsd-amd64-9_0 2022-07-29T20:19:23-9580c84-9240558/windows-arm64-11 2022-07-28T20:06:00-8ea5687-d9242f7/darwin-amd64-nocgo 2022-07-27T15:04:58-39a4e36-4248146/freebsd-386-13_0 2022-07-26T18:43:08-6c8a6c4-d9242f7/aix-ppc64 2022-07-25T20:44:49-2a6393f-24dc27a/darwin-amd64-10_14 2022-07-25T18:11:01-4375b29-795a88d/plan9-386-0intro 2022-07-25T14:16:17-178fdf9-64f2829/plan9-386-0intro 2022-07-22T20:12:19-7b605f4-c5da4fb/plan9-386-0intro 2022-07-21T20:11:06-ec1f924-c4a6d30/plan9-386-0intro 2022-07-15T15:11:26-22d1494-2aa473c/windows-386-2008-newcc 2022-07-15T14:27:36-1a4e02f-4651ebf/windows-arm64-10 2022-07-15T14:20:24-db8f89b-4651ebf/windows-arm64-10 2022-07-14T21:03:14-db8f89b-783ff7d/windows-arm64-11 2022-07-14T21:01:58-db8f89b-aa80228/darwin-arm64-11 2022-07-14T19:05:09-db8f89b-a906d3d/windows-arm64-10 2022-07-14T15:54:36-db8f89b-266c70c/windows-arm64-10 2022-07-14T01:47:39-db8f89b-558785a/windows-arm64-11
We recently started waiting for all go command invocations when shutting down gopls regtests. It appears that sometimes we kill the go command and still don’t get a result from cmd.Wait(). For example, here:
https://build.golang.org/log/00046e0b005c7660d676a3a415561950048f756a
In that failure, the test runner looks otherwise healthy (other tests ran fast), and yet the goroutine stack clearly shows a go command hanging for 9 minutes here: https://cs.opensource.google/go/x/tools/+/master:internal/gocommand/invoke.go;l=260;drc=f38573358cbedf46d64c5759ef41b72afcf0c5c0
@bcmills do you happen to have any idea of what might cause this?
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 126 (40 by maintainers)
Commits related to this issue
- internal/gocommand: add instrumentation for hanging go commands When a go command hangs during gopls regression tests, print out additional information about processes and file descriptors. For gola... — committed to golang/tools by findleyr 2 years ago
- internal/gocommand: tweak debugging for hanging go commands Add a TODO and wait for a shorter period of time following Kill, per post-submit advice from bcmills on CL 424075. For golang/go#54461 Ch... — committed to golang/tools by findleyr 2 years ago
- internal/gocommand: show pid of process We're almost certain that the go process shown by ps is not the one that we're waiting for in runCmdContext, but only by printing the pid can we be sure. Upda... — committed to golang/tools by adonovan 2 years ago
- internal/gocommand: add instrumentation for hanging go commands When a go command hangs during gopls regression tests, print out additional information about processes and file descriptors. For gola... — committed to rinchsan/gosimports by findleyr 2 years ago
- internal/gocommand: tweak debugging for hanging go commands Add a TODO and wait for a shorter period of time following Kill, per post-submit advice from bcmills on CL 424075. For golang/go#54461 Ch... — committed to rinchsan/gosimports by findleyr 2 years ago
- internal/gocommand: show pid of process We're almost certain that the go process shown by ps is not the one that we're waiting for in runCmdContext, but only by printing the pid can we be sure. Upda... — committed to rinchsan/gosimports by adonovan 2 years ago
- internal/gocommand: more tweaks to DebugHangingGoCommands As suggested by bcmills, this change increases the time between SIGINT and SIGKILL to 5s (was 1s), and also suppresses the process dump if SI... — committed to golang/tools by adonovan a year ago
- internal/gocommand: stop copying stdout as soon as ctx is done If ctx is done, that means (by definition) that the caller is no longer interested in the output of the command. Since 'go list' output ... — committed to golang/tools by bcmills a year ago
That one is
netbsd, which could plausibly be explained as #50138. (I’d rather we wait for one of these on a platform without a known issue forWaitbefore we do too much digging.)After reading the source a bit more: this is EINVAL, which appears to mean that the Process.wait() has exited and the handle released, so this is a race, although it is surprising that we hit it so reliably.
A simpler explanation is that gopls sent SIGINT, waited for 1s, then sent SIGKILL, and the child (go) exited during the 1s grace period, causing kill(2) to fail with ESRCH (-> ErrProcessDone = “process already finished”). Seems like that’s to be expected if a busy machine causes a delay in responding to SIGINT. Perhaps the debugging logic should treat ErrProcessDone as success, not a reason to debug?