go: net: deadlock in TestDialParallelSpuriousConnection on darwin-arm*
--- FAIL: TestDialParallelSpuriousConnection (1.49s)
dial_test.go:456: got read tcp6 [::1]:59891->[::1]:59893: i/o timeout; want EOF
FAIL
FAIL net 4.495s
2020-03-10T00:24:30-08dee51/darwin-arm64-corellium
2020-03-02T15:39:23-12d02e7/darwin-arm64-corellium
2020-02-29T17:02:40-74f8983/darwin-arm64-corellium
2020-02-24T16:39:52-3093959/darwin-arm64-corellium
2020-01-31T20:18:54-1b7fefc/darwin-arm64-corellium
2020-01-23T21:01:12-ace25f8/darwin-arm-mg912baios
2020-01-19T14:04:09-8e0be05/darwin-arm-mg912baios
2019-10-30T00:41:31-47efbf0/darwin-arm-mg912baios
Forked from #34495.
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (15 by maintainers)
Commits related to this issue
- net: avoid darwin/arm64 platform bug in TestCloseWrite On darwin_arm64, reading from a socket at the same time as the other end is closing it will occasionally hang for 60 seconds before returning EC... — committed to golang/go by bcmills 2 years ago
- net: avoid darwin/arm64 platform bug in TestCloseWrite On darwin_arm64, reading from a socket at the same time as the other end is closing it will occasionally hang for 60 seconds before returning EC... — committed to jproberts/go by bcmills 2 years ago
Cool, thanks. I sent an e-mail to a couple of contacts at Apple.
This appears to be a macOS bug. I’ve reproduced it in C.
Client:
Server:
Run server. Run client. After a small number of iterations (takes milliseconds on my laptop), the server will hang in the read() call for 60 seconds before the call fails with ECONNRESET.
This looks like a real failure.
I’ve reproduced it in a simpler test case:
After running for a few minutes on my darwin/arm64 laptop:
The client dials and immediately closes a connection. The listener accepts and reads from the connection. On rare occasions, the read hangs for 60 seconds before returning “connection reset by peer”.
Running the test 100’000 times on a darwin/amd64 machine did not reproduce the problem:
Running the same on a darwin/arm64 machine seems to reproduce after a while:
Complete output
Tested with Go 1.18.2, both with macOS 12 (though slightly different minor versions).
greplogs --dashboard -md -l -e 'FAIL: TestDialParallelSpuriousConnection.*\n.*i/o timeout' --since=2020-10-20
2020-10-20T00:59:23-a505312/ios-arm64-corellium
Looks fixed to me: none of these in over a year.