go: runtime: exit status 0xC0000005 from test process on windows-amd64-longtest builder

Observed in 2020-04-10T16:24:46-ea7126f/windows-amd64-longtest:

windows-amd64-longtest …
…
--- FAIL: TestTestCache (4.15s)
    go_test.go:2405: 
        
        INITIAL
        
        
    go_test.go:2413: running testgo [test -x -v -short t/...]
    go_test.go:2413: standard output:
    go_test.go:2413: exit status 3221225477
        FAIL	t/t1	0.486s
        exit status 3221225477
        FAIL	t/t2	0.481s
        === RUN   Test3
            Test3: t3_test.go:6: 1
        --- PASS: Test3 (0.00s)
        PASS
        ok  	t/t3	0.462s
        === RUN   Test4
            Test4: t4_test.go:6: 1
        --- PASS: Test4 (0.00s)
        PASS
        ok  	t/t4	0.467s
        FAIL
        
    go_test.go:2413: standard error:

The FAIL in t/t1 is unexpected in this test.

Exit status 3221225477 is 0xC0000005, which is STATUS_ACCESS_VIOLATION, the generic Windows “access violation” error. That suggests possible memory corruption.
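As a trivial sanity check on the decimal-to-hex conversion, something like this confirms it from Go:

package main

import "fmt"

func main() {
	// 3221225477 in hex is 0xc0000005, the NTSTATUS access-violation code.
	fmt.Printf("%#x\n", 3221225477) // prints 0xc0000005
}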

I haven’t seen any repeats of this error so far.

CC @alexbrainman @zx2c4

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 84 (76 by maintainers)

Most upvoted comments

Interestingly, according to the Mm team, mappings and ReadFile should be coherent as long as the file was opened cached, i.e. without FILE_NO_INTERMEDIATE_BUFFERING. The fact that they appear not to be in this case may be an OS bug.

We (well, not me, but others here in Windows) will try to get a local repro under stress. In the meantime, it sounds like flushing the mapping explicitly is a reasonable workaround.
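For anyone who wants to experiment with that workaround, here is a minimal, hypothetical sketch (Windows-only, using golang.org/x/sys/windows; the file name and mapping size are invented, and this is only an illustration of the FlushViewOfFile + FlushFileBuffers sequence, not the code under discussion):

//go:build windows

// Map a file, write through the mapping, then flush explicitly so that
// subsequent ReadFile-based I/O sees the data.
package main

import (
	"log"
	"os"
	"unsafe"

	"golang.org/x/sys/windows"
)

func main() {
	const size = 1 << 16 // arbitrary mapping size for this example

	f, err := os.Create("mapped.bin") // hypothetical file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := f.Truncate(size); err != nil {
		log.Fatal(err)
	}

	h := windows.Handle(f.Fd())
	mapping, err := windows.CreateFileMapping(h, nil, windows.PAGE_READWRITE, 0, size, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer windows.CloseHandle(mapping)

	addr, err := windows.MapViewOfFile(mapping, windows.FILE_MAP_READ|windows.FILE_MAP_WRITE, 0, 0, size)
	if err != nil {
		log.Fatal(err)
	}
	defer windows.UnmapViewOfFile(addr)

	// Write through the mapping. (addr stays valid until UnmapViewOfFile,
	// so the uintptr-to-pointer conversion is safe here.)
	buf := unsafe.Slice((*byte)(unsafe.Pointer(addr)), size)
	copy(buf, []byte("written via the mapping"))

	// The suggested workaround: flush the dirty view, then the file's
	// cached data, before anything reads the file back with ReadFile.
	if err := windows.FlushViewOfFile(addr, size); err != nil {
		log.Fatal(err)
	}
	if err := windows.FlushFileBuffers(h); err != nil {
		log.Fatal(err)
	}
}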

Thanks for looping me in. @aclements, on which OS build are you reproducing this?

I’m working on bisecting this (which is a very slow process; given the 13% failure probability, it takes 34 successful runs to drive the chance of a missed repro under 1%). I have been able to reproduce it as far back as da8591b61c141ca58e4b3eae382d2827006344fd, which is 32 commits earlier than the first observation on the dashboard (ea7126fe141879e1065d196570c078fbec09f3b6).
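(The arithmetic behind the 34-run figure: a bad commit survives n clean runs with probability (1 - 0.13)^n, and 0.87^34 ≈ 0.0088, which is the first n that drops below 1%. A quick check, using the 13% figure quoted above:)

package main

import (
	"fmt"
	"math"
)

func main() {
	p := 0.13 // per-run failure probability observed on the builder
	// Smallest n with (1-p)^n < 0.01, i.e. <1% chance a bad commit passes all n runs.
	n := math.Ceil(math.Log(0.01) / math.Log(1-p))
	fmt.Printf("need %.0f clean runs (miss probability %.4f)\n", n, math.Pow(1-p, n))
	// prints: need 34 clean runs (miss probability 0.0088)
}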

For lack of a better idea, I’m now running git bisect start da8591b61c141ca58e4b3eae382d2827006344fd go1.14.3. My reproduction steps are:

VM=$(gomote create windows-amd64-longtest)
gomote push $VM
timeout 45m gomote run -path '$PATH,$WORKDIR/go/bin' -e GO_TEST_SHORT=0 -e GO_TEST_TIMEOUT_SCALE=5 $VM go/src/all.bat

I’ve wrapped this in a horrible shell script that considers “exit status 3221225477” to be a failure, “ALL TESTS PASSED” to be a success, and anything else to be a flake. And I’m running it across five gomotes so it doesn’t take an eternity and a half.
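(The script itself isn’t posted; as a rough, hypothetical sketch of the same classification logic, using only the marker strings described above, something like this could read one run’s log on stdin:)

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	verdict := "flake" // anything we can't classify counts as a flake
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1<<20), 1<<20) // all.bat output lines can be long
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, "exit status 3221225477"):
			verdict = "failure" // the access-violation exit we're bisecting for
		case strings.Contains(line, "ALL TESTS PASSED") && verdict != "failure":
			verdict = "success"
		}
	}
	fmt.Println(verdict)
}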

I should note that go1.14.3 included CL 213837, which was your hypothesized culprit, but I wasn’t able to reproduce at that tag (I even accidentally ran a lot more iterations than I meant to!).

Or a runtime regression. The other repeating mystery failure is on windows/386, which has Windows in common. I took a quick peek through git history around 1/9/2020, and CL 213837 might be a candidate(?).