go: runtime: exit status 0xC0000005 from test process on windows-amd64-longtest builder

Observed in 2020-04-10T16:24:46-ea7126f/windows-amd64-longtest:

windows-amd64-longtest …
…
--- FAIL: TestTestCache (4.15s)
    go_test.go:2405: 
        
        INITIAL
        
        
    go_test.go:2413: running testgo [test -x -v -short t/...]
    go_test.go:2413: standard output:
    go_test.go:2413: exit status 3221225477
        FAIL	t/t1	0.486s
        exit status 3221225477
        FAIL	t/t2	0.481s
        === RUN   Test3
            Test3: t3_test.go:6: 1
        --- PASS: Test3 (0.00s)
        PASS
        ok  	t/t3	0.462s
        === RUN   Test4
            Test4: t4_test.go:6: 1
        --- PASS: Test4 (0.00s)
        PASS
        ok  	t/t4	0.467s
        FAIL
        
    go_test.go:2413: standard error:

The FAIL in t/t1 is unexpected in this test.

Exit status 3221225477 is 0xC0000005, which is STATUS_ACCESS_VIOLATION, the generic Windows “access violation” error. That suggests possible memory corruption.
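As a trivial sanity check on the decimal-to-hex conversion, something like this confirms it from Go:

package main

import "fmt"

func main() {
	// 3221225477 in hex is 0xc0000005, the NTSTATUS access-violation code.
	fmt.Printf("%#x\n", 3221225477) // prints 0xc0000005
}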

I haven’t seen any repeats of this error so far.

CC @alexbrainman @zx2c4

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 84 (76 by maintainers)

Most upvoted comments

Interestingly, according to the Mm team, mappings and ReadFile should be coherent as long as the file was opened cached, i.e. without FILE_NO_INTERMEDIATE_BUFFERING. The fact that they appear not to be in this case may be an OS bug.

We (well, not me, but others here in Windows) will try to get a local repro under stress. In the meantime, it sounds like flushing the mapping explicitly is a reasonable workaround.
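For anyone who wants to experiment with that workaround, here is a minimal, hypothetical sketch (Windows-only, using golang.org/x/sys/windows; the file name and mapping size are invented, and this is only an illustration of the FlushViewOfFile + FlushFileBuffers sequence, not the code under discussion):

//go:build windows

// Map a file, write through the mapping, then flush explicitly so that
// subsequent ReadFile-based I/O sees the data.
package main

import (
	"log"
	"os"
	"unsafe"

	"golang.org/x/sys/windows"
)

func main() {
	const size = 1 << 16 // arbitrary mapping size for this example

	f, err := os.Create("mapped.bin") // hypothetical file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := f.Truncate(size); err != nil {
		log.Fatal(err)
	}

	h := windows.Handle(f.Fd())
	mapping, err := windows.CreateFileMapping(h, nil, windows.PAGE_READWRITE, 0, size, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer windows.CloseHandle(mapping)

	addr, err := windows.MapViewOfFile(mapping, windows.FILE_MAP_READ|windows.FILE_MAP_WRITE, 0, 0, size)
	if err != nil {
		log.Fatal(err)
	}
	defer windows.UnmapViewOfFile(addr)

	// Write through the mapping. (addr stays valid until UnmapViewOfFile,
	// so the uintptr-to-pointer conversion is safe here.)
	buf := unsafe.Slice((*byte)(unsafe.Pointer(addr)), size)
	copy(buf, []byte("written via the mapping"))

	// The suggested workaround: flush the dirty view, then the file's
	// cached data, before anything reads the file back with ReadFile.
	if err := windows.FlushViewOfFile(addr, size); err != nil {
		log.Fatal(err)
	}
	if err := windows.FlushFileBuffers(h); err != nil {
		log.Fatal(err)
	}
}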

Thanks for looping me in. @aclements, on which OS build are you reproducing this?

I’m working on bisecting this (which is a very slow process; given the 13% failure probability, it takes 34 successful runs to drive the chance of a missed repro under 1%). I have been able to reproduce it as far back as da8591b61c141ca58e4b3eae382d2827006344fd, which is 32 commits earlier than the first observation on the dashboard (ea7126fe141879e1065d196570c078fbec09f3b6).
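(The arithmetic behind the 34-run figure: a bad commit survives n clean runs with probability (1 - 0.13)^n, and 0.87^34 ≈ 0.0088, which is the first n that drops below 1%. A quick check, using the 13% figure quoted above:)

package main

import (
	"fmt"
	"math"
)

func main() {
	p := 0.13 // per-run failure probability observed on the builder
	// Smallest n with (1-p)^n < 0.01, i.e. <1% chance a bad commit passes all n runs.
	n := math.Ceil(math.Log(0.01) / math.Log(1-p))
	fmt.Printf("need %.0f clean runs (miss probability %.4f)\n", n, math.Pow(1-p, n))
	// prints: need 34 clean runs (miss probability 0.0088)
}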

For lack of a better idea, I’m now running git bisect start da8591b61c141ca58e4b3eae382d2827006344fd go1.14.3. My reproduction steps are:

VM=$(gomote create windows-amd64-longtest)
gomote push $VM
timeout 45m gomote run -path '$PATH,$WORKDIR/go/bin' -e GO_TEST_SHORT=0 -e GO_TEST_TIMEOUT_SCALE=5 $VM go/src/all.bat

I’ve wrapped this in a horrible shell script that considers “exit status 3221225477” to be a failure, “ALL TESTS PASSED” to be a success, and anything else to be a flake. And I’m running it across five gomotes so it doesn’t take an eternity and a half.
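(The script itself isn’t posted; as a rough, hypothetical sketch of the same classification logic, using only the marker strings described above, something like this could read one run’s log on stdin:)

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	verdict := "flake" // anything we can't classify counts as a flake
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1<<20), 1<<20) // all.bat output lines can be long
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, "exit status 3221225477"):
			verdict = "failure" // the access-violation exit we're bisecting for
		case strings.Contains(line, "ALL TESTS PASSED") && verdict != "failure":
			verdict = "success"
		}
	}
	fmt.Println(verdict)
}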

I should note that go1.14.3 included CL 213837, which was your hypothesized culprit, but I wasn’t able to reproduce at that tag (I even accidentally ran a lot more iterations than I meant to!).

Or a runtime regression. The other repeating mystery failure is on windows/386, which has Windows in common. I took a quick peek through git history around 1/9/2020, and CL 213837 might be a candidate(?).