go: crypto/x509: intermittent failures on darwin/amd64 with ‘x509: “…” certificate is expired’ since 2021-11-06
2022/02/14 12:08:07 failed to run `go mod download -json github.com/google/wycheproof@v0.0.0-20191219022705-2196000605e4`, output: {
"Path": "github.com/google/wycheproof",
"Version": "v0.0.0-20191219022705-2196000605e4",
"Error": "github.com/google/wycheproof@v0.0.0-20191219022705-2196000605e4: Get \"https://proxy.golang.org/github.com/google/wycheproof/@v/v0.0.0-20191219022705-2196000605e4.info\": x509: “misc-sni.google.com” certificate is expired"
}
FAIL golang.org/x/crypto/internal/wycheproof 0.555s
greplogs --dashboard -md -l -e 'x509: “misc-sni.google.com” certificate is expired'
2022-02-14T20:07:02-8634188-b2cb1bd/darwin-amd64-11_0 2022-02-11T19:36:36-f4118a5-0bde2cf/darwin-amd64-11_0 2022-01-10T21:48:09-03fcf44-1f411e9/darwin-amd64-11_0 2021-12-09T19:01:08-03fcf44-8ff254e/darwin-amd64-11_0 2021-12-08T18:06:06-5770296-7b7efd7/darwin-amd64-11_0 2021-12-08T15:15:40-03fcf44-08025a9/darwin-amd64-11_0 2021-11-12T18:57:22-c33205f-b1b6d92/darwin-amd64-10_15 2021-11-06T19:41:15-ec00351-61d789d/darwin-amd64-11_0
These failures seem to always occur when fetching from proxy.golang.org
, but not for any specific package. (They occur intermittently across many different repos and tests.) However, I have only observed them on darwin/amd64
builders. I’m not sure where in the stack these failures are introduced — whether they’re due to a bug in the darwin kernel, a bug (perhaps a race?) in crypto/x509
, some misconfiguration on the builders, or something else entirely.
These failures also seem to have started recently (during or shortly before the Go 1.18 code freeze).
@golang/security: since darwin/amd64
is a first class port, could someone assess whether this is a Go 1.18 regression, and whether and how we should mitigate these errors?
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 22 (21 by maintainers)
It works so well because of the register ABI or the generated code, which caused t.Sub(…).Seconds() inside x509 to leave the number we wanted in the floating-point register. As long as it doesn’t get overwritten due to goroutine rescheduling before the libc call, all is well.
I have confirmed this by putting a spurious math.Sin call in the middle.
This program calls x509.Verify directly, from a bunch of goroutines, over and over, on the cert chain from proxy.golang.org. It fails pretty often:
The call is
If I remove the CurrentTime line, then the problem goes away. This strongly suggests that the time conversion is the problem. And looking more carefully at crypto/tls, it does set CurrentTime unconditionally in handshake_client.go.
The time conversion is done in crypto/x509/internal/macos:
The only problem is that CFDateCreate takes a CFAbsoluteTime and a CFAbsoluteTime is seconds since 2001-01-01, yes, but as a CFTimeInterval, which in turn is a double (float64).
We are passing an integer, in an integer register, and leaving the floating-point register where CFDateCreate expects to find its argument completely uninitialized.
Honestly it is amazing that it works as often as it does (most of the time!). I wonder if SecTrustSetVerifyDate treats super weird dates as if the call never happened and falls back to the current date. Or maybe X0 is very often zero, and zero is special, or maybe something else…
In any event, this hack makes the program run without failure for many minutes, when before it failed every few seconds:
That seems to confirm the diagnosis. I will think more about the right fix.
Reopening to track cherry-pick to Go 1.18 release branch.
(I’m on vacation, but this is amazing.)
2022/02/14 12:08:07
in 2022-02-14T20:07:02-8634188-b2cb1bd/darwin-amd64-11_0