go: cmd/go: do not cache tool output if tools print to stdout/stderr

Update, Oct 7 2020: see https://github.com/golang/go/issues/27628#issuecomment-702252564 for most recent proposal in this issue.

What version of Go are you using (`go version`)?

tip (2e5c32518ce6facc507862f4156d4e6ac776754f), also Go 1.11

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

darwin/amd64

What did you do?

$ go build -toolexec=/usr/bin/time hello.go
# command-line-arguments
        0.01 real         0.00 user         0.00 sys
# command-line-arguments
        0.12 real         0.11 user         0.02 sys
$ go build hello.go 
# command-line-arguments
        0.01 real         0.00 user         0.00 sys
# command-line-arguments
        0.12 real         0.11 user         0.02 sys
$

What did you expect to see?

The second invocation of go build doesn’t have -toolexec, so it should not invoke the toolexec command (which I think it doesn’t), nor reprint its output.

What did you see instead?

toolexec output is reprinted.

In fact, I think it probably should not cache at all if -toolexec is specified, since the external command that toolexec invokes may do anything, and (intentionally) not reproducible.

cc @dr2chase

About this issue

Original URL
State: open
Created 6 years ago
Reactions: 5
Comments: 33 (31 by maintainers)

Most upvoted comments

Thinking about my comment above a bit more, I think this issue should still be fixed by simply doing no caching when -toolexec is used. That would be strictly better than what we have right now.

Once this issue is fixed, I can then file the “opt-in caching for -toolexec” idea as a separate proposal.

mvdan on Dec 9, 2019

I actually don’t think those are the right choices. There is a plausible use of -toolexec to invoke a debugger or other interactive program, in which case we should never cache if -toolexec is used. And there is a plausible use of -toolexec to invoke a logger or other build annotator, in which we should completely ignore -toolexec and its argument for caching purposes. I can’t think of an important use case in which we want to cache based on whether we are using the same -toolexec option as before.

I would probably pick “never cache if -toolexec is used” but I don’t feel very strongly about it.

ianlancetaylor on Sep 12, 2018

Counter-proposal: any time a command prints to stdout or stderr, its results are not cached.

The most common time a tool prints to stdout/stderr are failures, and those are already not cached because of the non-zero exit status. The new rule would generalize that: any print cannot be cached.

At first this seems a bit surprising and more tied to tool behavior than you’d expect, but it turns out that it would remove a bunch of special case code that been introduced since the build cache was first added.

The original build cache didn’t save stdout/stderr - it just didn’t seem necessary. One unintended effect was that -toolexec=time did not pollute future build output with cached timing prints.

But then go build -gcflags=-m errors didn’t reprint the -m output on a second run (#22587). We fixed that in CL 77110 by caching and reprinting the stdout/stderr too.

But then we still missed stdout/stderr for certain skipped compile+link steps (#23877). We fixed that in CL 128903 with even more special logic.

And now go build -toolexec=time prints the timing output without -toolexec, which is kind of the opposite of the -gcflags=-m issue.

(There were probably other issues I am forgetting.)

Instead of continuing to finesse the exact behavior of when stdout/stderr are cached or not, why not turn back the clock and re-fix #22587 by saying: “don’t cache build artifacts when a command prints anything”. Then #22587 is fixed, #23877 doesn’t happen, this issue doesn’t happen either, -toolexec stays independent of the cache decision, and -toolexec=toolstash stays fast.

And it would let us delete all the output-handling code in the build cache (the test cache is separate and would stay).

Thoughts?

rsc on Oct 1, 2020

No change in consensus, so accepted.

rsc on Oct 21, 2020

I would probably pick “never cache if -toolexec is used”

I vote for this.

cherrymui on Sep 12, 2018

Based on the discussion above, this seems like a likely accept.

rsc on Oct 14, 2020

I agree that this bug needs fixing one way or another. The only way I find -toolexec useful right now is like -a -toolexec=whatever.

I have some thoughts about caching, though. It’s unfortunate if we simply throw caching out the window, because it’s entirely reasonable for toolexec programs to be deterministic given their input.

The way go build handles caching properly for go tool compile seems to be to first ask it compile -V=full, and use that as part of the cache key. This means the compile operations will need re-doing if the compiler version changes, but not otherwise.

Could we do the same for -toolexec as an opt-in? For example, if one does go build -toolexec=mytool, the Go tool would run a well defined version flag like mytool -toolversion, similar to the compiler’s -V=full. If that fails, we do no caching at all with -toolexec=mytool. If it succeeds, its output is added to the cache key, and caching takes place.

I think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.

We would have to define the bare minimum of information for -toolversion to report, though. For example, would a stringified https://golang.org/pkg/runtime/debug/#Module be enough, if available? Or perhaps a hash of the file contents in https://golang.org/pkg/os/#Executable?

mvdan on Dec 8, 2019