go: cmd/vet: flag benchmarks that don’t use b

I propose that cmd/vet flag any benchmark that doesn’t have any reference to b in its body.

  • Frequency: Low among experienced gophers, but not uncommon among new gophers.
  • Correctness: Benchmarks written without using b do not function as benchmarks. Often this gets caught by a user who is puzzled by the output, but not always, particularly if the benchmark is slow.
  • Precision: 100%, or very near it. You cannot write a correct benchmark without using b.N, b.Run, b.RunParallel, or somehow passing b or one of its fields to another function to do that for you.

The implementation should be fairly straightforward. Pick out benchmarks. Pick out the identifier (usually b). Look for any uses of that identifier. If none, complain.

cc @robpike @mvdan

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 9
  • Comments: 15 (14 by maintainers)

Most upvoted comments

We do vet API usage, checking things like printf format strings. And we vet test code, checking things like t.Fatalf’s format strings. In general it seems OK to vet misuse of test APIs, same as other APIs, provided the checks meet the same bar.

That said, I am not convinced this specific check needs to be in vet. Checking “did it use b at all” would catch some but not all problems and would be easily defeated. It doesn’t seem specific enough to me. And I think the mistake can be detected more precisely at runtime instead.

As Josh noted, we’ve talked before about having a linearity check. That can be a bit sensitive to timings but might be worth doing anyway. Whether to do that is borderline, pending experiments that none of us have ever made time for.

But a highly accurate linearity check is not needed to detect that the line you get by plotting t against b.N has slope 0 instead of slope 1. I sent CL 230978. With that change, a vet check doesn’t seem worthwhile. (The only benefit would be that the vet check would catch the bug in a benchmark that is never actually run, but if you never run your benchmark, do you really care if it has a bug?)

This seems like a minor problem, and opening vet up to analyzing people’s tests leads us down a long and slippery path.

A Benchmark function that doesn’t depend on b.N at all is guaranteed not to produce valid results.

In contrast, a Test function that ignores its *testing.T could potentially be valid: it might indicate a test whose failure mode is (say) an uncaught panic that terminates the test program with a non-zero status.

So I don’t think there is a slippery slope here: the argument for Benchmark functions already does not apply to Test functions.

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

Austin pointed out various problems with CL 230978 so I ended up writing a vet check after all. Results of running that vet check on the public Go ecosystem are at https://gist.github.com/rsc/4ba2eacb2d1a00339f4539e7b1bee576. There is one false positive I need to correct involving assigning b to a global that is then read by another function called via yet another function, but other than that, the check seems quite accurate and useful.

@josharian your thoughts are very interesting - you’ve clearly been thinking about this for a while 😃

I agree that, in general, writing good benchmarks in Go requires reading quite a lot of material and using multiple tools together carefully. Even though it’s a difficult task, I still think that the only way to truly solve the problem is to make the tooling better, be it by making the testing package smarter or by bundling benchstat into testing.

To me, fuzzing and benchmarking are kind of similar, in the sense that they are very helpful and necessary for many developers, but they are difficult to use correctly simply because not enough hours have gone into properly including them as part of go test.

Adding small checks to vet could possibly help in the near term, but we can only catch the most trivial mistakes there. So while this could meet the bar for inclusion in vet, I don’t think it’s the direction we want to be heading in.