scikit-learn: Benchmark bug in SGDRegressor.fit on sparse data

There is a 10x increase in fit duration as reported here:

https://scikit-learn.org/scikit-learn-benchmarks/#linear_model.SGDRegressorBenchmark.time_fit?commits=e6b46675-b4afbeee&p-representation='sparse'

This happened between e6b46675 (still fast) and b4afbeee (slow).

Note that the same estimator with dense data was not impacted.

I have not investigated the cause myself. Just spotted it when reviewing the benchmarks.

Looking at the commit messages in the output of git log e6b46675..b4afbeee, it could be the case that 1a669d8202b1a22750b6e4f74b08257b5b95e0fe introduced the regression, but this needs confirmation.

EDIT: this is actually a bug in the design of the benchmark rather than a performance regression. See the discussion below.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Oh, I see that you put a tolerance tol=1e-16. This cannot work with 32-bit floats because the smallest usable tolerance would be around 1e-7.

There’s some confusion here. 1e-16 is representable in float32, and you can check that a number is smaller than it. Machine precision (eps) is the maximum relative difference between two consecutive representable numbers, not the smallest representable value.
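This distinction can be checked directly with NumPy, a quick sketch of the point being made:

```python
import numpy as np

# 1e-16 is representable in float32: it is well above the float32
# underflow threshold (~1e-38), so comparisons against it work fine.
tiny = np.float32(1e-16)
print(tiny > 0)                  # True
print(np.float32(1e-20) < tiny)  # True

# Machine precision (eps) is the maximum *relative* spacing between
# consecutive representable numbers, not the smallest value:
print(np.finfo(np.float32).eps)  # ~1.1920929e-07
```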

If the stopping criterion of SGD were if (new_score / prev_score) < tol, then 1e-16 would be fine. However, the current stopping criterion is if score < best_score + tol. In that case, as long as tol < best_score * float32.eps, the tolerance is rounded away and the expression effectively becomes if score < best_score.
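The rounding-away effect is easy to reproduce; here is a minimal sketch, using an arbitrary best_score of 0.5 (any value of similar magnitude behaves the same):

```python
import numpy as np

best_score = np.float32(0.5)  # hypothetical score from a training run
tol = np.float32(1e-16)

# tol is far below best_score * eps (~0.5 * 1.19e-7), so the addition
# rounds it away and the check degenerates to score < best_score:
print(np.float32(best_score + tol) == best_score)  # True

# A tolerance near or above best_score * eps survives the addition:
print(np.float32(best_score + np.float32(1e-7)) > best_score)  # True
```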

In the benchmarks the goal is to always perform the full max_iter iterations, so setting tol=None would probably be more appropriate.
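A minimal sketch of that suggestion (the toy data and parameter values here are made up for illustration, not taken from the actual benchmark):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Toy regression data, just to illustrate the setting:
rng = np.random.RandomState(0)
X = rng.rand(50, 4)
y = X @ np.array([1.0, 2.0, 3.0, 4.0])

# tol=None disables the early-stopping check, so fit always runs the
# full max_iter epochs regardless of dtype -- which is what a timing
# benchmark wants:
est = SGDRegressor(max_iter=20, tol=None, random_state=0)
est.fit(X, y)
print(est.n_iter_)  # 20: all epochs were performed
```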

OK I think the issue is our benchmark here:

https://github.com/scikit-learn/scikit-learn/blob/10eaa52c85cbd4d8745197f03e931ee33eb88c35/asv_benchmarks/benchmarks/linear_model.py#L167

The tolerance is too small for the 32-bit implementation. I don’t know whether tol=1e-7 would be a good default there.
