scipy: BUG: breaking change: scipy.stats.mode returned value has changed shape without deprecation

Describe your issue.

The scikit-learn nightly CI started to fail because of a recent change in scipy main:

In scipy main:

import numpy as np
from scipy import stats

stats.mode(np.ones(3))[0]

returns 1.0 while previously it would return:

array([1.])

Maybe a deprecation cycle is needed for this change?

Reproducing Code Example

import numpy as np
import scipy

scipy.stats.mode(np.ones(3))[0][0]

Error message

IndexError: invalid index to scalar variable.

SciPy/NumPy/Python version information

1.10.0.dev0+0.df3fe4e 1.24.0.dev0+102.gb84a53df9 sys.version_info(major=3, minor=10, micro=4, releaselevel=‘final’, serial=0)

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 21 (20 by maintainers)

Commits related to this issue

Most upvoted comments

@ogrisel I opened gh-16429 to work on this. It reverts the changes of gh-15423 so that the default behavior of mode is to retain the axis it acts along (as it was), but warns that this will change in SciPy 1.11.0. It hastily adds a keepdims parameter, but the behavior in edge cases is not perfect. I wonder if it would work to just revert gh-15423 and warn about the changing behavior but not offer keepdims. Would that work for sckit-learn?

Sorry, I’m late to this party and missing it because it in my release blockers review since it didn’t have a 1.9.0 milestone.

It seems pretty clear to me that we should revert this for 1.9.0. It’s somewhere in between a regular bug fix and a bc-breaking change, but either way that is pretty much irrelevant here. If we’re breaking scikit-learn, we just shouldn’t and it’s easy enough to give them one release cycle to fix things up.

That would work. We would just have to check the scipy version to call it the right way and silence the warning locally when this function is called in scikit-learn to avoid bothering our users.

@ogrisel it would be great to make the code in scikit-learn robust either way; no version checking is needed, a simple atleast_1d call on the result of that stats.mode call will work in all cases.

Actually we will need to release a scikit-learn 1.1.2 compat bugfix before numpy 1.24 is released anyways because of https://github.com/scikit-learn/scikit-learn/pull/23654.

So feel free to do as you prefer, independently of scikit-learn.

We could do a deprecation cycle, but please see the mailing list post and related issue gh-15423.

Previously, mode retained the dimension it acted along, unlike almost all (all?) other scipy.stats reduction operations. Due to the inconsistency, this behavior was considered a bug, so it changed without a deprecation cycle. A keepdims parameter was added to mitigate the change, but the default is currently False. The change will be documented in the 1.9.0 release notes (see gh-16248 - although it’s in the enhancement section only right now; it would need to be added to the Backward Incompatible changes section).

We could make the default value keepdims=True for mode and raise a warning about a future change in behavior, but that will still probably break dependent library CI, and the changes required would be the same: apply keepdims=True or change indexing.

Opinions here?

Note before this is closed: even if we don’t make a change in behavior here, we need to copy the release note entry about the change into the Backward Incompatibility section.