scipy: scipy.stats.chisquare is not working
My issue is about scipy.stats.chisquare, which has stopped working since the last update.
Reproducing code example:
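import pandas as pd
import scipy.stats as sps
# folder: path to the data directory (its value is not shown in this report)
obs_spec = pd.read_fwf(folder + 'obs.txt', header=None)
the_spec = pd.read_fwf(folder + 'models/m9.dat', header=None)
sps.chisquare(f_obs=obs_spec[1], f_exp=the_spec[1])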
Error message:
ValueError Traceback (most recent call last)
<ipython-input-28-726f4043508e> in <module>
14 obs_spec,the_spec
15
---> 16 a=sps.chisquare(f_obs=obs_spec[1], f_exp=the_spec[1])
17 a
~/anaconda3/lib/python3.7/site-packages/scipy/stats/stats.py in chisquare(f_obs, f_exp, ddof, axis)
6851 """
6852 return power_divergence(f_obs, f_exp=f_exp, ddof=ddof, axis=axis,
-> 6853 lambda_="pearson")
6854
6855
~/anaconda3/lib/python3.7/site-packages/scipy/stats/stats.py in power_divergence(f_obs, f_exp, ddof, axis, lambda_)
6692 f"of {rtol}, but the percent differences are:\n"
6693 f"{relative_diff}")
-> 6694 raise ValueError(msg)
6695
6696 else:
ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
0.19829509324291156
Scipy/Numpy/Python version information:
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 20 (10 by maintainers)
Yup, and that’s why we added the check : ) Thanks!
Alright folks, I have the same problem. Both sums add up to the same number! It gives me an error saying:
Please, at least give us a way to set the tolerance.
I do not understand why the expected and observed counts are required to have the same sum. The test statistic can be calculated without this restriction ( sum{(observed_i-expected_i)^2/expected_i} ), and the distribution table is given, so I cannot see any reason for it.
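For reference, the statistic written out in the comment above can be computed directly with NumPy. This is only a minimal sketch and not SciPy's implementation: the helper name `pearson_statistic` and the sample counts are made up for illustration, and the p-value assumes k - 1 - ddof degrees of freedom, which may or may not be the appropriate choice when the sums are not constrained to agree.

```python
import numpy as np
from scipy.stats import chi2


def pearson_statistic(f_obs, f_exp, ddof=0):
    """Hypothetical helper: Pearson statistic with no check on the sums."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    # sum over bins of (observed - expected)^2 / expected
    stat = np.sum((f_obs - f_exp) ** 2 / f_exp)
    # Assumed degrees of freedom: number of bins minus 1 minus ddof
    dof = f_obs.size - 1 - ddof
    # Upper-tail probability of the chi-square distribution
    p_value = chi2.sf(stat, dof)
    return stat, p_value


# Made-up counts whose sums differ (60 observed vs 63 expected)
obs = [18, 22, 20]
exp = [21, 21, 21]
stat, p = pearson_statistic(obs, exp)
print(stat, p)
```

With these illustrative counts the statistic is (9 + 1 + 1) / 21, even though the two sums disagree by 5%.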
You are welcome to use the old code. This was the relevant code in 87641e3.
@sap43 for better feedback, please consider posting your example on Stack Overflow with complete code that others can run to reproduce the issue.
The function doesn’t require normalized bin counts. It only requires the sums of the expected and observed counts to agree (within the relative tolerance quoted in the error message).
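One way to satisfy that requirement, sketched here with made-up counts, is to rescale the expected values so that their sum matches the observed sum before calling `scipy.stats.chisquare`. Note that this is an assumption about what the user wants: after rescaling, the test compares only the shape of the two distributions, not their totals.

```python
import numpy as np
from scipy.stats import chisquare

# Made-up counts for illustration; sums are 60 and 63
obs = np.array([18.0, 22.0, 20.0])
exp = np.array([21.0, 21.0, 21.0])

# Rescale the expected counts so that both sums agree
exp_rescaled = exp * obs.sum() / exp.sum()

# The sums now match, so the consistency check passes
stat, p = chisquare(f_obs=obs, f_exp=exp_rescaled)
print(stat, p)
```

If the totals themselves carry information that should be tested, rescaling is not appropriate and the statistic has to be computed by other means.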
The argument for the chisquare test given here is wrong: there is no need to normalize the data so that the observed and expected frequencies have the same sum. I added my counterargument to https://github.com/scipy/scipy/issues/12282.
Even without normalization, the test statistic is asymptotically chisquare distributed if the hypothesis is true.
The only real question is whether normalizing makes the test statistic converge faster to the asymptotic limit or not. I don’t think so. And in any case, the rate of convergence is not a good enough reason to break code of people downstream who have been relying on the old behavior of this function. Please reopen.