scipy: scipy.stats.chisquare test does not check that observed and expected frequencies add to same total

The function scipy.stats.chisquare performs the chi-square test on a vector of observed and expected frequencies. For the test to make sense, the observed and expected frequency vectors must sum to the same total; otherwise the inputs are incompatible and the result is nonsense. So there are two options: either raise an error when the totals differ, or rescale f_exp so that it sums to the total of f_obs. The first is probably better, or the second combined with a warning.
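
The two options proposed above could look roughly like the following minimal sketch. The helper name `check_or_rescale` and its `mode` and `rtol` parameters are hypothetical and not part of SciPy; the sketch works on plain Python lists so it has no dependencies.

```python
import math
import warnings


def check_or_rescale(f_obs, f_exp, mode="raise", rtol=1e-8):
    """Hypothetical helper (not a SciPy API): compare the totals of f_obs and f_exp.

    mode="raise":   raise ValueError if the totals differ (option 1).
    mode="rescale": scale f_exp to the total of f_obs and emit a warning (option 2).
    """
    total_obs = sum(f_obs)
    total_exp = sum(f_exp)
    if math.isclose(total_obs, total_exp, rel_tol=rtol):
        # Totals already agree; return the expected frequencies as floats.
        return [float(e) for e in f_exp]
    if mode == "raise":
        raise ValueError(
            f"sum(f_obs)={total_obs} and sum(f_exp)={total_exp} differ; "
            "the chi-square test needs both to have the same total"
        )
    if mode == "rescale":
        warnings.warn(
            "rescaling f_exp so that it sums to the total of f_obs",
            RuntimeWarning,
        )
        scale = total_obs / total_exp
        return [e * scale for e in f_exp]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(check_or_rescale([10, 20], [15, 15]))  # [15.0, 15.0]
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(check_or_rescale([10, 20], [30, 60], mode="rescale"))  # [10.0, 20.0]
```

Raising is the stricter choice: it forces the caller to decide whether a mismatch is a data error or an intentional rescaling.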

Reproducing code example:

# In the following example, one vector is an exact multiple of the other.
# This means that the observed and expected frequencies are exactly
# proportional. This should give a p-value of 1 (not significant at all).
# Instead, you get the following:
from scipy.stats import chisquare
chisquare(f_obs=[10,20], f_exp=[30,60])

# Power_divergenceResult(statistic=40.0, pvalue=2.5396285894708634e-10)
# The statistic of 40 is calculated as follows, directly from the formula:
# ((10-30)**2 / 30) + ((20-60)**2 / 60) = 40
# Plugged into a chi-squared distribution, it gives a p-value close to 0,
# which is the opposite of the significance you should get.

# Instead, here is what should happen
import numpy as np
fobs = np.array([10,20])
fexp = np.array([30,60])
# adjust the totals
# gives array([10., 20.]), the same as observed
fexp = fexp * (np.sum(fobs)/np.sum(fexp)) 
chisquare(f_obs=fobs, f_exp=fexp)
# the correct result
# Power_divergenceResult(statistic=0.0, pvalue=1.0)
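
The arithmetic in the comments above can be checked without SciPy, which also avoids depending on how a particular SciPy version treats unequal totals. This sketch computes Pearson's statistic directly. With two categories the test has 1 degree of freedom, and for 1 degree of freedom the chi-square survival function equals `erfc(sqrt(x / 2))`, which the standard library provides. The helper names `pearson_statistic` and `chi2_sf_1df` are illustrative only.

```python
import math


def pearson_statistic(f_obs, f_exp):
    # Pearson's statistic: sum over cells of (observed - expected)**2 / expected.
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))


def chi2_sf_1df(x):
    # Survival function of the chi-square distribution with 1 degree of freedom:
    # P(Z**2 > x) = 2 * P(Z > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(x / 2))


if __name__ == "__main__":
    f_obs = [10, 20]
    exp_raw = [30, 60]

    # Unequal totals: expected counts taken as given.
    stat = pearson_statistic(f_obs, exp_raw)
    print(stat, chi2_sf_1df(stat))  # 40.0 and roughly 2.54e-10

    # Expected counts rescaled to the observed total (30).
    f_exp = [e * sum(f_obs) / sum(exp_raw) for e in exp_raw]
    stat = pearson_statistic(f_obs, f_exp)
    print(stat, chi2_sf_1df(stat))  # 0.0 1.0
```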

Scipy/Numpy/Python version information:

1.4.1 1.18.4 sys.version_info(major=3, minor=7, micro=4, releaselevel='final', serial=0)

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

AFAIU changing this was a mistake. The chi-square test as described on Wikipedia (https://en.wikipedia.org/wiki/Chi-squared_test) does not require the sum of the observations to equal the sum of the expectations. Wikipedia derives it under the assumption that sum(x) == sum(m), but even without enforcing this condition, if the x are sampled from the m, the statistic has a chi-square distribution in the asymptotic limit. The original implementation was correct, and breaking everyone’s code that relied on the previous behavior is not great.
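
The distributional claim in this comment can be probed with a small Monte Carlo sketch (standard library only; all helper names are illustrative). It relies on a standard result the comment does not spell out: with independent Poisson counts whose total is not fixed, Pearson's statistic is approximately chi-square with k degrees of freedom, while multinomial counts with a fixed total give k - 1, which is what `chisquare` uses by default (`ddof=0`). Since a chi-square variable with d degrees of freedom has mean d, the simulated means should land near 3 and 2 for three cells.

```python
import math
import random


def poisson(lam, rng):
    # Knuth's multiplication method; adequate for the moderate means used here.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def pearson_statistic(f_obs, f_exp):
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))


def poisson_counts(expected, rng):
    # Independent Poisson counts: the grand total is NOT fixed to sum(expected).
    return [poisson(m, rng) for m in expected]


def multinomial_counts(expected, rng):
    # Multinomial counts: the grand total is fixed to sum(expected).
    n = round(sum(expected))
    draws = rng.choices(range(len(expected)), weights=expected, k=n)
    return [draws.count(i) for i in range(len(expected))]


def mean_statistic(expected, trials, sampler, seed=0):
    # Average Pearson statistic over repeated samples drawn by `sampler`.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += pearson_statistic(sampler(expected, rng), expected)
    return total / trials


if __name__ == "__main__":
    expected = [50, 100, 150]  # k = 3 cells
    print(mean_statistic(expected, 4000, poisson_counts))      # close to 3 (k)
    print(mean_statistic(expected, 4000, multinomial_counts))  # close to 2 (k - 1)
```

In this sketch the statistic stays well defined when the observed total differs from the expected total, but its reference distribution, and hence the p-value, depends on how the counts were generated.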

As I said, the test output is nonsensical: the inputs are incompatible, because the test requires them to sum to the same total.

This is simply wrong.

The problem is rather the documentation of the chisquare function, which speaks of frequencies (f_obs and f_exp), whereas the function actually took counts, not frequencies. Frequencies must sum to 1, but counts do not have to.
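
The difference between counts and relative frequencies matters numerically. Each term (o - e)**2 / e scales linearly with the total n, so passing relative frequencies instead of counts shrinks the statistic by a factor of n and inflates the p-value. This is a standard-library sketch with illustrative names and illustrative input values.

```python
def pearson_statistic(f_obs, f_exp):
    # Pearson's statistic: sum of (observed - expected)**2 / expected.
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))


counts_obs = [10, 20]
counts_exp = [15, 15]
n = sum(counts_obs)  # 30

# The same data expressed as relative frequencies, which sum to 1.
props_obs = [c / n for c in counts_obs]
props_exp = [c / n for c in counts_exp]

stat_counts = pearson_statistic(counts_obs, counts_exp)
stat_props = pearson_statistic(props_obs, props_exp)
print(stat_counts)  # about 3.33
print(stat_props)   # about 0.111, i.e. stat_counts / n
```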