statsmodels: [Bug/Doc] lowess returns nan and does not warn if there are too few neighbors
Statsmodels version: 0.6.1. Issue: when the number of unique values in x is smaller than the number of unique neighbors LOWESS needs for its local regressions, the result is all nans, but no error or warning is raised. Expected behavior: a warning that LOWESS cannot be computed due to an insufficient number of unique neighbors.
(Workaround: jitter all the x values by some tiny epsilon to create more unique values, then rerun lowess(y, x).)
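A minimal sketch of that jitter workaround (the epsilon scale and the use of `default_rng` are my choices, not from the report; very small epsilons have their own pitfalls, see the fix commit below):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = rng.choice(3, size=100).astype(float)  # only 3 unique values: {0, 1, 2}
y = rng.random(100)

# Jitter each x by a tiny random epsilon so every value becomes unique,
# then rerun lowess on the jittered data.
x_jittered = x + rng.normal(scale=1e-9, size=x.shape)
preds = lowess(y, x_jittered)[:, 1]
```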
import statsmodels as sm
print(sm.__version__)  # '0.6.1'
from statsmodels.nonparametric.smoothers_lowess import lowess
import numpy as np
# initializing data
x = np.random.choice(range(0, 3), 100) # only 3 unique values: {0, 1, 2}
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1] # Slicing to get the predicted y values
# no warning or exception raised, despite the fact that all predictions are nan
print(preds)
>>> array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan])
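Until a fix lands, a caller can guard against this silent failure. A minimal sketch of the warning the report asks for (`lowess_or_warn` is a hypothetical wrapper, not a statsmodels API):

```python
import warnings
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def lowess_or_warn(y, x, **kwargs):
    """Hypothetical wrapper: warn instead of silently returning nans
    when the smooth cannot be computed (e.g. too few unique x values)."""
    preds = lowess(y, x, **kwargs)[:, 1]
    if np.isnan(preds).any():
        warnings.warn(
            "lowess returned nan predictions; x may have too few "
            f"unique values ({len(np.unique(x))} found)"
        )
    return preds
```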
# Trying again, with a few more unique x values
x = np.random.choice(range(0, 5), 100)  # now, with 5 unique values
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1]  # recompute on the new data
print(preds)
>>> array([ 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377])
About this issue
- Original URL
- State: open
- Created 9 years ago
- Comments: 18 (10 by maintainers)
Commits related to this issue
- BUG: fix lowess spikes/nans from epsilon values Resolves #7700 and #2449 — committed to tgbrooks/statsmodels by tgbrooks 3 years ago
AFAIU: p_i_j is the projection matrix
x (x'x)^{-1} x'
specialized to the case with a single regressor and a constant, with weights added as in WLS. (For example, scipy's linregress is/was doing something similar, working directly with sums of squares or cross products.)
(I haven't looked at those special cases in a long time and didn't check the details here. The plain OLS version would be
y_hat = x @ pinv(x) @ y)
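A small numpy sketch of that OLS version (the data here is illustrative), showing that the projection matrix x (x'x)^{-1} x' reproduces the least-squares fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Design matrix: a constant plus a single regressor, as described above.
x = np.column_stack([np.ones(20), rng.random(20)])
y = 1.0 + 2.0 * x[:, 1] + rng.normal(scale=0.1, size=20)

# Projection ("hat") matrix H = x (x'x)^{-1} x', via the pseudoinverse.
H = x @ np.linalg.pinv(x)
y_hat = H @ y

# The same fitted values come from an explicit least-squares solve.
beta, *_ = np.linalg.lstsq(x, y, rcond=None)
assert np.allclose(y_hat, x @ beta)
```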