statsmodels: [Bug/Doc] lowess returns nan and does not warn if there are too few neighbors
Statsmodels version: 0.6.1. Issue: when the number of unique values in x is smaller than the number of unique neighbors LOWESS needs for its local regressions, the result is all nans, but no error or warning is raised. Expected behavior: a warning that LOWESS cannot be computed due to an insufficient number of unique neighbors.
(Workaround: jitter all the x values by some tiny epsilon to create more unique values, then rerun lowess(y, x).)
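A minimal sketch of that jitter workaround (the epsilon scale and the use of `default_rng` are my choices, not from the report; very small epsilons have their own pitfalls, see the fix commit below):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = rng.choice(3, size=100).astype(float)  # only 3 unique values: {0, 1, 2}
y = rng.random(100)

# Jitter each x by a tiny random epsilon so every value becomes unique,
# then rerun lowess on the jittered data.
x_jittered = x + rng.normal(scale=1e-9, size=x.shape)
preds = lowess(y, x_jittered)[:, 1]
```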
import statsmodels as sm
print(sm.__version__)  # '0.6.1'
from statsmodels.nonparametric.smoothers_lowess import lowess
import numpy as np
# initializing data
x = np.random.choice(range(0, 3), 100) # only 3 unique values: {0, 1, 2}
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1] # Slicing to get the predicted y values
# no warning or exception raised, despite the fact that all predictions are nan
print(preds)
>>> array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan])
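Until a fix lands, a caller can guard against this silent failure. A minimal sketch of the warning the report asks for (`lowess_or_warn` is a hypothetical wrapper, not a statsmodels API):

```python
import warnings
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def lowess_or_warn(y, x, **kwargs):
    """Hypothetical wrapper: warn instead of silently returning nans
    when the smooth cannot be computed (e.g. too few unique x values)."""
    preds = lowess(y, x, **kwargs)[:, 1]
    if np.isnan(preds).any():
        warnings.warn(
            "lowess returned nan predictions; x may have too few "
            f"unique values ({len(np.unique(x))} found)"
        )
    return preds
```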
# Trying again, with a few more unique x values
x = np.random.choice(range(0, 5), 100)  # now, with 5 unique values
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1]  # recompute on the new data
print(preds)
>>> array([ 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377])
About this issue
- Original URL
- State: open
- Created 9 years ago
- Comments: 18 (10 by maintainers)
Commits related to this issue
- BUG: fix lowess spikes/nans from epsilon values Resolves #7700 and #2449 — committed to tgbrooks/statsmodels by tgbrooks 3 years ago
AFAIU: p_i_j is the projection matrix
x (x'x)^{-1} x'
specialized to the case with a single regressor and a constant, with weights added as in WLS. (For example, scipy's linregress is/was doing something similar, working directly with sums of squares or cross products.)
(I haven't looked at those special cases in a long time and didn't check the details here. The plain OLS version would be
y_hat = x @ pinv(x) @ y)
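A small numpy sketch of that OLS version (the data here is illustrative), showing that the projection matrix x (x'x)^{-1} x' reproduces the least-squares fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Design matrix: a constant plus a single regressor, as described above.
x = np.column_stack([np.ones(20), rng.random(20)])
y = 1.0 + 2.0 * x[:, 1] + rng.normal(scale=0.1, size=20)

# Projection ("hat") matrix H = x (x'x)^{-1} x', via the pseudoinverse.
H = x @ np.linalg.pinv(x)
y_hat = H @ y

# The same fitted values come from an explicit least-squares solve.
beta, *_ = np.linalg.lstsq(x, y, rcond=None)
assert np.allclose(y_hat, x @ beta)
```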