scikit-learn: IterativeImputer behaviour for features with no np.nan values in the fit data

Why is this behaviour forced?

Features with missing values during transform which did not have any missing values during fit will be imputed with the initial imputation method only.

https://scikit-learn.org/dev/modules/generated/sklearn.impute.IterativeImputer.html#sklearn.impute.IterativeImputer

This means that, by default, it will return the mean of that feature. I would prefer to fit at least one iteration of the chosen estimator for such features and use that fitted estimator to impute their missing values at transform time.

Actual behaviour: Example - The second feature has no np.nan values during fit --> mean imputation

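import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required to import IterativeImputer
from sklearn.impute import IterativeImputer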
imp = IterativeImputer(max_iter=10, verbose=0)
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]])

X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))

Return:
[[ 2.  4.]
 [ 6. 12.]
 [ 3.  6.]
 [ 4. 12.]
 [33. 12.]]

Example adjusted - Second feature has np.nan values --> iterative imputation with estimator

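import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required to import IterativeImputer
from sklearn.impute import IterativeImputer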
imp = IterativeImputer(max_iter=10, verbose=0)
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, np.nan]])

X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))

Return:
[[ 2.  4.]
 [ 6. 12.]
 [ 3.  6.]
 [ 4.  8.]
 [33. 66.]]

Maybe the behaviour in sklearn/impute.py, lines 679 to 683, should be made optional with a parameter like force-iterimpute.
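For reference, here is a minimal sketch of how such a switch looks in use. It assumes a scikit-learn release in which IterativeImputer accepts the `skip_complete` argument, which is not part of the version this issue was filed against. Under that assumption, `skip_complete=True` should reproduce the mean-only behaviour shown in the first example, while `skip_complete=False` fits an estimator for every feature. The exact regressed values depend on the estimator and the release, so they are not listed here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Same training data as the first example: the second feature is complete.
X_train = [[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]]
X_test = [[6, np.nan], [33, np.nan]]

# Assumption: skip_complete is available in the installed release.
# True: features that were complete during fit get no estimator, so their
# missing values at transform time fall back to the initial (mean) imputation.
imp_skip = IterativeImputer(max_iter=10, skip_complete=True, random_state=0)
out_skip = imp_skip.fit(X_train).transform(X_test)

# False: an estimator is fitted for every feature, including complete ones,
# so the second feature can be predicted from the first at transform time.
imp_all = IterativeImputer(max_iter=10, skip_complete=False, random_state=0)
out_all = imp_all.fit(X_train).transform(X_test)

print("skip_complete=True :", out_skip[:, 1])
print("skip_complete=False:", out_all[:, 1])
```

The trade-off is compute: fitting an estimator for every complete feature costs extra time at fit, which matters mostly when there are many features that never have missing values.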

About this issue

State: closed
Created 5 years ago
Reactions: 1
Comments: 21 (17 by maintainers)

Most upvoted comments

Thanks Sergey!

jnothman on Sep 9, 2019

Maybe I can take a crack at this.

To review: the change would be to (optionally, and by default) fit regressors even on those features that have no missing values at train time.

At transform, we can then impute these features if they are missing for any sample.

We will need a new test, and to update the docstring. Maybe the test can come directly from https://github.com/scikit-learn/scikit-learn/issues/14383?

Am I missing anything?
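The change reviewed above can be sketched without scikit-learn. This is only an illustration under simplifying assumptions, not the library's implementation: plain least squares stands in for the imputer's estimator, rows with a missing predictor are dropped instead of being imputed first, and the names `fit_line` and `impute_second` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Training data from the issue: the second feature has no missing values.
train = [(1, 2), (3, 6), (4, 8), (10, 20), (None, 22), (7, 14)]

# Initial imputation value for the second feature: its column mean.
fallback = mean(y for _, y in train)

# Proposed change: fit a regressor for the second feature anyway, even though
# nothing is missing in it at train time (rows lacking the predictor are
# dropped here, which is a simplification).
complete = [(x, y) for x, y in train if x is not None]
a, b = fit_line([x for x, _ in complete], [y for _, y in complete])


def impute_second(x, use_regressor):
    """Impute a missing second feature from the first feature x."""
    return a * x + b if use_regressor else fallback


for x in (6, 4, 33):
    print(x, impute_second(x, use_regressor=False), impute_second(x, use_regressor=True))
```

With the regressor in place, a test row such as [33, np.nan] is filled from the learned relationship between the two features instead of from the column mean.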

sergeyf on Aug 25, 2019

It’d be nice to make progress on this

jnothman on Aug 25, 2019