pandas: DataFrame.interpolate() extrapolates over trailing missing data
See also the discussion at StackOverflow.
Linear interpolation on a series with missing data at the end of the array will overwrite trailing missing values with the last non-missing value. In effect, the function extrapolates rather than strictly interpolating.
Example:
import pandas as pd
import numpy as np
a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])
a.interpolate()
Yields (note the extrapolated 4):
0 NaN
1 1
2 2
3 3
4 4
5 4
dtype: float64
instead of the expected:
0 NaN
1 1
2 2
3 3
4 4
5 NaN
dtype: float64
I believe the fix is something along the lines of changing lines 1545:1546 in core/common.py from
result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid])
to
result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid], np.nan, np.nan)
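To illustrate why passing the two extra arguments matters, here is a minimal standalone sketch of how np.interp treats points outside the range of valid data. It is not the pandas internal code: the names y, inds and valid are made up for this example, and unlike the snippet above it does not skip leading NaNs via firstIndex.

```python
import numpy as np

# Illustrative data only: missing values at the start, middle and end.
y = np.array([np.nan, 1.0, np.nan, 3.0, 4.0, np.nan])
inds = np.arange(len(y))
valid = ~np.isnan(y)

# Default behaviour: points outside the valid range are clamped to the
# first/last valid value, which is the extrapolation reported here.
clamped = np.interp(inds, inds[valid], y[valid])

# With left/right set to NaN, points outside the valid range stay missing.
strict = np.interp(inds, inds[valid], y[valid], left=np.nan, right=np.nan)

print(clamped)  # values 1, 1, 2, 3, 4, 4
print(strict)   # NaN at both ends, 1, 2, 3, 4 in between
```

The fourth and fifth positional arguments of np.interp are left and right, so the proposed one-line change is equivalent to the keyword form used in this sketch.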
About this issue
- State: closed
- Created 10 years ago
- Reactions: 2
- Comments: 17 (14 by maintainers)
Commits related to this issue
- Correct a typo of version number in the docstring of _shared_docs['interpolate'] which is the docstring for pandas.core.resample.Resampler.interpolate, pandas.DataFrame.interpolate, pandas.Series.inte... — committed to willweil/pandas by willweil 5 years ago
- Correct a typo of version number in documentation/user_guide/missing_data about the limit_area keyword argument in interpolate(). The reference can be found at https://github.com/pandas-dev/pandas/iss... — committed to willweil/pandas by willweil 5 years ago
This is definitely a bug. All new pandas users will find this behaviour confusing and error-prone (as I just did). If there is code that relies on this bug, that means there is a bug in that code as well. You should fix it. Interpolate means interpolate, not extrapolate in any way.
Given that the filling of the trailing values does not follow the specified method, but just forward fills, I think we could consider this as a bug. However, of course, still a bug that people could rely upon, so not sure whether we should just change the behaviour.
@Jezzamonn One workaround solution: http://stackoverflow.com/questions/25255496/dataframe-interpolate-extrapolates-over-trailing-missing-data/33390872#33390872
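For readers who cannot follow the link, one possible workaround can be sketched as below. This is an assumption about what a workaround could look like, not necessarily what the linked answer does: interpolate as usual, then blank out every position that has no valid original value at or after it.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])

# Interpolate as usual; trailing NaNs get filled with the last valid value.
filled = a.interpolate()

# a.bfill() is NaN only where no valid value follows, i.e. the trailing gap.
# Keep the interpolated values everywhere else and restore NaN there.
trimmed = filled.where(a.bfill().notna())

print(trimmed)
```

Because the mask is built from the original series, it works with any index and leaves interior gaps interpolated.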
yeah it looks like a typo; this change is in 0.23
would love a PR to update!
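For reference, a minimal sketch of the limit_area keyword mentioned in the commits above, assuming pandas 0.23 or later. Passing limit_area="inside" restricts filling to NaNs surrounded by valid values, so the trailing gap is left alone.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])

# Only fill NaNs that have valid values on both sides (no extrapolation).
inside_only = a.interpolate(limit_area="inside")

print(inside_only)  # NaN at positions 0 and 5; 1, 2, 3, 4 in between
```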