pandas: DataFrame.interpolate() extrapolates over trailing missing data

See also the discussion at StackOverflow.

Linear interpolation on a series with missing data at the end of the array will overwrite trailing missing values with the last non-missing value. In effect, the function extrapolates rather than strictly interpolating.

Example:

import pandas as pd
import numpy as np

a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])
a.interpolate()

Yields (note the extrapolated 4 at index 5):

0   NaN
1     1
2     2
3     3
4     4
5     4
dtype: float64

rather than the expected:

0   NaN
1     1
2     2
3     3
4     4
5     NaN
dtype: float64
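Here is a minimal sketch of how to get the expected output. It assumes pandas 0.23 or later, where `interpolate()` accepts the `limit_area` keyword referenced in the linked commits. With `limit_area="inside"`, only NaNs surrounded by valid values are filled, so the trailing NaN is left alone.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])

# Default behaviour: the trailing NaN at index 5 is filled with the last valid value.
default = a.interpolate()

# limit_area="inside" (pandas >= 0.23): fill only NaNs that lie between valid values.
inside = a.interpolate(limit_area="inside")

print(default)
print(inside)
```

In both results the leading NaN at index 0 stays missing, because the default `limit_direction` is forward.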

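I believe the fix is something along the lines of changing lines 1545:1546 in core/common.py from

result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid])

to

result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid], np.nan, np.nan)

This passes NaN as the `left` and `right` arguments of np.interp. Points outside the range of valid data therefore stay missing instead of being clamped to the first or last valid value.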
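The proposed fix depends on how `np.interp` treats points outside the known range. By default it clamps them to `fp[0]` and `fp[-1]`. Passing `left` and `right` overrides those fill values. The short sketch below uses made-up positions and values to show the difference in isolation:

```python
import numpy as np

xp = np.array([1.0, 3.0, 4.0])  # positions of the valid values
fp = np.array([1.0, 3.0, 4.0])  # the valid values themselves
x = np.array([2.0, 5.0])        # positions to fill: one interior, one trailing

# Default: x=5 is beyond xp[-1], so it is clamped to fp[-1] (this is the "extrapolation").
clamped = np.interp(x, xp, fp)

# With left/right set to NaN, out-of-range points stay missing.
masked = np.interp(x, xp, fp, left=np.nan, right=np.nan)

print(clamped)
print(masked)
```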

About this issue

Original URL
State: closed
Created 10 years ago
Comments: 17 (14 by maintainers)

Links to this issue

pandas - Extrapolate dataframe rows - Stack Overflow

Commits related to this issue

Correct a typo of version number in the docstring of _shared_docs['interpolate'] which is the docstring for pandas.core.resample.Resampler.interpolate, pandas.DataFrame.interpolate, pandas.Series.inte... — committed to willweil/pandas by willweil 5 years ago
Correct a typo of version number in documentation/user_guide/missing_data about the limit_area keyword argument in interpolate(). The reference can be found at https://github.com/pandas-dev/pandas/iss... — committed to willweil/pandas by willweil 5 years ago

Most upvoted comments

This is definitely a bug. All new pandas users will find this behaviour confusing and error-prone (as I just did). If there is code that relies on this bug, then there is a bug in that code too. You should fix it. Interpolate means interpolate, not extrapolate in any way.

relonger on Nov 12, 2017

Given that the filling of the trailing values does not follow the specified method but just forward fills, I think we could consider this a bug. However, it is of course still a bug that people could rely upon, so I'm not sure whether we should just change the behaviour.

jorisvandenbossche on Feb 9, 2017

@Jezzamonn One workaround solution: http://stackoverflow.com/questions/25255496/dataframe-interpolate-extrapolates-over-trailing-missing-data/33390872#33390872
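The linked answer is not reproduced here. A different workaround for pandas versions without `limit_area` is to interpolate normally and then blank out everything after the last originally valid value. In the minimal sketch below, `interpolate_no_trailing` is a hypothetical helper, not a pandas API, and it is not necessarily the approach taken in the linked answer:

```python
import numpy as np
import pandas as pd


def interpolate_no_trailing(s: pd.Series) -> pd.Series:
    """Hypothetical helper: linear interpolation that restores trailing NaNs.

    Works by position, so it does not assume a sorted or numeric index.
    """
    result = s.interpolate()  # returns a new Series; `s` is not modified
    valid_pos = np.flatnonzero(s.notna().to_numpy())
    if valid_pos.size == 0:
        # No valid data at all: nothing was filled, return as-is.
        return result
    # Every position after the last originally valid one was extrapolated.
    result.iloc[valid_pos[-1] + 1:] = np.nan
    return result


a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])
print(interpolate_no_trailing(a))
```

On pandas 0.23 or later, the built-in `limit_area` keyword covers the same case without a helper.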

jluttine on Dec 17, 2015

yeah it looks like a typo; this change is in 0.23

would love a PR to update!

jreback on Feb 20, 2019