pandas: REGR: __setitem__ with integer slices on Int/RangeIndex is broken (label instead of positional)

There’s a backward-incompatible change in pandas 1.0 that I didn’t find in the changelog. I might have just overlooked it, though.

import numpy as np
import pandas as pd
X = pd.DataFrame(np.zeros((100, 1)))
X[-4:] = 1
X

In pandas 0.25.3 or lower, this sets the last four entries of X to 1 and leaves all the others at zero. In pandas 1.0, it sets every entry of X to 1. I assume it’s a change of indexing axis 0 or axis 1?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (12 by maintainers)

Most upvoted comments

This is caused by https://github.com/pandas-dev/pandas/pull/27383 I think (cc @jbrockmendel ), specifically:

     def _setitem_slice(self, key, value):
         self._check_setitem_copy()
-        self.loc._setitem_with_indexer(key, value)
+        self.loc[key] = value
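
To illustrate why that one-line change would matter, here is a minimal sketch that applies the same slice through .loc and through .iloc. The helper names assign_by_label and assign_by_position are hypothetical and not part of pandas; the sketch only assumes that .loc treats an integer slice on an integer index as labels, which matches the behaviour reported above.

```python
import numpy as np
import pandas as pd


def assign_by_label(frame, key, value):
    """Hypothetical helper: assign through .loc (label-based) on a copy."""
    out = frame.copy()
    out.loc[key] = value
    return out


def assign_by_position(frame, key, value):
    """Hypothetical helper: assign through .iloc (positional) on a copy."""
    out = frame.copy()
    out.iloc[key] = value
    return out


X = pd.DataFrame(np.zeros((100, 1)))
key = slice(-4, None)  # the same slice as X[-4:]

by_label = assign_by_label(X, key, 1)
by_position = assign_by_position(X, key, 1)

# Count how many rows each variant set to 1.
print(int(by_label[0].sum()))
print(int(by_position[0].sum()))
```

With labels, -4 sorts before every label in the RangeIndex, so the slice covers all rows; with positions, it covers only the last four.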

Maybe, but indexing with an out-of-range label on both sides should return nothing.

This is about positional indexing, so there is no “out of range label”. The -4 means start from the fourth last element to the end.
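
As a quick sketch of what “fourth last element to the end” means positionally, plain Python sequences follow the same rule. This uses only built-ins and says nothing about pandas internals; the variable names are just for illustration.

```python
# A 100-element sequence, standing in for the 100 rows of X.
rows = list(range(100))

# slice.indices() resolves a slice against a given length.
start, stop, step = slice(-4, None).indices(len(rows))
print(start, stop, step)

# The resolved slice selects the last four positions.
selected = rows[-4:]
print(selected)
```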

Again, I agree this is surprising behaviour. You would think it is label-based indexing, but it is not. I already described this 5 years ago in #9595.

Some examples to illustrate this:

In [21]: df = pd.DataFrame({'a': [0., 1., 2., 3.]}, index=[2, 3, 4, 5])

In [22]: df 
Out[22]: 
     a
2  0.0
3  1.0
4  2.0
5  3.0

In [23]: df[2:] 
Out[23]: 
     a
4  2.0
5  3.0

In [24]: df[:3]  
Out[24]: 
     a
2  0.0
3  1.0
4  2.0

Those examples are for __getitem__, and they clearly work positionally if you look at the index of the results (on both 0.25 and 1.0, and for both Int64Index and RangeIndex). So it is __setitem__ that is broken in 1.0.0.
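
For comparison, here is a small sketch that puts the explicit indexers side by side on the same frame. The helper getitem_is_positional is hypothetical, not a pandas function; it just checks whether a plain df[key] agrees with df.iloc[key] on whatever pandas version is installed.

```python
import pandas as pd

df = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0]}, index=[2, 3, 4, 5])

positional = df.iloc[2:]   # rows at positions 2 and 3
label_based = df.loc[2:]   # rows from label 2 onwards


def getitem_is_positional(frame, key):
    """Hypothetical helper: does frame[key] match frame.iloc[key]?"""
    return frame[key].equals(frame.iloc[key])


print(positional.index.tolist())
print(label_based.index.tolist())
print(getitem_is_positional(df, slice(2, None)))
```

A similar comparison after assignment, rather than after selection, could serve as a regression check for the __setitem__ path.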

Thanks for the report.

Seems this doesn’t affect .iloc:

In [26]: import numpy as np 
    ...: X = pd.DataFrame(np.zeros((5, 1))) 
    ...: X.iloc[-4:] = 1 
    ...: X                                                                      
Out[26]: 
     0
0  0.0
1  1.0
2  1.0
3  1.0
4  1.0

Will look into it.

I wonder if it’s related to #31449 but I’m not using a multi-index.