pandas: BUG: Indexes still include values that have been deleted
Using pandas 0.10. If we create a DataFrame with a MultiIndex, then delete all the rows with value X, we’d expect the index to no longer show value X. But it does. Note the apparent inconsistency between “index” and “index.levels”: one shows the values have been deleted but the other doesn’t.
import pandas
x = pandas.DataFrame([['deleteMe', 1, 9], ['keepMe', 2, 9], ['keepMeToo', 3, 9]], columns=['first', 'second', 'third'])
x = x.set_index(['first', 'second'], drop=False)
x = x[x['first'] != 'deleteMe']  # Chop off all the 'deleteMe' rows
print(x.index)  # Good: Index no longer has any rows with 'deleteMe'. But....
print(x.index.levels)  # Bad: index still shows the "deleteMe" values are there. But why? We deleted them.
print(x.groupby(level='first').sum())  # Bad: it's creating a dummy row for the rows we deleted!
We don’t want the deleted values to show up in that groupby. Can we eliminate them?
About this issue
- State: closed
- Created 11 years ago
- Comments: 35 (24 by maintainers)
Commits related to this issue
- support for removing unused levels (internally) xref #2770 — committed to jreback/pandas by jreback 7 years ago
I think this can be closed: the default behavior is as intended, and the method MultiIndex.remove_unused_levels() has been added as a simple fix for whoever doesn’t like the default behavior.
The pandas API doesn’t fit in my head anymore. For reference, df.index.get_level_values might be relevant for whatever use case this was a problem for. Does the right thing.
@robertmuil sorry, forgot to respond to you.
Here is an easy way to do this:
Create the new frame (FYI, in general doing things inplace IMHO is confusing to the user and doesn’t help with speed at all).
This returns a new frame (you can assign alternatively if you want).
This is pretty cheap to do (though not completely free).
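The code that accompanied this comment did not survive in this copy of the thread. As a rough sketch of the general idea only (not necessarily what was originally posted), one way to get a new frame whose levels match its rows is to filter first and then rebuild the MultiIndex from the tuples that remain. The names `filtered` and `rebuilt` are made up for illustration, and a reasonably modern pandas is assumed.

```python
import pandas as pd

x = pd.DataFrame(
    [["deleteMe", 1, 9], ["keepMe", 2, 9], ["keepMeToo", 3, 9]],
    columns=["first", "second", "third"],
)
x = x.set_index(["first", "second"], drop=False)

# Create the new frame by filtering rather than dropping in place.
filtered = x[x["first"] != "deleteMe"]

# Rebuild the index from the remaining tuples so that the levels are
# recomputed from the rows that are actually left.
rebuilt = filtered.copy()
rebuilt.index = pd.MultiIndex.from_tuples(
    filtered.index.tolist(), names=filtered.index.names
)

print(list(filtered.index.levels[0]))  # still lists the removed label
print(list(rebuilt.index.levels[0]))   # only labels that are still in use
```

Rebuilding walks every remaining index entry once, which fits the remark that this is cheap but not free.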
I suppose you could add this as an option to drop if you’d like (and I would say reindex would be a fine kw for this). Like to do a pull-request?
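For reference, the two routes mentioned earlier in the thread, MultiIndex.remove_unused_levels() and get_level_values, can be sketched on the example from the report as follows. This assumes a pandas version that includes remove_unused_levels (it did not exist in 0.10); the variable name `present` is made up for illustration.

```python
import pandas as pd

x = pd.DataFrame(
    [["deleteMe", 1, 9], ["keepMe", 2, 9], ["keepMeToo", 3, 9]],
    columns=["first", "second", "third"],
)
x = x.set_index(["first", "second"], drop=False)
x = x[x["first"] != "deleteMe"]

# get_level_values reports the labels of the rows that are actually
# present, regardless of what index.levels still stores.
present = x.index.get_level_values("first")
print(list(present))

# remove_unused_levels returns a new MultiIndex, so assign it back.
x.index = x.index.remove_unused_levels()
print(list(x.index.levels[0]))
```

If only the surviving labels are needed, get_level_values is enough on its own; remove_unused_levels is the one to use when code downstream reads index.levels directly.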