pandas: Various methods don't call __finalize__
Improve coverage of NDFrame.__finalize__
Pandas uses NDFrame.__finalize__ to propagate metadata from one NDFrame to
another. This ensures that things like self.attrs and self.flags are not
lost. In general, any operation that accepts one or more
NDFrames and returns an NDFrame should propagate metadata by calling
__finalize__.
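As a minimal sketch of the propagation described above (the `units` key and the doubling step are illustrative assumptions, not taken from the issue), a method implementation typically builds a fresh result and then copies metadata onto it from the source object:

```python
import pandas as pd

source = pd.DataFrame({"a": [1, 2, 3]})
source.attrs["units"] = "metres"  # illustrative metadata, not from the issue

# Build a fresh result the way a method implementation might. A newly
# constructed DataFrame starts with empty attrs.
result = pd.DataFrame({"a": source["a"].to_numpy() * 2})
assert result.attrs == {}

# __finalize__ copies the metadata from `source` onto `result` and
# returns `result`.
result = result.__finalize__(source)
print(result.attrs)
```

A method that skips the final `__finalize__` call returns a result with empty `attrs`, which is the symptom this issue tracks.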
The test file at https://github.com/pandas-dev/pandas/blob/master/pandas/tests/generic/test_finalize.py attempts to be an exhaustive suite of tests for all these cases. However, many of its tests are currently xfailing, and there are likely many APIs it does not cover.
This is a meta-issue to improve the use of __finalize__. Here’s a hopefully
accurate list of methods that don’t currently call finalize.
Some general comments around finalize
- We don’t have a good sense for what should happen to `attrs` when there are multiple NDFrames involved with differing attrs (e.g. in concat). The safest approach is to probably drop the attrs when they don’t match, but this will need some thought.
- We need to be mindful of performance. `__finalize__` can be somewhat expensive so we’d like to call it exactly once per user-facing method. This can be tricky for things like `DataFrame.apply` which is sometimes used internally. We may need to refactor some methods to have a user-facing `DataFrame.apply` that calls an internal `DataFrame._apply`. The internal method would not call `__finalize__`, just the user-facing `DataFrame.apply` would.
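Both comments can be illustrated with a toy class. This is a sketch under stated assumptions, not pandas code: `ToyFrame`, `double_then_negate`, `reconcile_attrs` and the call counter are all hypothetical names invented for the example. It shows a public method finalizing once at the boundary while internal helpers skip it, plus one conservative way to handle differing attrs.

```python
from copy import deepcopy


class ToyFrame:
    """Hypothetical stand-in for an NDFrame; not pandas code."""

    finalize_calls = 0  # class-level counter, only here to show call counts

    def __init__(self, values, attrs=None):
        self.values = list(values)
        self.attrs = dict(attrs or {})

    def __finalize__(self, other):
        # Copy metadata from `other` onto self and count the call.
        ToyFrame.finalize_calls += 1
        self.attrs = deepcopy(other.attrs)
        return self

    def _apply(self, func):
        # Internal helper: does the work, never touches metadata.
        return ToyFrame(func(v) for v in self.values)

    def apply(self, func):
        # User-facing method: finalize exactly once, at the boundary.
        return self._apply(func).__finalize__(self)

    def double_then_negate(self):
        # Another public method built on the internal helper, so the
        # intermediate result is never finalized.
        step = self._apply(lambda v: v * 2)._apply(lambda v: -v)
        return step.__finalize__(self)


def reconcile_attrs(frames):
    """One conservative policy: keep attrs only if every input agrees."""
    first = frames[0].attrs
    if all(f.attrs == first for f in frames[1:]):
        return deepcopy(first)
    return {}
```

Had `double_then_negate` been written in terms of the public `apply`, it would have finalized twice for one user-facing call. Dropping mismatched attrs in `reconcile_attrs` is only one possible policy; the issue leaves that question open.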
If you’re interested in working on this please post a comment indicating which method you’re working on. Un-xfail the test, then update the method to pass the test. Some of these will be much more difficult to work on than others (e.g. groupby is going to be difficult). If you’re unsure whether a particular method is likely to be difficult, ask first.
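Before picking a method, it can help to check locally whether it already propagates metadata. The helper below is a rough sketch: `propagates_attrs` is a hypothetical name, not part of pandas or its test suite, and it checks only `attrs`, not flags.

```python
import pandas as pd


def propagates_attrs(operation, obj):
    """Return True if `operation` carries `obj.attrs` over to its result.

    Hypothetical helper for local experiments; not part of pandas.
    """
    obj = obj.copy()
    obj.attrs = {"a": 1}  # sentinel metadata set on a copy of the input
    result = operation(obj)
    # Results without an `attrs` attribute (e.g. scalars) count as False.
    return getattr(result, "attrs", None) == {"a": 1}


df = pd.DataFrame({"x": [1, 2, 2]})

# DataFrame.copy serves as a baseline for a method that propagates attrs.
print(propagates_attrs(lambda d: d.copy(), df))

# Swap in any method from the list to see how it behaves on the
# installed pandas version.
print(propagates_attrs(lambda d: d.duplicated(), df))
```

The result for a given method depends on the pandas version installed, so the xfail markers in `test_finalize.py` remain the authoritative record of what still needs work.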
- `DataFrame.__getitem__` with a scalar
- `DataFrame.eval` with `engine="numexpr"`
- `DataFrame.duplicated`
- `DataFrame.add`, `mul`, etc. (at least for most things; some work to do on conflicts / overlapping attrs in binops)
- `DataFrame.combine`, `DataFrame.combine_first`
- `DataFrame.update`
- `DataFrame.pivot`, `pivot_table`
- `DataFrame.stack`
- `DataFrame.unstack`
- `DataFrame.explode` https://github.com/pandas-dev/pandas/pull/46629
- `DataFrame.melt` https://github.com/pandas-dev/pandas/pull/46648
- `DataFrame.diff`
- `DataFrame.applymap`
- `DataFrame.append`
- `DataFrame.merge`
- `DataFrame.cov`
- `DataFrame.corrwith`
- `DataFrame.count`
- `DataFrame.nunique`
- `DataFrame.idxmax`, `idxmin`
- `DataFrame.mode`
- `DataFrame.quantile` (scalar and list of quantiles)
- `DataFrame.isin`
- `DataFrame.pop`
- `DataFrame.squeeze`
- `Series.abs`
- `DataFrame.get`
- `DataFrame.round`
- `DataFrame.convert_dtypes`
- `DataFrame.pct_change`
- `DataFrame.transform`
- `DataFrame.apply`
- `DataFrame.any`, `sum`, `std`, `mean`, etc.
- `Series.str.` operations returning a Series / DataFrame
- `Series.dt.` operations returning a Series / DataFrame
- `Series.cat.` operations returning a Series / DataFrame
- All groupby operations (at least some work)
- `.iloc` / `.loc` https://github.com/pandas-dev/pandas/pull/46101
About this issue
- State: open
- Created 5 years ago
- Reactions: 3
- Comments: 47 (36 by maintainers)
Commits related to this issue
- API: Call finalize in more Series methods [WIP] Progress towards https://github.com/pandas-dev/pandas/issues/28283. This calls `finalize` for all the public series methods where I think it makes sens... — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- API/TST: Call __finalize__ in more places Progress towards https://github.com/pandas-dev/pandas/issues/28283. This adds tests that ensures `NDFrame.__finalize__` is called in more places. Thus far I'... — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- BUG: Fixed Series.abs not propagating metadata xref #28283 — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- BUG: Fixed Series.abs not calling finalize xref #28283 — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- BUG: Fixed Series.abs calling finalize xref #28283 — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- DOC: Call __finalize__ in NDFrame.abs xref #28283 — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- Call finalize in Series.dt xref #28283 — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- Call finalize in Series.dt (#36554) xref #28283 — committed to pandas-dev/pandas by TomAugspurger 4 years ago
- Call finalize in Series.str xref #28283 — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- Fixed metadata propagation in DataFrame.__getitem__ xref #28283 — committed to TomAugspurger/pandas by TomAugspurger 4 years ago
- Call finalize in Series.dt (#36554) xref #28283 — committed to kesmit13/pandas by TomAugspurger 4 years ago
- Fixed metadata propagation in Dataframe.apply (issue #28283) (#44041) — committed to pandas-dev/pandas by moha-rk 3 years ago
- Fixed metadata propagation in Dataframe.idxmax and Dataframe.idxmin #28283 — committed to SomtochiUmeh/pandas by SomtochiUmeh 2 years ago
- Fixed metadata propagation in Dataframe.idxmax and Dataframe.idxmin #… (#47821) * Fixed metadata propagation in Dataframe.idxmax and Dataframe.idxmin #28283 * fixing PEP 8 issues * removing unn... — committed to pandas-dev/pandas by SomtochiUmeh 2 years ago
Hello. I’d like to contribute to this issue. First, I chose the `Series.abs` function, but after running some tests, I saw that this method propagates the Series attributes without using `__finalize__`. I think that the numpy function `np.abs` creates a copy of the object before calculating the absolute value of the Series entries. Here’s an execution example:

So I guess `__finalize__` isn’t necessary here, unless I’m wrong. Now I’m working on `DataFrame.pop`, and will soon submit a PR.

`.quantile` can be ticked off as well. Done as of #47183