pandas: PERF: regression in getattr for IntervalIndex
Master:

```
In [14]: idx = pd.interval_range(0, 1000, 1000)

In [15]: %timeit getattr(idx, '_ndarray_values', idx)
1.29 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [16]: %timeit idx.closed
321 ns ± 2.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
while on 0.25.3:

```
In [13]: idx = pd.interval_range(0, 1000, 1000)

In [14]: %timeit getattr(idx, '_ndarray_values', idx)
90.5 ns ± 2.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: %timeit idx.closed
105 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
(I only checked a few attributes, so I haven't verified whether this is specific to those attributes or affects getattr in general.)
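The comparison above can be reproduced outside IPython with a small `timeit` script. This is a sketch, assuming pandas is installed; note that `_ndarray_values` is a private attribute that may not exist on your pandas version, which is exactly why the benchmark uses `getattr` with a default:

```python
import timeit

import pandas as pd

# Reproduce the index from the report: 1000 intervals over [0, 1000].
idx = pd.interval_range(start=0, end=1000, periods=1000)

# Time the same lookups the issue benchmarks. `_ndarray_values` is a
# private pandas attribute that may be absent on newer versions; the
# default makes getattr fall back to returning `idx` in that case.
n = 100_000
t_getattr = timeit.timeit(lambda: getattr(idx, "_ndarray_values", idx), number=n)
t_closed = timeit.timeit(lambda: idx.closed, number=n)

print(f"getattr:    {t_getattr / n * 1e9:.0f} ns per call")
print(f"idx.closed: {t_closed / n * 1e9:.0f} ns per call")
```

On an affected build, the per-call times should be in the microsecond range rather than the ~100 ns seen on 0.25.3.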
I think this is a cause, or one of the causes, of several regressions currently visible at https://pandas.pydata.org/speed/pandas/ (e.g. https://pandas.pydata.org/speed/pandas/#reshape.Cut.time_cut_timedelta?p-bins=1000&commits=6efc2379-b9de33e3).
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (20 by maintainers)
BTW, I don’t think this should necessarily be a blocker for 1.0. But we should keep it in mind for the Index->ExtensionArray delegation refactors that are happening, and look into improving it.