pandas: Confusing (possibly buggy) IntervalIndex behavior
In the above, I have a region that I’m querying with partially overlapping intervals. The query succeeds for several partially overlapping intervals, then fails on another one with this KeyError:
KeyError Traceback (most recent call last)
/Users/alex/Documents/GarNet/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
1433 if not ax.contains(key):
-> 1434 error()
1435 except TypeError as e:
/Users/alex/Documents/GarNet/venv/lib/python3.6/site-packages/pandas/core/indexing.py in error()
1428 raise KeyError("the label [%s] is not in the [%s]" %
-> 1429 (key, self.obj._get_axis_name(axis)))
1430
KeyError: 'the label [(5409951, 5409965]] is not in the [index]'
This is particularly confusing because, as far as I can tell, there is no obvious difference between the locs that succeed and the loc that fails. I know we had discussed loc’s behavior in this context, but I’m not sure we came to a conclusion.
By the way, my larger question is about how to find intersections between two IntervalIndex. It seems like the find_intersections function didn’t make it into this release @jreback ? Let me know! =]
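For anyone landing here with the same question, here is a minimal sketch of one way to find overlaps between two IntervalIndex objects. It assumes pandas 0.24 or later, where IntervalIndex.overlaps is available; find_overlapping_pairs and the sample data are hypothetical names made up for illustration, not part of pandas.

```python
import pandas as pd

# Hypothetical example data; intervals are right-closed (the pandas default).
regions = pd.IntervalIndex.from_tuples([(0, 10), (20, 30), (40, 50)])
queries = pd.IntervalIndex.from_tuples([(5, 25), (30, 35), (45, 60)])


def find_overlapping_pairs(left, right):
    """Return (i, j) positions where left[i] and right[j] share at least one point.

    Hypothetical helper, not a pandas function.
    """
    pairs = []
    for j, query in enumerate(right):
        # IntervalIndex.overlaps takes a single Interval and returns a
        # boolean array with one entry per interval in `left`.
        mask = left.overlaps(query)
        pairs.extend((int(i), j) for i in mask.nonzero()[0])
    return pairs


pairs = find_overlapping_pairs(regions, queries)
print(pairs)  # [(0, 0), (1, 0), (2, 2)]
```

Note that (20, 30] and (30, 35] do not overlap here, because the second interval is open on the left and so excludes 30. The loop is linear in the number of query intervals, which is fine for a sketch but may be slow for very large indexes.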
About this issue
- State: closed
- Created 7 years ago
- Comments: 37 (36 by maintainers)
Chiming in on this, as we are heavy users of postgres range types and range operators as a powerful abstraction for time series data.
As has already been mentioned, the key verbs are contains and overlaps, both on element and range level and in both directions.
Examples from the postgres docs:
int4range(2,4) @> int4range(2,3)
'[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp
int4range(2,4) <@ int4range(1,7)
42 <@ int4range(1,7)
int8range(3,7) && int8range(4,12)
int8range(1,10) << int8range(100,110)
numrange(1.1,2.2) -|- numrange(2.2,3.3)
Now that we have Intervals in pandas (very grateful for bringing that feature @jreback!) I have already tinkered around with some mappers for going between Postgres and pandas. Maybe that is too db-specific, but I definitely have a great interest in seeing more Interval type functionality in pandas and helping out with this.
Came across this library: https://github.com/AlexandreDecan/python-intervals
looks to have some interesting interval semantics
cc @jschendel
An interval covers another interval if all points in the second interval are found in the first interval.
An interval overlaps another interval if there exist any points found in both intervals.
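To make the two definitions above concrete, here is a rough pure-Python sketch. Span, covers and overlaps are hypothetical names used only for illustration (they are not pandas objects), and the sketch assumes non-empty intervals with left < right.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an interval; not a pandas type."""
    left: float
    right: float
    closed: str = "right"  # one of "left", "right", "both", "neither"

    @property
    def closed_left(self):
        return self.closed in ("left", "both")

    @property
    def closed_right(self):
        return self.closed in ("right", "both")


def covers(a, b):
    """True if every point of b is also in a (assumes left < right for both)."""
    # On a shared endpoint, a must include it whenever b does.
    left_ok = a.left < b.left or (
        a.left == b.left and (a.closed_left or not b.closed_left))
    right_ok = a.right > b.right or (
        a.right == b.right and (a.closed_right or not b.closed_right))
    return left_ok and right_ok


def overlaps(a, b):
    """True if at least one point lies in both a and b."""
    # Each interval must start before the other ends; a shared endpoint
    # only counts when both intervals include it.
    a_starts_in_time = a.left < b.right or (
        a.left == b.right and a.closed_left and b.closed_right)
    b_starts_in_time = b.left < a.right or (
        b.left == a.right and b.closed_left and a.closed_right)
    return a_starts_in_time and b_starts_in_time


print(covers(Span(0, 10), Span(2, 5)))       # True
print(covers(Span(0, 10), Span(5, 25)))      # False: only partially overlapping
print(overlaps(Span(0, 10), Span(5, 25)))    # True
print(overlaps(Span(20, 30), Span(30, 35)))  # False: (30, 35] excludes 30
```

Under these definitions a partially overlapping query overlaps a region without being covered by it, which is the distinction that matters for the loc behavior discussed in this issue.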