pandas: Confusing (possibly buggy) IntervalIndex behavior

[Screenshot taken 2017-05-09, 10:03:25 pm]

In the screenshot above, I query a region with a partially overlapping interval. The query succeeds for some partially overlapping intervals, but then fails for another one with a KeyError:

KeyError                                  Traceback (most recent call last)
/Users/alex/Documents/GarNet/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1433                 if not ax.contains(key):
-> 1434                     error()
   1435             except TypeError as e:

/Users/alex/Documents/GarNet/venv/lib/python3.6/site-packages/pandas/core/indexing.py in error()
   1428                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1429                                (key, self.obj._get_axis_name(axis)))
   1430 

KeyError: 'the label [(5409951, 5409965]] is not in the [index]'

I think this is particularly confusing because, as far as I can tell, there is no obvious difference between the .loc calls that succeed and the one that fails. I know we had discussed .loc’s behavior in this context, but I’m not sure we came to a conclusion.

By the way, my larger question is about how to find intersections between two IntervalIndex. It seems like the find_intersections function didn’t make it into this release @jreback ? Let me know! =]
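For reference, here is a minimal sketch of one way to find overlapping pairs between two IntervalIndex objects. It assumes a pandas version where `IntervalIndex.overlaps` is available (0.24 or later, so not the release discussed in this issue). The helper name `find_overlaps` and the sample intervals are made up for illustration; this is not the `find_intersections` function mentioned above.

```python
import pandas as pd

# Hypothetical sample data; both indexes use the default closed='right'.
regions = pd.IntervalIndex.from_tuples([(0, 10), (20, 30), (40, 50)])
queries = pd.IntervalIndex.from_tuples([(5, 25), (35, 38)])


def find_overlaps(left, right):
    """Return (i, j) position pairs where left[i] overlaps right[j].

    Illustrative helper, not a pandas API. It loops over `right`, so the
    cost grows with len(left) * len(right).
    """
    pairs = []
    for j, query in enumerate(right):
        # IntervalIndex.overlaps takes a scalar Interval and returns a boolean array.
        mask = left.overlaps(query)
        for i in mask.nonzero()[0]:
            pairs.append((int(i), j))
    return pairs


print(find_overlaps(regions, queries))
```

With the sample data, the first query overlaps the first two regions and the second query overlaps none.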

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 37 (36 by maintainers)

Most upvoted comments

chiming in on this, as we are heavy users of postgres range types and range operators as a powerful abstraction for time series data

as has already been mentioned, the key verbs are contains and overlaps, at both the element and the range level, and in both directions:

examples from the postgres docs:

Operator  Description                      Example                                                        Result
@>        contains range                   int4range(2,4) @> int4range(2,3)                               t
@>        contains element                 '[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp  t
<@        range is contained by            int4range(2,4) <@ int4range(1,7)                               t
<@        element is contained by          42 <@ int4range(1,7)                                           f
&&        overlap (have points in common)  int8range(3,7) && int8range(4,12)                              t
<<        strictly left of                 int8range(1,10) << int8range(100,110)                          t
-|-       is adjacent to                   numrange(1.1,2.2) -|- numrange(2.2,3.3)                        t
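To make the semantics of these operators concrete, here is a rough pure-Python sketch that reproduces the rows above. It assumes non-empty, half-open ranges [lower, upper), which is what the Postgres range constructors produce by default. The `Range` class and the function names are made up for illustration and are neither Postgres nor pandas APIs.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """Non-empty half-open range [lower, upper). Hypothetical helper type."""
    lower: float
    upper: float


def contains_range(a: Range, b: Range) -> bool:
    # a @> b: every point of b lies in a
    return a.lower <= b.lower and b.upper <= a.upper


def contains_element(a: Range, x: float) -> bool:
    # a @> x: lower bound is inclusive, upper bound is exclusive
    return a.lower <= x < a.upper


def overlaps(a: Range, b: Range) -> bool:
    # a && b: the two ranges share at least one point
    return a.lower < b.upper and b.lower < a.upper


def strictly_left(a: Range, b: Range) -> bool:
    # a << b: a ends at or before the point where b starts
    return a.upper <= b.lower


def adjacent(a: Range, b: Range) -> bool:
    # a -|- b: the ranges touch with no gap and no shared point
    return a.upper == b.lower or b.upper == a.lower


print(contains_range(Range(2, 4), Range(2, 3)))        # int4range(2,4) @> int4range(2,3)
print(contains_element(Range(1, 7), 42))               # 42 <@ int4range(1,7)
print(overlaps(Range(3, 7), Range(4, 12)))             # int8range(3,7) && int8range(4,12)
print(strictly_left(Range(1, 10), Range(100, 110)))    # int8range(1,10) << int8range(100,110)
print(adjacent(Range(1.1, 2.2), Range(2.2, 3.3)))      # numrange(1.1,2.2) -|- numrange(2.2,3.3)
```

The "is contained by" operator `<@` is the same check with the arguments swapped, e.g. `contains_range(Range(1, 7), Range(2, 4))` for `int4range(2,4) <@ int4range(1,7)`.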

now that we have Intervals in pandas (very grateful for bringing that feature @jreback!) I have already tinkered around with some mappers for going between Postgres and pandas. maybe that is too db-specific, but I definitely have a great interest in seeing more Interval type functionality in pandas and helping out with this

came across this library: https://github.com/AlexandreDecan/python-intervals

looks to have some interesting interval semantics

cc @jschendel

could you re-disambiguate what covers and overlaps each do in the proposed function signatures you posted above? overlaps is “all overlaps” and covers is “any overlaps”?

An interval covers another interval if all points in the second interval are found in the first interval.

An interval overlaps another interval if there exist any points found in both intervals.
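
As a small illustration of the difference, here is a sketch using scalar `pd.Interval` objects. `Interval.overlaps` is an existing pandas method (0.24 or later). The `covers` helper is hypothetical, written only for this example, and assumes both intervals are closed on the same side. The query interval is the one from the KeyError above; the region and inner intervals are made up.

```python
import pandas as pd


def covers(first, second):
    """True if every point of `second` is also in `first`.

    Hypothetical helper, not a pandas API. Only handles intervals that are
    closed on the same side, so the endpoint comparison stays simple.
    """
    if first.closed != second.closed:
        raise ValueError("this sketch assumes both intervals share the same closed side")
    return bool(first.left <= second.left and second.right <= first.right)


region = pd.Interval(5409960, 5410000)   # made-up region, closed='right'
query = pd.Interval(5409951, 5409965)    # interval from the KeyError above
inner = pd.Interval(5409970, 5409980)    # made-up interval fully inside region

# The query only partially overlaps the region: some points are shared, but not all.
print(region.overlaps(query), covers(region, query))

# The inner interval is fully contained, so it both overlaps and is covered.
print(region.overlaps(inner), covers(region, inner))
```

Under these definitions, covers implies overlaps for non-empty intervals, but not the other way around.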