pandas: RecursionError in DataFrame.replace with Out of Bounds Datetime
RecursionError in DataFrame.replace
import datetime
import pandas as pd

# One out-of-bounds datetime column next to an ordinary string column.
df = pd.DataFrame({
    "dt": [datetime.datetime(3017, 12, 20)],
    "str": ["blah"],
})
df.replace("blah", "cats")
This leads to many repetitions of the following traceback:
OutOfBoundsDatetime Traceback (most recent call last)
~/sandbox/pandas/pandas/core/internals.py in replace(self, to_replace, value, inplace, filter, regex, convert, mgr)
804 blocks = [b.convert(by_item=True, numeric=False,
--> 805 copy=not inplace) for b in blocks]
806 return blocks
~/sandbox/pandas/pandas/core/internals.py in <listcomp>(.0)
804 blocks = [b.convert(by_item=True, numeric=False,
--> 805 copy=not inplace) for b in blocks]
806 return blocks
~/sandbox/pandas/pandas/core/internals.py in convert(self, *args, **kwargs)
2355 if by_item and not self._is_single_block:
-> 2356 blocks = self.split_and_operate(None, f, False)
2357 else:
~/sandbox/pandas/pandas/core/internals.py in split_and_operate(self, mask, f, inplace)
508 if m.any():
--> 509 nv = f(m, v, i)
510 else:
~/sandbox/pandas/pandas/core/internals.py in f(m, v, i)
2345 shape = v.shape
-> 2346 values = fn(v.ravel(), **fn_kwargs)
2347 try:
~/sandbox/pandas/pandas/core/dtypes/cast.py in soft_convert_objects(values, datetime, numeric, timedelta, coerce, copy)
834 if datetime:
--> 835 values = lib.maybe_convert_objects(values, convert_datetime=datetime)
836
~/sandbox/pandas/pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_objects()
1317 seen.datetime_ = 1
-> 1318 idatetimes[i] = convert_to_tsobject(
1319 val, None, None, 0, 0).value
~/sandbox/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
299 elif PyDateTime_Check(ts):
--> 300 return convert_datetime_to_tsobject(ts, tz, nanos)
301 elif PyDate_Check(ts):
~/sandbox/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()
379
--> 380 check_dts_bounds(&obj.dts)
381 check_overflows(obj)
~/sandbox/pandas/pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()
120 dts.min, dts.sec)
--> 121 raise OutOfBoundsDatetime(
122 'Out of bounds nanosecond timestamp: {fmt}'.format(fmt=fmt))
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 3017-12-20 00:00:00
(thanks to @ChrisMuir for the example)
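For context on why this particular value fails: the traceback ends in check_dts_bounds, which rejects datetimes that cannot be stored as a signed 64-bit count of nanoseconds since the Unix epoch. The standard-library sketch below approximates that range at microsecond precision. The helper name fits_in_ns_int64 is hypothetical and not part of pandas, and the exact limits pandas enforces differ by a fraction of a microsecond.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# Largest int64 value, read as nanoseconds since the epoch.
INT64_MAX_NS = 2**63 - 1

# datetime.timedelta resolves microseconds only, so the nanoseconds are truncated.
SPAN = datetime.timedelta(microseconds=INT64_MAX_NS // 1000)

UPPER = EPOCH + SPAN  # falls in April 2262
LOWER = EPOCH - SPAN  # falls in September 1677


def fits_in_ns_int64(dt):
    """Hypothetical helper: approximate check that dt fits in int64 nanoseconds."""
    return LOWER <= dt <= UPPER


print(UPPER, LOWER)
print(fits_in_ns_int64(datetime.datetime(3017, 12, 20)))
```

The year 3017 lies well past the upper limit, which is why converting the "dt" column to a nanosecond timestamp raises OutOfBoundsDatetime.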
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (9 by maintainers)
In the example given, and perhaps in most use cases, the OOB datetime is irrelevant. Of course that datetime could lead to other problems, but as far as replace is concerned it shouldn’t matter. In my opinion, it would only make sense to raise the exception when manipulating the datetime.
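One possible workaround, sketched here under assumptions, is to run the replacement only on the columns that actually hold the target value, so the datetime column is never handed to replace. The helper replace_in_columns is hypothetical, and this sketch has not been verified against the pandas version that produced the traceback above.

```python
import datetime

import pandas as pd


def replace_in_columns(df, columns, to_replace, value):
    """Hypothetical helper: apply Series.replace to the named columns only."""
    out = df.copy()
    for col in columns:
        # Each column is replaced on its own; other columns are left untouched.
        out[col] = out[col].replace(to_replace, value)
    return out


df = pd.DataFrame({
    "dt": [datetime.datetime(3017, 12, 20)],
    "str": ["blah"],
})

result = replace_in_columns(df, ["str"], "blah", "cats")
print(result)
```

The design choice is simply to narrow the operation to the string column; it does not address the underlying conversion in DataFrame.replace.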
Well, personally I’d prefer that replace() not throw an error when it hits an out-of-bounds datetime, but that’s coming from a purely selfish standpoint. I don’t use pandas a ton, and my use cases probably represent only a fraction of common use cases (I’m always working with data generated by third parties from which I’m completely disconnected). Also, I don’t know enough about the pandas internals and philosophy to weigh in on which option makes the most sense with all users in mind.