pandas: RecursionError in DataFrame.replace with Out of Bounds Datetime

RecursionError in DataFrame.replace

import datetime

import pandas as pd

df = pd.DataFrame({
    "dt": [datetime.datetime(3017, 12, 20)],
    "str": ["blah"],
})
df.replace("blah", "cats")

This leads to many repetitions of the following traceback:

OutOfBoundsDatetime                       Traceback (most recent call last)
~/sandbox/pandas/pandas/core/internals.py in replace(self, to_replace, value, inplace, filter, regex, convert, mgr)
    804                 blocks = [b.convert(by_item=True, numeric=False,
--> 805                                     copy=not inplace) for b in blocks]
    806             return blocks

~/sandbox/pandas/pandas/core/internals.py in <listcomp>(.0)
    804                 blocks = [b.convert(by_item=True, numeric=False,
--> 805                                     copy=not inplace) for b in blocks]
    806             return blocks

~/sandbox/pandas/pandas/core/internals.py in convert(self, *args, **kwargs)
   2355         if by_item and not self._is_single_block:
-> 2356             blocks = self.split_and_operate(None, f, False)
   2357         else:

~/sandbox/pandas/pandas/core/internals.py in split_and_operate(self, mask, f, inplace)
    508             if m.any():
--> 509                 nv = f(m, v, i)
    510             else:

~/sandbox/pandas/pandas/core/internals.py in f(m, v, i)
   2345             shape = v.shape
-> 2346             values = fn(v.ravel(), **fn_kwargs)
   2347             try:

~/sandbox/pandas/pandas/core/dtypes/cast.py in soft_convert_objects(values, datetime, numeric, timedelta, coerce, copy)
    834     if datetime:
--> 835         values = lib.maybe_convert_objects(values, convert_datetime=datetime)
    836

~/sandbox/pandas/pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_objects()
   1317                     seen.datetime_ = 1
-> 1318                     idatetimes[i] = convert_to_tsobject(
   1319                         val, None, None, 0, 0).value

~/sandbox/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
    299     elif PyDateTime_Check(ts):
--> 300         return convert_datetime_to_tsobject(ts, tz, nanos)
    301     elif PyDate_Check(ts):

~/sandbox/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()
    379
--> 380     check_dts_bounds(&obj.dts)
    381     check_overflows(obj)

~/sandbox/pandas/pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()
    120                                                dts.min, dts.sec)
--> 121         raise OutOfBoundsDatetime(
    122             'Out of bounds nanosecond timestamp: {fmt}'.format(fmt=fmt))

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 3017-12-20 00:00:00


(thanks to @ChrisMuir for the example)
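
For context on the "Out of bounds nanosecond timestamp" message, here is a minimal sketch that prints the range representable by a nanosecond-resolution `pd.Timestamp` and checks that the year 3017 lies beyond it. The variable names are illustrative only, and the sketch says nothing about how `replace` itself behaves in any particular pandas version.

```python
import datetime

import pandas as pd

# Bounds of a nanosecond-resolution Timestamp
lower = pd.Timestamp.min
upper = pd.Timestamp.max
print(lower, upper)

# The value from the report lies well past the upper bound
value = datetime.datetime(3017, 12, 20)
out_of_range = value.year > upper.year
print(out_of_range)
```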

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

In the example given, and perhaps in most use cases, the OOB datetime is irrelevant. Of course that datetime could lead to other problems, but as far as replace is concerned it shouldn’t matter. In my opinion, it would only make sense to raise the exception when manipulating the datetime.
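
To illustrate that the replacement only concerns the string column, here is a minimal sketch of one possible workaround: calling `replace` on that column alone rather than on the whole frame. This is an assumption about what might sidestep the error, not something confirmed in the thread, and it has not been checked against the pandas version in the traceback.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "dt": [datetime.datetime(3017, 12, 20)],
    "str": ["blah"],
})

# Replace within the string column only, leaving the datetime column untouched
result = df.copy()
result["str"] = result["str"].replace("blah", "cats")
print(result["str"].tolist())
```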

Well, personally I’d prefer that replace() not throw an error when it hits an out-of-bounds datetime, but that’s coming from a purely selfish standpoint. I don’t use Pandas a ton, and my use cases probably represent only a fraction of common use cases (I’m always working with data generated by third parties from which I’m completely disconnected). Also, I don’t know enough about the Pandas internals and philosophy to weigh in on which option makes the most sense with all users in mind.