pandas: OverflowError: Python int too large to convert to C long
Code Sample, a copy-pastable example if possible
import pandas

# Read the first line of the failing file and parse it as line-delimited JSON
content = open('failing_pandas.json').readline()
df = pandas.read_json(content, lines=True)
Problem description
This issue happens on 0.21.1 and doesn't happen on 0.21.0, for instance. I also tried it with the latest master branch (0.23.0) and got the same issue:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 366, in read_json
return json_reader.read()
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 464, in read
self._combine_lines(data.split('\n'))
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 582, in parse
self._try_convert_types()
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in _try_convert_types
lambda col, c: self._try_convert_data(col, c, convert_dates=False))
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 818, in _process_converter
new_data, result = f(col, c)
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in <lambda>
lambda col, c: self._try_convert_data(col, c, convert_dates=False))
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 652, in _try_convert_data
new_data = data.astype('int64')
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/util/_decorators.py", line 118, in wrapper
return func(*args, **kwargs)
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/generic.py", line 4004, in astype
**kwargs)
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3462, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3329, in apply
applied = getattr(b, f)(**kwargs)
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 544, in astype
**kwargs)
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 625, in _astype
values = astype_nansafe(values.ravel(), dtype, copy=True)
File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/dtypes/cast.py", line 692, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/_libs/lib.pyx", line 854, in pandas._libs.lib.astype_intsafe
File "pandas/_libs/src/util.pxd", line 91, in util.set_value_at_unsafe
OverflowError: Python int too large to convert to C long
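The bottom frame fails in data.astype('int64'). The same overflow can be reproduced with plain NumPy; the 2**63 value below is a hypothetical stand-in for the oversized integer in failing_pandas.json, whose contents are not shown in the report:

```python
import numpy as np

# 2**63 is one past the signed 64-bit maximum (2**63 - 1)
big = 2 ** 63
arr = np.array([big], dtype=object)  # object dtype holds arbitrary-precision Python ints

raised = False
try:
    arr.astype('int64')  # the same conversion read_json attempts internally
except OverflowError:
    raised = True

print(raised)  # True: the value cannot fit in int64
```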
Expected Output
It should not crash. An integer that does not fit in int64 should be left as a Python int (e.g. in an object-dtype column) instead of being force-cast to int64.
Output of pd.show_versions()
Here is the working one:
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.21.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
And the failing one:
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.21.1
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
About this issue
- Original URL
- State: open
- Created 6 years ago
- Comments: 26 (19 by maintainers)
take
I'd like to fix this in the ujson implementation, similarly to #34473.
@mondaysunrise you are commenting on an issue about json parsing; you cannot hold these large ints directly and must use object dtype on the Series you are constructing.
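The object-dtype workaround described above can be sketched as follows; the value is a minimal illustration, not the reporter's actual data:

```python
import pandas as pd

# A value beyond int64's range, standing in for the issue's oversized field
big = 2 ** 64
s = pd.Series([big, 1], dtype=object)  # object dtype keeps Python ints intact

print(s[0])     # 18446744073709551616, preserved exactly
print(s.dtype)  # object
```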
Still got this error today: Python int too large to convert to C ssize_t.
Code to reproduce
@cscetbon : Thanks for this! That is indeed strange.