ray: `pandas has no attribute 'compat'` Deserialization bug when running tasks very rarely
What is the problem?
An `AttributeError: module 'pandas' has no attribute 'compat'` is thrown on deserialization; see the traceback below.
Ray version and other system information (Python version, TensorFlow version, OS): 0.8.3 and later including latest wheels.
I tried including PR #7406, which was reverted in #7437; this did not fix the issue.
Reproduction (REQUIRED)
This will require external dependencies because it’s a bug with serialization of the pandas library. It is not reproducible in every environment, though we have a way to reproduce it regularly in Modin’s CI.
2020-04-02 14:59:40,210 INFO resource_spec.py:212 -- Starting Ray with 200.0 GiB memory available for workers and up to 200.0 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-04-02 14:59:40,554 INFO services.py:1123 -- View the Ray dashboard at localhost:8265
2020-04-02 14:59:40,557 WARNING services.py:1455 -- WARNING: object_store_memory is not verified when plasma_directory is set.
Running on Modin on Ray with tmp directory /path and memory 214748364800
Traceback (most recent call last):
  File "/modin/modin/engines/ray/pandas_on_ray/frame/partition.py", line 46, in get
    return ray.get(self.oid)
  File "/path/to/python3.7/site-packages/ray/worker.py", line 1502, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::IDLE (pid=81352, ip=XXX.XXX.XXX.XXX)
  File "python/ray/_raylet.pyx", line 433, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 434, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 312, in ray._raylet.deserialize_args
ray.exceptions.RayTaskError: ray::IDLE (pid=81333, ip=XXX.XXX.XXX.XXX)
  File "python/ray/_raylet.pyx", line 433, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 434, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 306, in ray._raylet.deserialize_args
  File "/path/to/python3.7/site-packages/ray/serialization.py", line 323, in deserialize_objects
    self._deserialize_object(data, metadata, object_id))
  File "/path/to/python3.7/site-packages/ray/serialization.py", line 271, in _deserialize_object
    return self._deserialize_pickle5_data(data)
  File "/path/to/python3.7/site-packages/ray/serialization.py", line 260, in _deserialize_pickle5_data
    obj = pickle.loads(in_band, buffers=buffers)
  File "/path/to/python3.7/site-packages/pandas/__init__.py", line 193, in <module>
    if pandas.compat.PY37:
AttributeError: module 'pandas' has no attribute 'compat'
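The last frames of the traceback show `pandas/__init__.py` executing inside `pickle.loads`. That is expected behaviour for pickle in general: unpickling an object imports the module that defines its class if the process has not imported it yet, so any import-time failure surfaces as a deserialization error. The following is a minimal stdlib-only sketch of that behaviour, not Ray's code; `demo_frames` and `Frame` are hypothetical stand-ins for pandas and a DataFrame.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for a library such as pandas (assumption, not from the issue).
MODULE_SOURCE = '''
class Frame:
    def __init__(self, values):
        self.values = values
'''


def roundtrip_with_fresh_import():
    """Pickle an object, forget its module, then unpickle it again.

    Returns (module loaded before loads?, module loaded after loads?, values).
    """
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_frames.py"), "w") as fh:
            fh.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_frames")
            payload = pickle.dumps(module.Frame([1, 2, 3]),
                                   protocol=pickle.HIGHEST_PROTOCOL)

            # Simulate a worker process that has not imported the module yet.
            del sys.modules["demo_frames"]
            before = "demo_frames" in sys.modules

            # pickle looks up demo_frames.Frame, which triggers the import.
            restored = pickle.loads(payload)
            after = "demo_frames" in sys.modules
            return before, after, restored.values
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_frames", None)


if __name__ == "__main__":
    print(roundtrip_with_fresh_import())  # (False, True, [1, 2, 3])
```

In the failing runs the import that pickle triggers is the pandas import itself, which is why the error is raised from `pandas/__init__.py` rather than from Ray's serialization code.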
Some observations about the bug:
- We can only see it happen on machines with >100 cores, and it happens rarely, so it may be a concurrency issue that only triggers sometimes.
- It happens when we run Ray tests on CI in parallel using `pytest-xdist` with >40 workers or so. Below that it doesn’t really fail. We start Ray with 1 CPU in these conditions. This again points to a concurrency issue in deserialization.
- The error message is not always the same, but it is always an import error, for example: `ImportError: cannot import name 'DataFrame' from 'pandas.core.frame' (/path/to/python3.7/site-packages/pandas/core/frame.py)`
- We can even see errors such as `int is not callable`.
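Since the failures only show up with many parallel workers, one way to narrow things down might be to check whether importing the module from many fresh interpreters at once fails on the affected machine with Ray out of the picture. The sketch below is only a diagnostic aid under that assumption and is not a confirmed reproduction; `stress_imports` and its parameters are hypothetical names, and `json` is used so the example runs without extra dependencies (the real check would pass `"pandas"`).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def import_in_fresh_interpreter(module_name):
    """Import module_name in a new Python process; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def stress_imports(module_name, processes=40):
    """Start `processes` interpreters at roughly the same time.

    Returns the stderr output of every interpreter whose import failed.
    """
    with ThreadPoolExecutor(max_workers=processes) as pool:
        results = list(
            pool.map(import_in_fresh_interpreter, [module_name] * processes)
        )
    return [stderr for code, stderr in results if code != 0]


if __name__ == "__main__":
    # "json" keeps the sketch dependency-free; the issue concerns "pandas".
    failures = stress_imports("json", processes=8)
    print(f"{len(failures)} failing imports")
    for stderr in failures:
        print(stderr)
```

If this never fails with `"pandas"` and >40 processes, the problem is more likely specific to how the Ray worker imports modules during deserialization than to the file system or the pandas installation.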
cc @gshimansky
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
About this issue
- State: open
- Created 4 years ago
- Comments: 28 (22 by maintainers)
I don’t know if the error persists; I haven’t used PBT in a long time. I suggest closing the issue for now. If it occurs again, we can always re-open it 😃