tensorflow: TF 2.0 regression: cloudpickle cannot serialize tf.keras.Sequential.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes (code included below in the issue)
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.14.3
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): pip
- TensorFlow version (use command below): v2.0.0-beta1-5101-gc75bb66a99 2.0.0-rc0
- Python version: Python 3.6.7 :: Anaconda, Inc.
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: N/A
- GPU model and memory: N/A
Using cloudpickle to serialize a Python function that uses tf.keras.Sequential fails with a recursion error.
Note that this works with tensorflow==1.14.0.
I imagine it also fails with other things, not just tf.keras.Sequential.
import cloudpickle  # cloudpickle.__version__ == '1.2.1'
import tensorflow as tf  # tf.__version__ == '2.0.0-rc0'

def f():
    tf.keras.Sequential

cloudpickle.loads(cloudpickle.dumps(f))  # This fails.
The last line fails with
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
<ipython-input-23-25cc307e6227> in <module>
----> 1 cloudpickle.loads(cloudpickle.dumps(f))
~/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py in __getattr__(self, item)
48
49 def __getattr__(self, item):
---> 50 module = self._load()
51 return getattr(module, item)
52
~/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py in _load(self)
42 def _load(self):
43 """Import the target module and insert it into the parent's namespace."""
---> 44 module = _importlib.import_module(self.__name__)
45 self._parent_module_globals[self._local_name] = module
46 self.__dict__.update(module.__dict__)
... last 2 frames repeated, from the frame below ...
~/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py in __getattr__(self, item)
48
49 def __getattr__(self, item):
---> 50 module = self._load()
51 return getattr(module, item)
52
RecursionError: maximum recursion depth exceeded while calling a Python object
See https://stackoverflow.com/questions/57750920/ray-tensorflow-gpu-2-0-recursionerror/57761034#57761034
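For context, the recursion appears to come from the lazy-loading module proxy that the TF 2.0 pip package installs as the top-level tensorflow module: when pickle probes the proxy for special methods, the lookup falls through to __getattr__, which triggers the loader, which lands back in __getattr__. A toy sketch of that pattern (illustrative only, not the actual TensorFlow implementation):

import pickle

class LazyLoader:
    """Toy stand-in for a lazy-loading module proxy (illustrative only)."""

    def _load(self):
        # The real loader imports the target module here; in this toy
        # version the attribute access below falls through to __getattr__.
        return self.target  # 'target' is never set

    def __getattr__(self, item):
        module = self._load()
        return getattr(module, item)

lazy = LazyLoader()

# pickle probes the object for special methods (e.g. __getnewargs_ex__);
# the probe goes through __getattr__, which calls _load, which goes
# through __getattr__ again, and so on until the recursion limit.
try:
    pickle.dumps(lazy)
except RecursionError as err:
    print("RecursionError:", err)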
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 27 (3 by maintainers)
Commits related to this issue
- Add __reduce__ method in virtual pip root due to lazy loading Fixes pickling issue #32159 Tested manually: just applied patch to tensorflow nightly and checked required imports. PiperOrigin-RevId: ... — committed to tensorflow/tensorflow by mihaimaruseac 5 years ago
- Add __reduce__ method in virtual pip root due to lazy loading Fixes pickling issue #32159 Tested manually: just applied patch to tensorflow nightly and checked required imports. PiperOrigin-RevId: ... — committed to advaitjain/tensorflow by mihaimaruseac 5 years ago
- Add __reduce__ method in virtual pip root due to lazy loading Fixes pickling issue #32159 Tested manually: just applied patch to tensorflow nightly and checked required imports. PiperOrigin-RevId: ... — committed to ssheikholeslami/tensorflow by mihaimaruseac 5 years ago
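The committed fix is described as adding a __reduce__ method to the lazy-loading virtual pip root. In the same toy terms as above, defining __reduce__ short-circuits pickle's attribute probing and lets the proxy pickle as a plain "re-import this module by name" instruction (a sketch under that assumption, not the actual TensorFlow code):

import importlib
import pickle

class LazyLoader:
    """Toy proxy again, now with __reduce__ in the spirit of the fix."""

    def __init__(self, name):
        self._name = name

    def _load(self):
        return importlib.import_module(self._name)

    def __getattr__(self, item):
        return getattr(self._load(), item)

    def __reduce__(self):
        # Pickle the proxy as "re-import this module by name" instead of
        # letting pickle probe attributes through __getattr__.
        return (importlib.import_module, (self._name,))

lazy = LazyLoader("math")
restored = pickle.loads(pickle.dumps(lazy))  # unpickles as the real module
print(restored.sqrt(4.0))  # 2.0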
@mihaimaruseac I opened https://github.com/tensorflow/tensorflow/pull/39034 for this.
I think this can be closed now as it has been solved and backported to 1.15 too.
Any updates on this?
No, but if you want to make a cherry-pick, we can merge it if and when we do a new patch release on 1.15.
Seems the fix works with Ray. However, if we use custom layers with functions decorated with @tf.function, there are still pickling issues. As a workaround for that, I figured one could save the model as a SavedModel on distributed storage and then have the Ray worker load the model from the distributed storage, but this throws an error; a sketch of that round-trip is included below.
Note: Removing the LSTM layer does not result in an error, which suggests the problem is related to the while operation (as the error message indicates).
Code to reproduce
Error
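For reference, a minimal sketch of the SavedModel round-trip described in that comment (the shared path and the toy model are hypothetical); per the comment, this still errored when the model contained an LSTM layer:

import tensorflow as tf

# Hypothetical path on storage that both the driver and the Ray workers can see.
SHARED_MODEL_DIR = "/mnt/shared/my_model"

# Driver: build and export the model instead of pickling the model object.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 4)),
    tf.keras.layers.Dense(1),
])
model.save(SHARED_MODEL_DIR, save_format="tf")  # SavedModel format

# Worker: only the path string needs to be pickled; the model is reloaded here.
def worker_task():
    restored = tf.keras.models.load_model(SHARED_MODEL_DIR)
    return restored.predict(tf.zeros([1, 10, 4]))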
@jharaldson the easiest workaround might be the one described in https://github.com/ray-project/ray/issues/5614#issuecomment-527292289.
Another workaround is described in https://stackoverflow.com/a/57761034/7858504
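For anyone who cannot upgrade yet, one workaround in that spirit (a sketch; it may differ in detail from the linked answer) is to keep the lazily loaded top-level tensorflow module out of the pickled function's globals, e.g. by importing it inside the function:

import cloudpickle

def f():
    # Importing here keeps the lazily loaded top-level `tensorflow` module
    # out of f's globals, so cloudpickle never tries to serialize the
    # lazy-loader object itself.
    import tensorflow as tf
    return tf.keras.Sequential

g = cloudpickle.loads(cloudpickle.dumps(f))  # round-trips without recursion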
@mihaimaruseac I saw https://github.com/tensorflow/tensorflow/commit/4675891bd3c9e9ee7a57552486ec5bdc40379787. Is it relevant to this issue?
I have tried on Colab with TF 1.14 and was able to execute the code. However, I am able to reproduce the issue with TF 2.0.0-rc0 and the 2.0 nightly versions. Please find the gist here. Thanks!