pydantic: Unable to `cloudpickle` Pydantic model classes
Initial Checks
- I confirm that I’m using Pydantic V2
Description
`cloudpickle` cannot serialize Pydantic model classes. It fails with a `TypeError: cannot pickle 'pydantic_core._pydantic_core.SchemaSerializer' object` exception.
Example Code
```python
# bug.py
"""Cloudpickling Pydantic models raises an exception."""
from pydantic import BaseModel
import ray.cloudpickle as cloudpickle


class SimpleModel(BaseModel):
    val: int


cloudpickle.dumps(SimpleModel)
```
"""
Output:
% python bug.py
Traceback (most recent call last):
File "/Users/shrekris/Desktop/scratch/dump4.py", line 9, in <module>
cloudpickle.dumps(SimpleModel)
File "/Users/shrekris/Desktop/ray/python/ray/cloudpickle/cloudpickle_fast.py", line 88, in dumps
cp.dump(obj)
File "/Users/shrekris/Desktop/ray/python/ray/cloudpickle/cloudpickle_fast.py", line 733, in dump
return Pickler.dump(self, obj)
TypeError: cannot pickle 'pydantic_core._pydantic_core.SchemaSerializer' object
"""
Python, Pydantic & OS Version
```
/Users/shrekris/miniforge3/envs/pydantic-fix/lib/python3.9/site-packages/pydantic/_migration.py:275: UserWarning: `pydantic.utils:version_info` has been moved to `pydantic.version:version_info`.
  warnings.warn(f'`{import_path}` has been moved to `{new_location}`.')
pydantic version: 2.0.3
pydantic-core version: 2.3.0 release build profile
install path: /Users/shrekris/miniforge3/envs/pydantic-fix/lib/python3.9/site-packages/pydantic
python version: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:38:11) [Clang 14.0.6 ]
platform: macOS-11.4-arm64-arm-64bit
optional deps. installed: ['typing-extensions']
```
Selected Assignee: @lig
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 31 (18 by maintainers)
I believe we are going to release 2.5 beta today, with a view to a final 2.5 production release next week.
@edoakes sorry for the delay I was just able to test the updated packages, and I can confirm it worked perfectly for my use case 😃 Thanks a lot!
I don’t think `SchemaSerializer` needs to be picklable; you should just store the core schema and recreate the serializer.

Reading https://github.com/pydantic/pydantic/issues/8028, there seem to be a couple of items still pending before 2.5 comes out.
@davidhewitt any update on when the next `pydantic` release is going to come out? This is a pretty big pain point for our users, and I want to make sure we can unpin the dependency soon.

No specific ETA; I would assume 2.5.
Thanks for the help @davidhewitt 🚀 do you know when the next release is scheduled and what its version tag will be?
Both PRs linked above are now merged. I’ve begun manually testing, and they appear to address the issue for all of my use cases. @jrapin, if you could test out your workflow with `pydantic_core` and `pydantic` installed from main, that would be helpful.

@davidhewitt Please let me know when a `pydantic_core` release can be done in order to add integration testing and catch the next `pydantic` release.

I have opened PRs for each of the above issues:
I will begin testing that these fixes are comprehensive using locally installed copies. Once these PRs have been merged and a new version of `pydantic_core` has been released, I will add integration/regression tests to the `pydantic` repo.

Yes, exactly. We can release `pydantic_core` shortly after your PR is merged so there’s not a long delay in getting this working on `pydantic` main.

@davidhewitt I’d prefer to merge the functionality into `pydantic` so that `cloudpickle` “just works” out of the box and folks don’t have to worry about patching things.

Storing the members and defining `__reduce__` on `SchemaSerializer` itself would indeed be preferable; I’m just not sure how to accomplish that using PyO3 (I’m not familiar with the framework). I can try to get it working.

So then the plan of action would be:
1. (`pydantic_core`): Make `SchemaSerializer` directly cloudpickleable in `pydantic_core` by storing references to the constructor arguments.
2. (`pydantic`): Use a `WeakrefWrapper` similar to the above rather than the weakref directly.

With these two, we should be good to go. Given that these changes would be split across the repos, is there any special versioning story between them? Or does `pydantic` treat `pydantic_core` like any other Python package dependency? In PR 2 I’ll add a regression test, but that will depend on using a version of `pydantic_core` that includes PR 1.

I also ran into the above issues:
- `SchemaSerializer` not being `cloudpickle`-able due to being a native type (written in Rust).
- `_PydanticWeakref` not being `cloudpickle`-able due to inheriting from `weakref.ref`, which has known issues with serialization.

I was able to enable `cloudpickle`-ing a wide variety of Pydantic model definitions with the following two patches:

1. Wrapping `SchemaSerializer` to save the Python arguments necessary to reconstruct it. The `__getattr__` bit is somewhat hacky; it’s required because `SchemaSerializer` does not allow Python classes to subclass it. This could be cleaned up by adding the `subclass` parameter to the PyO3 `#[pyclass]` macro in `pydantic_core`, and I’ve tested that variant as well.
2. Wrapping `weakref.ref` instead of inheriting from it. AFAICT there’s no downside to this wrapper, but it gets around the strange ABC-related pickling error.
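The patch code itself isn’t reproduced in this thread, but the approach can be sketched with stdlib stand-ins. This is an illustration, not the actual patch: `NativeSerializer`, `PickleableSerializer`, and `WeakrefWrapper` are invented names, and a `threading.Lock` plays the role of the unpicklable Rust-backed state.

```python
import pickle
import threading
import weakref


class NativeSerializer:
    """Hypothetical stand-in for pydantic_core's SchemaSerializer:
    it holds state the pickle protocol cannot handle (here, a lock)."""

    def __init__(self, schema, config=None):
        self.schema = schema
        self.config = config
        self._lock = threading.Lock()  # unpicklable, like the Rust internals

    def to_json(self, value):
        return str(value)


class PickleableSerializer:
    """Patch 1, sketched: remember the constructor arguments and rebuild
    the native serializer on unpickling instead of pickling it."""

    def __init__(self, schema, config=None):
        self._init_args = (schema, config)
        self._serializer = NativeSerializer(schema, config)

    def __reduce__(self):
        # Recreate the wrapper from its constructor arguments; the
        # native serializer is never pickled directly.
        return (PickleableSerializer, self._init_args)

    def __getattr__(self, name):
        # The "somewhat hacky" delegation: forward everything else to the
        # wrapped serializer, since subclassing it is not allowed.
        return getattr(self._serializer, name)


class WeakrefWrapper:
    """Patch 2, sketched: wrap weakref.ref by composition rather than
    inheritance, sidestepping the ABC-related pickling error."""

    def __init__(self, obj=None):
        self._ref = None if obj is None else weakref.ref(obj)

    def __call__(self):
        return None if self._ref is None else self._ref()

    def __reduce__(self):
        # Pickle the referent itself; rebuild the weak reference on load.
        return (WeakrefWrapper, (self(),))


serializer = PickleableSerializer({"type": "int"})
restored = pickle.loads(pickle.dumps(serializer))
print(restored.to_json(5))  # prints 5: the rebuilt serializer still works
```

Returning `(callable, args)` from `__reduce__` is the standard pickle hook for “rebuild instead of serialize”, which appears to be the same idea the eventual fix pursues in `pydantic_core`.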
@davidhewitt @dmontagu @lig I’m happy to contribute a patch if you think this is a reasonable direction. Let me know what you think. The only downside I can see is that the `SchemaSerializer` wrapper will hold a reference to the `schema` and `core_config` objects (though I imagine these are probably already referenced somewhere in one of the `BaseModel` or `ModelMetaclass` members).

One low-baggage alternative is to delete the serializer upon serialization and reconstruct it when it’s first called: the `SchemaSerializer`’s `__reduce__` function could rebuild it from the stored schema and config. Then, whenever the `SchemaSerializer` is first called, the Pydantic model can initialize it using the schema and config and cache it. This should only affect users that are serializing the `SchemaSerializer`, and the only added cost is the initialization upon the first call.
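A minimal sketch of that lazy-rebuild alternative, using a stand-in rather than the real `SchemaSerializer` (the names `LazySerializer` and `_build` are invented for illustration): `__reduce__` carries only the schema and config, and the native part is rebuilt and cached on first use.

```python
import pickle


class LazySerializer:
    """Illustrative sketch: on pickling, keep only (schema, config) and
    drop the native serializer; rebuild it lazily on the first call."""

    def __init__(self, schema, config=None):
        self.schema = schema
        self.config = config
        self._native = None  # placeholder for the unpicklable native object

    def _build(self):
        # Hypothetical stand-in for constructing the real serializer.
        return ("serializer-for", repr(self.schema))

    def __reduce__(self):
        # Reconstruct from schema and config only.
        return (LazySerializer, (self.schema, self.config))

    def serialize(self, value):
        if self._native is None:  # first call pays the initialization cost
            self._native = self._build()
        return value


s = LazySerializer({"type": "int"})
s.serialize(1)  # builds and caches the native part
clone = pickle.loads(pickle.dumps(s))
print(clone._native is None)  # True: rebuild deferred until first use
clone.serialize(2)
print(clone._native is None)  # False: built and cached on first call
```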
`cloudpickle` switches between “by reference” and “by value” pickling modes according to whether your type is importable.

So in the repro discussed above, the class is `__main__.SimpleModel`, which is treated as not importable. In this case it looks to me like `cloudpickle` attempts to recreate a “skeleton” class which functions the same as the provided type. I don’t see a way to customise this behaviour, so to support “by value” pickling we need to support naive pickling for all the attributes of the class, as @shrekris-anyscale says. Maybe the cloudpickle maintainers know of solutions.

On the other hand, if `SimpleModel` is moved into a module and imported from there (e.g. `from foo import SimpleModel`), then `cloudpickle` will use “by reference” pickling. This already works fine (the pickled data just contains the reference to the import path).

So @shrekris-anyscale, a possible workaround may be to move your model definitions out of `__main__` files / entry points into modules. Without knowing the full details of your application, I don’t know if that’s actually viable.
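The “by reference” mode can be seen with the stdlib pickler, which always serializes classes that way: the payload records only the import path, so none of the class attributes (including an unpicklable serializer) ever need to be pickled. This is only a rough illustration of why moving the model out of `__main__` helps; it assumes `cloudpickle` behaves like stdlib `pickle` for importable classes.

```python
import pickle
import collections

# Pickling a class defined in an importable module stores only its
# qualified name, not its attributes...
payload = pickle.dumps(collections.OrderedDict)
print(b"collections" in payload and b"OrderedDict" in payload)  # True

# ...so unpickling is just a re-import: the original class object comes back.
assert pickle.loads(payload) is collections.OrderedDict
```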