pydantic: Unable to `cloudpickle` Pydantic model classes
Initial Checks
- I confirm that I’m using Pydantic V2
Description
cloudpickle cannot serialize Pydantic model classes. It fails with a `TypeError: cannot pickle 'pydantic_core._pydantic_core.SchemaSerializer' object` exception.
Example Code
```python
# bug.py
"""Cloudpickling Pydantic models raises an exception."""
from pydantic import BaseModel
import ray.cloudpickle as cloudpickle


class SimpleModel(BaseModel):
    val: int


cloudpickle.dumps(SimpleModel)
```

Output:

```
% python bug.py
Traceback (most recent call last):
  File "/Users/shrekris/Desktop/scratch/dump4.py", line 9, in <module>
    cloudpickle.dumps(SimpleModel)
  File "/Users/shrekris/Desktop/ray/python/ray/cloudpickle/cloudpickle_fast.py", line 88, in dumps
    cp.dump(obj)
  File "/Users/shrekris/Desktop/ray/python/ray/cloudpickle/cloudpickle_fast.py", line 733, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'pydantic_core._pydantic_core.SchemaSerializer' object
```
Python, Pydantic & OS Version
```
/Users/shrekris/miniforge3/envs/pydantic-fix/lib/python3.9/site-packages/pydantic/_migration.py:275: UserWarning: `pydantic.utils:version_info` has been moved to `pydantic.version:version_info`.
  warnings.warn(f'`{import_path}` has been moved to `{new_location}`.')
             pydantic version: 2.0.3
        pydantic-core version: 2.3.0 release build profile
                 install path: /Users/shrekris/miniforge3/envs/pydantic-fix/lib/python3.9/site-packages/pydantic
               python version: 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:38:11)  [Clang 14.0.6 ]
                     platform: macOS-11.4-arm64-arm-64bit
     optional deps. installed: ['typing-extensions']
```
Selected Assignee: @lig
About this issue
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 31 (18 by maintainers)
I believe we are going to release 2.5 beta today, with a view to a final 2.5 production release next week.
@edoakes sorry for the delay I was just able to test the updated packages, and I can confirm it worked perfectly for my use case 😃 Thanks a lot!
I don’t think `SchemaSerializer` needs to be picklable; you should just store the core schema and recreate the serializer.

Reading from https://github.com/pydantic/pydantic/issues/8028, there seem to be a couple of items still pending before 2.5 comes out.

@davidhewitt any update on when the next `pydantic` release is going to come out? This is a pretty big pain point for our users and I want to make sure we can unpin the dependency soon.

No specific ETA, I would assume 2.5.
Thanks for the help @davidhewitt 🚀 do you know when the next release is scheduled and what its version tag will be?
Both PRs linked above are now merged. I’ve begun manually testing and they appear to address the issue for all of my use cases. @jrapin, if you could test out your workflow with `pydantic_core` and `pydantic` installed from main, that would be helpful.

@davidhewitt Please let me know when a `pydantic_core` release can be done in order to add integration testing & catch the next `pydantic` release.

I have opened PRs for each of the above issues. I will begin testing that these fixes are comprehensive using locally installed copies. Once these PRs have been merged and a new version of `pydantic_core` has been released, I will add integration/regression tests to the `pydantic` repo.

Yes exactly, we can release `pydantic_core` shortly after your PR is merged so there’s not a long delay in getting this working on `pydantic` main.

@davidhewitt I’d prefer to merge the functionality into `pydantic` so that `cloudpickle` “just works” out of the box and folks don’t have to worry about patching things.

Storing the members and defining `__reduce__` on `SchemaSerializer` itself would indeed be preferable; I’m just not sure how to accomplish that using PyO3 (not familiar with the framework). I can try to get it working.

So then the plan of action would be:

1. (`pydantic_core`): Make `SchemaSerializer` directly cloudpickleable in `pydantic_core` by storing references to the constructor arguments.
2. (`pydantic`): Use a `WeakrefWrapper` similar to the above rather than the weakref directly.

With these two, we should be good to go. Given that these changes would be split across the repos, is there any special versioning story between them? Or does `pydantic` treat `pydantic_core` like any other Python package dependency? In PR 2 I’ll add a regression test, but that will depend on using a version of `pydantic_core` that includes PR 1.

I also ran into the above issues:
1. `SchemaSerializer` not being `cloudpickle`able due to being a native type (written in Rust).
2. `_PydanticWeakref` not being `cloudpickle`able due to inheriting from `weakref.ref`, which has known issues with serialization.

I was able to enable `cloudpickle`ing a wide variety of Pydantic model definitions with the following two patches:

1. Patch `SchemaSerializer` to save the Python arguments necessary to reconstruct it. The `__getattr__` bit is somewhat hacky; it is required because `SchemaSerializer` does not allow Python classes to subclass it. This could be cleaned up by adding the `subclass` parameter to the PyO3 `#[pyclass]` macro in `pydantic_core`, which I’ve tested as well.
2. Wrap `weakref.ref` instead of inheriting from it. AFAICT there’s no downside to this wrapper, but it gets around the strange ABC-related pickling error.
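The original patches were stripped from this page, but the two ideas can be sketched in pure Python. This is a hypothetical illustration, not the actual patch: `NativeSerializer` stands in for the unpicklable Rust-backed `SchemaSerializer`, and `PicklableSerializer`/`WeakRefWrapper` are invented names.

```python
import pickle
import weakref


class NativeSerializer:
    """Stand-in for pydantic_core.SchemaSerializer: a type whose
    instances refuse to pickle (hypothetical, for illustration)."""

    def __init__(self, schema, config=None):
        self.schema = schema
        self.config = config

    def __reduce_ex__(self, protocol):
        raise TypeError("cannot pickle 'NativeSerializer' object")


class PicklableSerializer:
    """Patch 1 sketch: save the constructor arguments and rebuild the
    native serializer on unpickling instead of pickling it."""

    def __init__(self, schema, config=None):
        self._args = (schema, config)
        self._inner = NativeSerializer(schema, config)

    def __reduce__(self):
        # Reconstruct from the saved Python arguments.
        return (PicklableSerializer, self._args)

    def __getattr__(self, name):
        # The hacky bit: delegate everything else to the wrapped
        # serializer, since it cannot be subclassed directly.
        return getattr(self._inner, name)


class WeakRefWrapper:
    """Patch 2 sketch: hold a weakref.ref as a member instead of
    inheriting from weakref.ref, which pickles poorly."""

    def __init__(self, obj=None):
        self._ref = None if obj is None else weakref.ref(obj)

    def __call__(self):
        return self._ref() if self._ref is not None else None

    def __reduce__(self):
        # Pickle the referent itself; a fresh weakref is made on load.
        return (WeakRefWrapper, (self(),))


# The wrapper round-trips even though the inner object cannot pickle.
s = pickle.loads(pickle.dumps(PicklableSerializer({"type": "int"})))
assert s.schema == {"type": "int"}
```

The trade-off matches the discussion above: the wrapper keeps the schema and config alive so that `__reduce__` always has the arguments it needs.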
@davidhewitt @dmontagu @lig I’m happy to contribute a patch if you think this is a reasonable direction. Let me know what you think. The only downside I can see is that the `SchemaSerializer` wrapper will hold a reference to the `schema` and `core_config` objects (though I imagine these are probably already referenced somewhere in one of the `BaseModel` or `ModelMetaclass` members).

One low-baggage alternative is to delete the serializer upon serialization and reconstruct it whenever it’s first called, via the `SchemaSerializer`’s `__reduce__` function. Then, whenever the `SchemaSerializer` is first called, the Pydantic model can initialize it using the schema and config and cache it. This should only affect users that are serializing the `SchemaSerializer`, and the only added cost is the initialization upon the first call.

I did some digging and it looks like `cloudpickle` switches between “by reference” and “by value” pickling modes according to whether your type is importable.
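The delete-and-rebuild alternative described above can be sketched in pure Python (a hypothetical illustration; `NativeSerializer` and `Model` are invented stand-ins, not pydantic’s actual classes):

```python
import pickle


class NativeSerializer:
    """Stand-in for an unpicklable native serializer (hypothetical)."""

    def __init__(self, schema, config=None):
        self.schema = schema
        self.config = config

    def __reduce_ex__(self, protocol):
        raise TypeError("cannot pickle 'NativeSerializer' object")


class Model:
    """Sketch of the low-baggage alternative: drop the serializer when
    pickling and lazily rebuild it from the schema on first use."""

    def __init__(self, schema, config=None):
        self.schema = schema
        self.config = config
        self._serializer = NativeSerializer(schema, config)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_serializer"]  # discard the unpicklable member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._serializer = None  # rebuilt lazily

    def serializer(self):
        if self._serializer is None:
            # The only added cost: one re-initialization after unpickling.
            self._serializer = NativeSerializer(self.schema, self.config)
        return self._serializer


m = pickle.loads(pickle.dumps(Model({"type": "int"})))
assert m.serializer().schema == {"type": "int"}
```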
__main__.SimpleModelwhich is treated as not importable. In this case it looks to me likecloudpickleattempts to recreate a “skeleton” class which functions the same as the provided type. I don’t see a way to customise this behaviour. So to support “by value” pickling we need to support naive pickling for all the attributes of the class, as @shrekris-anyscale says. I can’t see a way to customise this behaviour. Maybe cloudpickle maintainers know of solutions.On the other hand, if
SimpleModelis moved into a module and imported from there (e.g.from foo import SimpleModel), thencloudpicklewill use “by reference” pickling. This already works fine (the pickled data just contains the reference to the import path).So @shrekris-anyscale a possible workaround may be to move your model definitions out of
__main__files / entry points into modules. Without knowing the full details of your application I don’t know if that’s actually viable.
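The “by reference” mode is the same mechanism stdlib `pickle` uses for importable classes, which makes the workaround easy to demonstrate. The sketch below builds an importable module at runtime so it is self-contained (the module name `foo` and class `SimpleModel` mirror the example above but are otherwise arbitrary):

```python
import pickle
import sys
import types

# Create an importable module "foo" at runtime so the example is
# self-contained; normally foo.py would just live on disk.
foo = types.ModuleType("foo")
exec("class SimpleModel:\n    pass", foo.__dict__)
sys.modules["foo"] = foo

# Because foo.SimpleModel is importable, the pickled payload stores only
# the import path ("foo", "SimpleModel"), not the class body.
payload = pickle.dumps(foo.SimpleModel)
assert b"foo" in payload and b"SimpleModel" in payload

# Unpickling resolves the reference back to the very same class object.
assert pickle.loads(payload) is foo.SimpleModel
```

cloudpickle falls back to this same by-reference path for importable classes, which is why moving model definitions out of `__main__` sidesteps the `SchemaSerializer` issue entirely.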