pydantic: Pydantic `__eq__` and `__hash__` doesn't work properly with `functools.cached_property`
Initial Checks
- I confirm that I’m using Pydantic V2
Description
Bug
While porting an app to Pydantic v2, I notice that the implementation of __eq__
doesn’t work properly on Model with functools.cached_property
: equal instances can compare unequal.
From the documentation on computed fields, it seems pydantic should support functools.cached_property
. I’m trying to compute an attribute value lazily and only once, because it is very expensive. functools.cached_property
seems like the perfect tool for that.
Example 1 below demonstrate the problem, and Example 2 shows that this can introduce subtle bugs when using dict on “equal” objects, as dict relies on a correct __eq__
and __hash__
.
Analysis
Line 831 from pydantic’s implementation of __eq__
seems to assumes that __dict__
only ever contains pydantic’s field values.
https://github.com/pydantic/pydantic/blob/c6e2f89f36805aef03194c366afb30faa798d188/pydantic/main.py#L829-L834
I think this is not true in the general case, as __dict__
is an instance state that is shared between pydantic and all other mechanism that need to act on the instance itself.
Indeed, functools.cached_property
stores the computed value in the instance __dict__
, because it is the only place it can reasonably store it:
Assuming that __dict__
only contains pydantic’s own fields has already been mentionned in #6422
Possible fix
I think the fix below limits the comparison of __dict__
to pydantic’s own fields. I don’t know much about pydantic source code, so there might be some rough edges I overlooked. I used a sentinel object as a default value to avoid crashing on missing fields, not sure it is actually necessary.
def __eq__(self, other):
...
sentinel = object()
return (
self_type == other_type
and all(
getattr(self, field, sentinel) == getattr(other, field, sentinel)
for field in self.model_fields
)
and self.__pydantic_private__ == other.__pydantic_private__
and self.__pydantic_extra__ == other.__pydantic_extra__
)
Related issues
- #6422 had a more obscure problem due to assuming
__dict__
contains only pydantic fields
Additional context
Comparing the full __dict__
seems to be a change from pydantic V1. I would be interested in knowing why the full __dict__
is compared in V2.
Both attrs
and the stdlib dataclasses
work seamlessly with functools.cahced_property
and only compare their own fields:
dataclasses __eq__
attrs __eq__
https://www.attrs.org/en/stable/comparison.html#comparison
Example Code
# ---------- Example 1 -------------
import functools
import pydantic
class Model(pydantic.BaseModel):
attr: int
@functools.cached_property
def cached(self) -> int:
return 0
obj1 = Model(attr=1)
obj2 = Model(attr=1)
assert obj1 == obj2 # works
obj1.cached
assert obj1 == obj2 # raises an error
# ---------- Example 2 -------------
import functools
import pydantic
class Model(pydantic.BaseModel):
model_config = pydantic.ConfigDict(frozen=True)
attr: int
@functools.cached_property
def cached(self) -> int:
return 0
obj1 = Model(attr=1)
obj2 = Model(attr=1)
d = {obj1: True}
assert d[obj2]
obj1.cached
assert d[obj2] # raise an error
Python, Pydantic & OS Version
Note: installed pydantic with "pip install git+https://github.com/pydantic/pydantic.git"
pydantic version: 2.3.0
pydantic-core version: 2.7.0
pydantic-core build: profile=release pgo=false
install path: ~/miniconda3/envs/dev-sandbox/lib/python3.8/site-packages/pydantic
python version: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:11:34) [Clang 14.0.6 ]
platform: macOS-10.16-x86_64-i386-64bit
optional deps. installed: ['typing-extensions']
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Reactions: 3
- Comments: 19 (19 by maintainers)
Commits related to this issue
- Add failing test case for #7444 — committed to QuentinSoubeyranAqemia/pydantic by QuentinSoubeyranAqemia 9 months ago
- fix #7444 with implementation benchmark — committed to QuentinSoubeyranAqemia/pydantic by QuentinSoubeyranAqemia 8 months ago
- Merge branch 'main' into gh-7444 — committed to QuentinSoubeyranAqemia/pydantic by QuentinSoubeyranAqemia 8 months ago
- fix #7444 with implementation benchmark — committed to QuentinSoubeyranAqemia/pydantic by QuentinSoubeyranAqemia 8 months ago
🎆 🎉 💃
@alexmojaki I’ve reviewed both, they basically seem good — thank you both for the detailed performance investigations and nuanced, correct(🤞) implementations.
I had note on #7786 that doesn’t necessarily need to be addressed before merging (my most recent comment), and I think that can be merged, and then I think #7825 can be merged once @alexmojaki’s comments on the PR are addressed.
I’ll leave it to @sydney-runkle to decide exactly when to merge, I think the plan was docs only changes prior to 2.5.0, but given it’s a bugfix maybe it can be in a quick 2.5.1.
I’ll take some time to experiment with
__eq__
implementations, yes !I personally think it makes sense to just replace
self.__dict__ == other.__dict__
in theBaseModel.__eq__
with the relevant check for only fields. (i.e.,{k: self.__dict__.get(k) for k in self.model_fields} == {k: other.__dict__.get(k) for k in other.model_fields}
, or perhaps something fancier/faster usingoperator.itemgetter
as in @alexmojaki’s #7786).Even if it’s not quite as fast, I agree with the claims made here that
@cached_property
should behave as closely to@property
as possible and not cause instances to not compare equal, same as the various other libraries referenced in the issue report.I think this is an excellent point. Put differently, if
@cached_property
somehow worked another way and didn’t affect__dict__
and thus__eq__
then I don’t think it’d be considered a bug that__eq__
returns True even if a cached property has only been accessed on one side. If that would somehow be considered a bug, then it’s a bug right now with@property
.How about replacing:
with:
This should be:
self.__dict__ == other.__dict__
should often be faster thanself.__dict__.keys() != other.__dict__.keys()
since it doesn’t have to compare values, although that depends on several things.self_type == other_type
and instances of the same type will usually have the same.__dict__.keys()
.Moving the whole check to after checking
__pydantic_private__
and__pydantic_extra__
might also help.