pydantic: Pydantic `__eq__` and `__hash__` doesn't work properly with `functools.cached_property`

Initial Checks

  • I confirm that I’m using Pydantic V2

Description

Bug

While porting an app to Pydantic v2, I notice that the implementation of __eq__ doesn’t work properly on Model with functools.cached_property: equal instances can compare unequal.

From the documentation on computed fields, it seems pydantic should support functools.cached_property. I’m trying to compute an attribute value lazily and only once, because it is very expensive. functools.cached_property seems like the perfect tool for that.

Example 1 below demonstrate the problem, and Example 2 shows that this can introduce subtle bugs when using dict on “equal” objects, as dict relies on a correct __eq__ and __hash__.

Analysis

Line 831 from pydantic’s implementation of __eq__ seems to assumes that __dict__ only ever contains pydantic’s field values. https://github.com/pydantic/pydantic/blob/c6e2f89f36805aef03194c366afb30faa798d188/pydantic/main.py#L829-L834

I think this is not true in the general case, as __dict__ is an instance state that is shared between pydantic and all other mechanism that need to act on the instance itself. Indeed, functools.cached_property stores the computed value in the instance __dict__, because it is the only place it can reasonably store it:

https://github.com/python/cpython/blob/1f885df2a580360c5de69cc41191f3c6bfaaeb35/Lib/functools.py#L1005

Assuming that __dict__ only contains pydantic’s own fields has already been mentionned in #6422

Possible fix

I think the fix below limits the comparison of __dict__ to pydantic’s own fields. I don’t know much about pydantic source code, so there might be some rough edges I overlooked. I used a sentinel object as a default value to avoid crashing on missing fields, not sure it is actually necessary.


    def __eq__(self, other):
        ...
        sentinel = object()
        return (
            self_type == other_type
            and all(
                getattr(self, field, sentinel) == getattr(other, field, sentinel)
                for field in self.model_fields
            )
            and self.__pydantic_private__ == other.__pydantic_private__
            and self.__pydantic_extra__ == other.__pydantic_extra__
        )

Related issues

  • #6422 had a more obscure problem due to assuming __dict__ contains only pydantic fields

Additional context

Comparing the full __dict__ seems to be a change from pydantic V1. I would be interested in knowing why the full __dict__ is compared in V2.

Both attrs and the stdlib dataclasses work seamlessly with functools.cahced_property and only compare their own fields:

dataclasses __eq__

https://github.com/python/cpython/blob/1f885df2a580360c5de69cc41191f3c6bfaaeb35/Lib/dataclasses.py#L1086-L1099

attrs __eq__

https://www.attrs.org/en/stable/comparison.html#comparison

Example Code

# ---------- Example 1 -------------
import functools

import pydantic
    

class Model(pydantic.BaseModel):
    attr: int

    @functools.cached_property
    def cached(self) -> int:
        return 0


obj1 = Model(attr=1)
obj2 = Model(attr=1)

assert obj1 == obj2  # works

obj1.cached

assert obj1 == obj2  # raises an error

# ---------- Example 2 -------------

import functools

import pydantic


class Model(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(frozen=True)

    attr: int

    @functools.cached_property
    def cached(self) -> int:
        return 0




obj1 = Model(attr=1)
obj2 = Model(attr=1)
d = {obj1: True}

assert d[obj2]

obj1.cached

assert d[obj2]  # raise an error

Python, Pydantic & OS Version

Note: installed pydantic with "pip install git+https://github.com/pydantic/pydantic.git"


             pydantic version: 2.3.0
        pydantic-core version: 2.7.0
          pydantic-core build: profile=release pgo=false
                 install path: ~/miniconda3/envs/dev-sandbox/lib/python3.8/site-packages/pydantic
               python version: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:11:34)  [Clang 14.0.6 ]
                     platform: macOS-10.16-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Reactions: 3
  • Comments: 19 (19 by maintainers)

Commits related to this issue

Most upvoted comments

🎆 🎉 💃

@alexmojaki I’ve reviewed both, they basically seem good — thank you both for the detailed performance investigations and nuanced, correct(🤞) implementations.

I had note on #7786 that doesn’t necessarily need to be addressed before merging (my most recent comment), and I think that can be merged, and then I think #7825 can be merged once @alexmojaki’s comments on the PR are addressed.

I’ll leave it to @sydney-runkle to decide exactly when to merge, I think the plan was docs only changes prior to 2.5.0, but given it’s a bugfix maybe it can be in a quick 2.5.1.

I’ll take some time to experiment with __eq__ implementations, yes !

I personally think it makes sense to just replace self.__dict__ == other.__dict__ in the BaseModel.__eq__ with the relevant check for only fields. (i.e., {k: self.__dict__.get(k) for k in self.model_fields} == {k: other.__dict__.get(k) for k in other.model_fields}, or perhaps something fancier/faster using operator.itemgetter as in @alexmojaki’s #7786).

Even if it’s not quite as fast, I agree with the claims made here that @cached_property should behave as closely to @property as possible and not cause instances to not compare equal, same as the various other libraries referenced in the issue report.

I expect @cached_property to behave the same as @property, but with performance improvements. Pydantic doesn’t compare @property, I think therefore it shouldn’t compare @cached_property.

I think this is an excellent point. Put differently, if @cached_property somehow worked another way and didn’t affect __dict__ and thus __eq__ then I don’t think it’d be considered a bug that __eq__ returns True even if a cached property has only been accessed on one side. If that would somehow be considered a bug, then it’s a bug right now with @property.

The reason we used __dict__ was that it’s faster than iterating over model fields.

How about replacing:

self.__dict__ == other.__dict__ 

with:

(
    self.__dict__ == other.__dict__ or (
        self.__dict__.keys() != other.__dict__.keys() and
        (slower check using model_fields or something)
    )
)

This should be:

  1. Essentially the same speed when the dicts are equal
  2. Still pretty good when the dicts are not equal but the keys are equal, since I imagine self.__dict__ == other.__dict__ should often be faster than self.__dict__.keys() != other.__dict__.keys() since it doesn’t have to compare values, although that depends on several things.
  3. Only triggering the slow check in rare cases like this one, since we’ve already checked self_type == other_type and instances of the same type will usually have the same .__dict__.keys().

Moving the whole check to after checking __pydantic_private__ and __pydantic_extra__ might also help.