attrs: `cmp`ing fails when attrib is a numpy array

The generated cmp methods fail if the attrib is a numpy array (of size >1) with

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

this is because:

# equivalent to how __eq__ compares fields:
(np.array([47,47]),) == (np.array([47,47]),) # ValueError
# vs:
(np.array([47,47]) == np.array([47,47])).all() # True

I realize that this can be switched off with cmp=False, but often comparing these attribs is useful!
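To spell out why switching it off is not satisfying, here is a minimal sketch of the per-attribute `cmp=False` escape hatch. The class name `Labeled` and its fields are made up for illustration. The ValueError goes away, but only because the array is ignored completely:

```python
import attr
import numpy as np


@attr.s(auto_attribs=True)
class Labeled:
    label: str
    # cmp=False excludes this field from the generated comparison methods.
    arr: np.ndarray = attr.ib(cmp=False)


foo = Labeled("x", np.array([47, 47]))
bar = Labeled("x", np.array([1, 2]))

# No exception is raised, but the arrays never get compared, so two
# objects holding different data are still reported as equal.
same = foo == bar
print(same)
```

So the switch trades an exception for a comparison that silently skips the data.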

reproducible example:

import attr
import numpy as np

@attr.s(auto_attribs=True)
class A:
    arr: np.ndarray

foo = A(np.array([47,47]))
bar = A(np.array([47,47]))

foo == bar # ValueError

The simplest fix might be to trust user annotations about what is or isn’t a numpy array and compare those fields separately. Or, maybe better, it would be awesome to be able to supply a custom cmp function, e.g.

@attr.s
class A:
    arr: np.ndarray = attr.ib(cmpf=lambda f,x,y: f(x,y).all())

where f is the standard cmp function (eq, lt, etc) and x,y are the items being compared.
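For readers finding this later: the `cmpf` argument above is only a proposal and, as far as I know, never existed under that name. Newer attrs releases (21.1.0 and later, to my knowledge) ship `attr.cmp_using`, which covers a similar use case with a different signature. A minimal sketch, assuming such a version is installed:

```python
import attr
import numpy as np


@attr.s(auto_attribs=True)
class A:
    # attr.cmp_using wraps the field value in a small comparator class.
    # Only equality is customised here, delegating to np.array_equal.
    arr: np.ndarray = attr.ib(eq=attr.cmp_using(eq=np.array_equal))


foo = A(np.array([47, 47]))
bar = A(np.array([47, 47]))
baz = A(np.array([47, 48]))

print(foo == bar)  # same contents
print(foo == baz)  # different contents
```

Unlike the proposed `cmpf`, the callable receives just the two values and has to return a single truth value. Ordering is not customised in this sketch, so `<` and friends should not be relied on for this class.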

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 24 (12 by maintainers)

Most upvoted comments

Fixed by #627 – tell your Numpy friends. 😃

I’ve found this wildly frustrating myself, but I think this is a problem with numpy breaking normal __eq__ semantics, and attrs actually shouldn’t try and solve this. Normally the signature of __eq__ should match object.__eq__'s signature but numpy goes its own way:

# object
def __eq__(self, other: object) -> bool: ...

# numpy
def __eq__(self, other: np.ndarray) -> np.ndarray: ...

This is so you can get the equality product of two arrays easily. And every data scientist I talk to seems to think this is the most natural thing in the world. I deeply disagree with this design decision, but I think the reasoning behind this is clear: in numpy, everything operates on arrays, normal python language semantics be damned!
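A small sketch of the difference, using nothing beyond numpy itself. The variable names are only for illustration:

```python
import numpy as np

a = np.array([47, 47])
b = np.array([47, 48])

# == on arrays is elementwise: the result is an array of booleans,
# not a single bool.
elementwise = a == b
print(elementwise)

# Collapsing to one truth value has to be requested explicitly.
print(elementwise.all())
print(np.array_equal(a, b))

# Anything that needs a single truth value per element, such as tuple
# comparison, ends up calling bool() on that array and fails.
raised = False
try:
    (a,) == (b,)
except ValueError:
    raised = True
print(raised)
```

The tuple comparison at the end is where the generated `__eq__` runs into trouble, since it needs one truth value per field.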

So in order to solve this issue in attrs, you’d need to have a flexible way of comparing equality on an item-per-item basis…except we already have a good way to do this, and it’s the __eq__ methods, which numpy abuses to satisfy its matrix manipulation mania. This allows attrs to leverage tuple.__eq__ to dynamically write obvious code for equality. Making this overridable with some kind of alternative cmp_func attribute or some other solution like that would be overkill and would break the simplicity of the current __eq__ implementation for little gain.

But there are realistically a few ways of handling this:

  1. Wrap your ndarray; then you can use it normally in other attrs classes:
import attr
import numpy as np

@attr.s(cmp=False, slots=True)
class Arr:
    arr = attr.ib(type=np.ndarray)
    def __eq__(self, other: object) -> bool:
        if type(self) is not type(other):
            return NotImplemented
        return (self.arr == other.arr).all()

@attr.s
class Bitmap:
    raw = attr.ib(type=Arr)
    colorspace = attr.ib(type=str)

Bitmap(Arr(np.array([1])), 'srgb') == Bitmap(Arr(np.array([1])), 'srgb')
  2. Use fields + metadata to generate a custom eq method:
import operator

import attr
import numpy as np

@attr.s(cmp=False)
class CustomEq:
    label = attr.ib(type=str)
    tags = attr.ib(type=set)
    data = attr.ib(type=np.ndarray, metadata={'eq': np.array_equal})

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        eq_spec = ((f.name, f.metadata.get('eq', operator.eq))
                   for f in attr.fields(type(self)))
        return all(eq(getattr(self, name), getattr(other, name))
                   for name, eq in eq_spec)

Example 2 is more like what I think attrs would have to do (or provide the option of doing) to make these things work in a more general case (with maybe a cmp_hooks=True argument to attr.s), but I think getting that actually right is more tricky than that short example I showed. I don’t think I’ve considered all the angles on that, and it’s a lot of work for a specific edge case.

That said, calling numpy an edge case is a bit silly at this point, since the science wing of python is an enormous and important part of the community. But that’s why I think option 1 makes the most sense. There have to be meaningful ways to drag their types into the normal python system, and light wrappers make a lot more sense than anything else I’ve come up with. In fact, you get the added benefit of a place to provide semantic information about things like numpy arrays, which are normally paraded around as naked data structures that reveal nothing of their intent.

I ran into many of these issues at my job and in writing+using zerial (sorry no docs yet), where I’m trying an approach to serialization somewhere between related and cattrs. And I’ve found that numpy really often doesn’t play well with the rest of python. But for an entire class of users, it is the very reason they use python and not something else. So we’ll have to figure out a way to deal with it, and I think wrapping numpy arrays might be the way.

Since this is the second time numpy equality has been brought up (ref #409), I think it’s best to keep it open. 😃

As a workaround, you can totally write your own __cmp__ by using @attr.s(cmp=False) but then you’ll have to implement all of them.

We’ll have to come up with a nicer way!