array-api: linalg.solve broadcasting behavior is ambiguous

The spec for the linalg.solve function seems ambiguous. In solve(x1, x2), x1 has shape (..., M, M) and x2 either has shape (..., M) or (..., M, K). In either case, the ... parts should be broadcast compatible.

This is ambiguous. For example, if x1 has shape (2, 2, 2) and x2 has shape (2, 2), should x2 be interpreted as a (2,) stack of 2-vectors, in which case the result would have shape (2, 2) after broadcasting, or as a single 2x2 matrix (a stack shape of ()), in which case the result would have shape (2, 2, 2)?
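
To make the two readings concrete, here is a sketch that forces NumPy into each interpretation with explicit reshaping (the reshaping is mine, purely for illustration):

import numpy as np

x1 = np.stack([np.eye(2), 2 * np.eye(2)])  # shape (2, 2, 2)
x2 = np.ones((2, 2))

# Reading 1: x2 is a (2,) stack of 2-vectors -> result shape (2, 2)
np.linalg.solve(x1, x2[..., None])[..., 0]

# Reading 2: x2 is a single 2x2 matrix broadcast against the stack
# -> result shape (2, 2, 2)
np.linalg.solve(x1, np.broadcast_to(x2, (2, 2, 2)))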

NumPy sometimes picks one interpretation over the other, even when only the other one makes sense. For example:

>>> x1 = np.eye(1)
>>> x2 = np.asarray([[0.], [0.]])
>>> x1.shape
(1, 1)
>>> x2.shape
(2, 1)
>>> np.linalg.solve(x1, x2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in solve
  File "/Users/aaronmeurer/anaconda3/envs/array-apis/lib/python3.9/site-packages/numpy/linalg/linalg.py", line 393, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
ValueError: solve: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 2 is different from 1)

Here it wants to treat x2 as a single 2x1 matrix, which is shape-incompatible with the 1x1 x1, but it could also have treated it as a (2,) stack of length-1 vectors.
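
The stack-of-vectors reading does work here if it is forced explicitly (adding the trailing axis is my own workaround, just to show the shapes line up):

>>> np.linalg.solve(x1, x2[..., None])[..., 0]
array([[0.],
       [0.]])

Here x2[..., None] has shape (2, 1, 1), i.e., a (2,) stack of 1x1 matrices, which broadcasts against the single 1x1 x1.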

I think there are also some issues with the way the spec describes broadcasting. It says “shape(x2)[:-1] must be compatible with shape(x1)[:-1]”, but I think this should be shape(x2)[:-2] and shape(x1)[:-2], since the matrix dimensions should never broadcast with each other. It also says that the output should always have the same shape as x2, which contradicts the statement that the inputs broadcast together.

If I am reading the PyTorch docs correctly, it resolves this by allowing broadcasting only when x2 is exactly 1- or 2-dimensional. Otherwise, when x2 is a stack of matrices, the stack part of its shape has to match the stack part of shape(x1) exactly.

However, I think this is still ambiguous in the case I noted above, where x1 is (2, 2, 2) and x2 is (2, 2): x2 could be a single matrix, which would broadcast, or a (2,) stack of 2-vectors, whose stack shape matches that of x1 exactly.

So I think more is required to disambiguate, e.g., only allow broadcasting for matrices and not for vectors. One could also remove the vector case completely, or only allow it in the simple case of x2 being 1-D (i.e., no stacks of 1-D vectors).

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 16 (11 by maintainers)

Most upvoted comments

I think what you are missing is that the stack part of the shapes should broadcast with one another (only the stack part; the “matrix” parts should never broadcast, even if they are size 1). A single (N, M) matrix can be thought of as an array with “stack shape” (), that is, a single 0-dimensional stack. This () stack shape is broadcast compatible with any other stack shape.

The whole idea with stacks of matrices is that we treat the last two dimensions of the array as a matrix, and the leading dimensions as an array whose entries are those matrices. So a (2, 2, 3, 3) array is like a (2, 2) array of 3x3 matrices. If we had a (2, 2) array of numbers and a 0-dimensional () array (a scalar), they would broadcast together. Similarly, a (2, 2, 3, 3) stack of matrices should combine with a single (3, 4) matrix (the only additional restriction being that the matrix shapes themselves need to align).

This is how the matmul operator works, as @IvanYashchuk pointed out. For example, these all work:

np.ones((2, 3, 4)) @ np.ones((4, 5))        # -> shape (2, 3, 5)
np.ones((2, 3, 4)) @ np.ones((1, 4, 5))     # -> shape (2, 3, 5)
np.ones((2, 3, 4)) @ np.ones((2, 2, 4, 5))  # -> shape (2, 2, 3, 5)

In each case, the matrix part of the shape is the last two dimensions, which align for the matrix product (3x4 and 4x5). The stack parts are

(2,), ()
(2,), (1,)
(2,), (2, 2)

which are broadcast compatible. The resulting shape for the first two is (2, 3, 5) and for the last one (2, 2, 3, 5).

I support the decision to allow broadcasting only for matrices and not for vectors. I don’t think we need to remove the vector case completely (we only need to remove stacks of 1-D vectors).

Let’s think of the linalg.solve(x1, x2) function as an efficient shortcut for linalg.inv(x1) @ x2. linalg.inv preserves the shape of its input, so the answer to the question of what shapes and broadcasting should be allowed for solve could be “do the same as @/__matmul__ allows”. I also think the following comparison should run without errors: x1 @ np.linalg.solve(x1, x2) == x2. Currently, it fails for x1 with shape (2, 3, 3) and x2 with shape (2, 3), even though np.linalg.solve(x1, x2) runs fine. This suggests that stacks of 1-D vectors should not be allowed.
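
A sketch of that failing round trip, relying on NumPy’s legacy (pre-2.0) rule that x2 is read as a stack of vectors whenever x2.ndim == x1.ndim - 1 (the example arrays are mine):

import numpy as np

x1 = np.stack([np.eye(3), 2 * np.eye(3)])  # shape (2, 3, 3)
x2 = np.ones((2, 3))                       # shape (2, 3)

x = np.linalg.solve(x1, x2)  # works: x2 is read as a (2,) stack of 3-vectors
print(x.shape)               # (2, 3)

x1 @ x  # ValueError: matmul reads the 2-D x as a single (2, 3) matrix,
        # whose core dimensions don't align with the (3, 3) matrices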

  1. x1 with shape (M, M) and x2 a stack of vectors with shape (1000, M): this is 1000 calls to the LAPACK API if no reshaping of x2 to (M, 1000) is done.

@IvanYashchuk As I mentioned in yesterday’s call, this is not the case. There are ways to make only one call, which is an implementation detail, so this alone is not enough to justify the API design decision we agreed upon.
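
One such way (a sketch of the implementation detail, not necessarily what any library actually does): view the (1000, M) stack of vectors as the columns of a single (M, 1000) right-hand-side matrix, which takes exactly one LAPACK call:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))     # a single (M, M) matrix, M = 4
b = rng.standard_normal((1000, 4))  # a (1000,) stack of M-vectors

x = np.linalg.solve(A, b.T).T       # one call on an (M, 1000) RHS
assert np.allclose(x @ A.T, b)      # each row solves A @ x[i] = b[i]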

Please also adjust the docstring of linalg.solve accordingly.

Returns the solution to the system of linear equations represented by the well-determined (i.e., full rank) linear matrix equation AX = B.

Should be replaced by something like: Return the solution to the linear matrix equation `AX = B` with full rank `A`.

Talking about a system of linear equations is highly misleading; as people stated previously, one should think of linalg.inv(x1) @ x2 instead. Or maybe write that explicitly?

Can we also add some guidance on how to solve a system of linear equations? Is the right way to use:

x = linalg.solve(A, b[..., None])[..., 0]

or do I need some fancy reshaping and moving of axes, like

# assuming a single A with shape (M, M) and b with shape (..., M)
shape = b.shape
b = b.reshape(-1, shape[-1]).T               # (M, n): vectors as columns
x = np.linalg.solve(A, b).T.reshape(shape)   # one call on an (M, n) RHS

to benefit from the batched LAPACK calls?

Sorry for all the ruckus, but solving systems of linear equations is one of the most common tasks for me, and of course I want my functions to behave like gufuncs.

@leofang Already on today’s agenda. 😅

I’d argue that supporting (..., M, K) has been one of the poor legacy design decisions made in NumPy. Since we’ve been trying to avoid such decisions in the spec, we might as well follow the same route here and not support it.

My argument is:

  1. I don’t see why we can’t place K into the stack dimensions and let the solver internally broadcast it for us; this should be possible (with well-defined semantics):
    • x1 with shape (..., M, M) and x2 with shape (M,)
    • x1 with shape (..., M, M) and x2 with shape (..., M)
    • x1 with shape (..., M, M) and x2 with shape (..., K, M) for K columns: internally the solver can reshape x1 into (..., 1, M, M) and then it’d be broadcastable (see the sketch after this list)
  2. Mathematically, it’s unintuitive what’s going on in terms of tensor contraction, as @DerWeh rightfully pointed out
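
A sketch of the reshape trick from item 1, emulated with NumPy’s current matrix-RHS semantics (the example shapes are mine):

import numpy as np

x1 = np.stack([np.eye(3), 2 * np.eye(3)])  # (2, 3, 3)
x2 = np.ones((2, 5, 3))                    # (2, 5, 3): K = 5 lives in the stack dims

# View x1 as (2, 1, 3, 3) and each vector in x2 as a (3, 1) column;
# the stack shapes (2, 1) and (2, 5) then broadcast to (2, 5).
x = np.linalg.solve(x1[..., None, :, :], x2[..., None])[..., 0]
print(x.shape)  # (2, 5, 3)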

If we had more time I’d prefer us to get other library developers involved and/or do a downstream survey… but considering the approaching deadline, how about we discuss and decide in today’s call, @kgryte?

Thank you, Aaron! Indeed, I missed the broadcasting behavior and always computed the batch shape from x1.shape[:-2] to determine how to interpret x2. But my mistake shows one possible way to remove the ambiguity.

My preference would also be to “disable a stack of 1-D vectors”, but done the other way around: disable the possibility of x2 having shape (..., M, K). The factor K there is never intuitive to users (not to me, at least), especially when we also accept prepended batch dimensions “…” for x2. If we want batches, we should always make them prepended. This also allows a unique broadcasting behavior for “a stack of 1-D vectors”.
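
Under that proposal x2 would always be a stack of vectors with shape (..., M), and K right-hand sides would simply be a (..., K, M) batch. A minimal sketch of the resulting semantics (the helper name solve_vectors is hypothetical):

import numpy as np

def solve_vectors(x1, x2):
    # x1: (..., M, M); x2: (..., M) only -- no (..., M, K) case.
    # Append a length-1 column axis so the matrix-RHS solve applies,
    # then drop it again.
    return np.linalg.solve(x1, x2[..., None])[..., 0]

x1 = np.stack([np.eye(3), 2 * np.eye(3)])  # (2, 3, 3)
print(solve_vectors(x1, np.ones(3)).shape)       # (2, 3)
print(solve_vectors(x1, np.ones((2, 3))).shape)  # (2, 3)
# K columns go into the stack dims, as in item 1 of the comment above:
print(solve_vectors(x1[..., None, :, :], np.ones((2, 5, 3))).shape)  # (2, 5, 3)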