array-api: linalg.solve broadcasting behavior is ambiguous
The spec for the linalg.solve function seems ambiguous. In solve(x1, x2), x1 has shape (..., M, M) and x2 either has shape (..., M) or (..., M, K). In either case, the ... parts should be broadcast compatible.
This is ambiguous. For example, if x1 has shape (2, 2, 2) and x2 has shape (2, 2), should this be interpreted as x2 being a (2,) stack of a (2,) vector, i.e., the result would be (2, 2, 2, 1) after broadcasting, or as a single stack of a 2x2 matrix, i.e., resulting in (2, 2, 2, 2)?
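To make the two readings concrete, here is a NumPy sketch (the output is not shown because it depends on which interpretation the library picks; older NumPy takes the vector reading whenever x2.ndim == x1.ndim - 1):
>>> import numpy as np
>>> x1 = np.eye(2) + np.zeros((2, 1, 1))   # shape (2, 2, 2): a (2,) stack of 2x2 matrices
>>> x2 = np.ones((2, 2))                   # a (2,) stack of length-2 vectors, or a single 2x2 matrix?
>>> np.linalg.solve(x1, x2).shape          # (2, 2) under the vector reading, (2, 2, 2) under the matrix reading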
- Relevant pytorch issue about this: https://github.com/pytorch/pytorch/issues/52915
- Relevant NumPy issue: https://github.com/numpy/numpy/issues/15349
- torch.linalg.solve docs: https://pytorch.org/docs/stable/generated/torch.linalg.solve.html
- numpy.linalg.solve docs: https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve
Regarding NumPy, it sometimes picks one interpretation over the other, even when only the other one makes sense. For example:
>>> x1 = np.eye(1)
>>> x2 = np.asarray([[0.], [0.]])
>>> x1.shape
(1, 1)
>>> x2.shape
(2, 1)
>>> np.linalg.solve(x1, x2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in solve
  File "/Users/aaronmeurer/anaconda3/envs/array-apis/lib/python3.9/site-packages/numpy/linalg/linalg.py", line 393, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
ValueError: solve: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 2 is different from 1)
Here it wants to treat x2 as a single 2x1 matrix, which is shape-incompatible with the 1x1 x1, but it could also treat it as a (2,) stack of length-1 vectors.
I think there are also some issues with the way the spec describes broadcasting. It says “shape(x2)[:-1] must be compatible with shape(x1)[:-1]”, but I think this should be shape(x2)[:-2] and so on, since matrix dimensions should never broadcast with each other. It also says that the output should always have the same shape as x2, which contradicts the requirement that the inputs broadcast together.
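For illustration, here is a small sketch (with made-up shapes) of the rule being described: only the stack parts broadcast, and the last two dimensions of the result come from the matrix parts:
>>> import numpy as np
>>> x1_shape, x2_shape = (5, 1, 3, 3), (4, 3, 2)
>>> np.broadcast_shapes(x1_shape[:-2], x2_shape[:-2]) + x2_shape[-2:]
(5, 4, 3, 2)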
If I am reading the pytorch docs correctly, it resolves this by only allowing broadcasting in the case where x2 is exactly 1- or 2-dimensional. Otherwise when x2 is a stack of matrices, the stack part of the shape has to match the stack part of shape(x1) exactly.
However, I think this is still ambiguous in the case I noted above where x1 is (2, 2, 2) and x2 is (2, 2). x2 could be a matrix, which would broadcast, or a (2,) stack of length-2 vectors, whose stack shape matches that of x1.
So I think more is required to disambiguate, e.g., only allow broadcasting for matrices and not for vectors. One could also remove the vector case completely, or only allow it in the simple case of x2 being 1-D (i.e., no stacks of 1-D vectors).
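For illustration, a rough sketch of that last option on top of legacy np.linalg.solve (the wrapper name and the exact dispatch are assumptions, not the spec's wording or any library's actual implementation): x2 is treated as a vector only when it is exactly 1-D, and everything else is a matrix or a stack of matrices.

import numpy as np

def solve_unambiguous(x1, x2):
    # x2 is a single vector only when it is exactly 1-D; otherwise it is
    # interpreted as a matrix or a stack of matrices of shape (..., M, K).
    vector = x2.ndim == 1
    if vector:
        x2 = x2[:, None]        # lift the vector to an M x 1 column matrix
    if x2.ndim == x1.ndim - 1:
        # legacy np.linalg.solve would pick the "stack of vectors" reading here;
        # prepend a length-1 stack axis so it takes the matrix reading instead
        x2 = x2[None]
    res = np.linalg.solve(x1, x2)
    return res[..., 0] if vector else res

Under this rule, the (2, 2, 2) / (2, 2) example above unambiguously takes the matrix reading, and the result has shape (2, 2, 2).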
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 16 (11 by maintainers)
Commits related to this issue
- Add WIP strategy for test_solve(): linalg.solve is presently ambiguous in its definition. See https://github.com/data-apis/array-api/issues/285. So for now we are not implementing the test, as it is n... — committed to data-apis/array-api-tests by asmeurer 3 years ago
- Workaround np.linalg.solve ambiguity: NumPy's solve() does not handle the ambiguity around x2 being 1-D vector vs. an n-D stack of matrices in the way that the standard specifies. Namely, x2 should be... — committed to asmeurer/array-api-compat by asmeurer 4 months ago
- API: Remove broadcasting ambiguity from np.linalg.solve: Previously the np.linalg.solve documentation stated: a : (..., M, M) array_like Coefficient matrix. b : {(..., M,), (..., M, K)}, array_li... — committed to asmeurer/numpy by asmeurer 4 months ago
I think what you are missing is that the stack parts of the shapes should broadcast with one another (only the stack parts; the “matrix” parts should never broadcast, even if they are size 1). A single (N, M) matrix can be thought of as an array with “stack shape” (), that is, a single 0-dimensional stack. This () stack shape is broadcast compatible with any other stack shape.
The whole idea with stacks of matrices is that we think of the last two dimensions of the array as single entries (matrices) in a larger array. So a (2, 2, 3, 3) array is like a (2, 2) array of 3x3 matrices. If we had a (2, 2) array of numbers and a 0-dimensional () array (a scalar), they would broadcast together. Similarly, a (2, 2, 3, 3) stack of matrices should combine with a single (3, 4) matrix (the only additional restriction being that the matrix shapes need to align).
This is how the matmul operator works, as @IvanYashchuk pointed out. For example, these all work:
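(Illustrative shapes, assumed for this sketch: the matrix parts are 3x4 and 4x5, and the stack parts are chosen to match the result shapes described next.)
>>> import numpy as np
>>> (np.ones((2, 3, 4)) @ np.ones((4, 5))).shape        # stacks (2,) and ()
(2, 3, 5)
>>> (np.ones((3, 4)) @ np.ones((2, 4, 5))).shape        # stacks () and (2,)
(2, 3, 5)
>>> (np.ones((2, 1, 3, 4)) @ np.ones((2, 4, 5))).shape  # stacks (2, 1) and (2,)
(2, 2, 3, 5)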
In each case, the matrix part of the shape is the last two dimensions, which align for the matrix product (3x4 and 4x5). The stack parts ((2,) with (), () with (2,), and (2, 1) with (2,) in the sketch above) are broadcast compatible. The resulting shape for the first two is (2, 3, 5), and for the last one it is (2, 2, 3, 5).
I support the decision to allow broadcasting only for matrices and not for vectors. I don’t think we need to remove the vector case completely (we need to remove stacks of 1-D vectors).
Let’s think of the linalg.solve(x1, x2) function as an efficient shortcut for linalg.inv(x1) @ x2. linalg.inv preserves the shape of the input, so the answer to the question of what shapes and broadcasting should be allowed for the solve function could be “do the same as @/__matmul__ allows”. Also, I think the following comparison should run without errors: x1 @ np.linalg.solve(x1, x2) == x2. Currently it would fail for x1 with shape (2, 3, 3) and x2 with shape (2, 3), even though np.linalg.solve(x1, x2) runs fine. This suggests that a stack of 1-D vectors should not be allowed.
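A minimal REPL sketch of that round-trip failure (assuming the legacy NumPy behavior at the time of this discussion, where x2 is taken as a stack of vectors whenever x2.ndim == x1.ndim - 1):
>>> import numpy as np
>>> x1 = np.eye(3) + np.zeros((2, 1, 1))    # shape (2, 3, 3)
>>> x2 = np.ones((2, 3))
>>> np.linalg.solve(x1, x2).shape           # solve picks the vector reading
(2, 3)
>>> x1 @ np.linalg.solve(x1, x2)            # matmul reads the (2, 3) result as one 2x3 matrix
Traceback (most recent call last):
  ...
ValueError: ...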
@IvanYashchuk As I mentioned in yesterday’s call, this is not the case. There are ways to make only one call, which is an implementation detail, so this alone is not enough to justify the API design decision we agreed upon.
Please also adjust the docstring of linalg.solve accordingly.
The current description should be replaced by something like:
Return the solution to the linear matrix equation `AX = B` with full rank `A`.
Talking about a system of linear equations is highly misleading; as people stated previously, one should think of linalg.inv(x1) @ x2 instead. Or maybe write that explicitly?
Can we also add some guidance on how to solve a system of linear equations? Is the right way to use:
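x = np.linalg.solve(A, b)    # hypothetical sketch: A has shape (..., M, M), b has shape (..., M)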
or do I need some fancy broadcasting and moving of axes, like
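# hypothetical sketch: lift each vector in b to an (M, 1) column, solve, then drop the extra axis
x = np.linalg.solve(A, b[..., None])[..., 0]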
to benefit from the batched LAPACK calls?
Sorry for all the ruckus, but solving systems of linear equations is one of the most common tasks for me, and of course I want my functions to behave like gu-functions.
@leofang Already on today’s agenda. 😅
I’d argue that supporting (..., M, K) has been one of the poor, legacy design decisions made in NumPy. Since we’ve been trying to avoid some of such decisions in the spec, we might as well follow the same route here and not support it.
My argument is that we can fold K into the stack dimensions and let the solver internally broadcast it for us; this should be possible (with well-defined semantics):
- x1 with shape (..., M, M) and x2 with shape (M,)
- x1 with shape (..., M, M) and x2 with shape (..., M)
- x1 with shape (..., M, M) and x2 with shape (..., K, M) for K columns: internally the solver can reshape x1 into (..., 1, M, M) and then it’d be broadcastable (see the sketch below)
If we had more time I’d prefer us to get other library developers involved and/or do a downstream survey… but considering the approaching deadline, how about we discuss and decide in today’s call @kgryte?
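A minimal NumPy sketch of the third case, doing the reshaping by hand (the shapes and names here are only for illustration; a conforming solver would do this internally):
>>> import numpy as np
>>> x1 = np.eye(3) + np.zeros((2, 1, 1))    # (..., M, M) = (2, 3, 3)
>>> x2 = np.ones((2, 4, 3))                 # (..., K, M) = (2, 4, 3): K folded into the stack
>>> np.linalg.solve(x1[..., None, :, :], x2[..., None])[..., 0].shape
(2, 4, 3)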
Thank you, Aaron! Indeed, I missed the broadcasting behavior, and always calculated the batch array from x1.shape[:-2] to determine how to interpret x2. But from my mistake we can see a possibility to remove the ambiguity.
My preference would also be to “disable a stack of 1-D vectors”, but to do it the other way around: disable the possibility of x2 having the shape (..., M, K). The factor K there is never intuitive to users (not to me, at least), especially when we also accept prepended batch dimensions “...” for x2. If we want batches, we should always make them prepended. This also allows a unique broadcasting behavior for “a stack of 1-D vectors”.
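Under that rule the broadcasting is unambiguous; a small sketch with made-up shapes (batch dimensions prepended, x2 always (..., M)):
>>> import numpy as np
>>> x1_shape, x2_shape = (5, 1, 3, 3), (4, 3)
>>> np.broadcast_shapes(x1_shape[:-2], x2_shape[:-1]) + x2_shape[-1:]
(5, 4, 3)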