scipy: solve is very slow since #6775
Since the merge of #6775, `scipy.linalg.solve` has been terribly slow for anything but `NRHS << N`. This is rather unfortunate because it leaves `solve` unusable (too slow) for every other case. The merge of #6775 was justified primarily by a performance comparison of `gesv` and `gesvx` for `NRHS << N`. Secondly, the merge did not state which LA library (MKL, OpenBLAS, ATLAS, etc.) was used for that comparison.
Here I show a benchmark for both `NRHS << N` and `NRHS == N`. This does not cover every corner case, but it should highlight that `gesvx` is only useful for the former (unless precision is of utmost importance); for `NRHS > N` the trend continues. I only show performance benchmarks for OpenBLAS.
(I also think there is a bug in the handling of the deprecated flag `sym_pos` in the `solve` code: `sym_pos` redefines `assume_a` after the latter's meaning has already been digested.)
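To illustrate the ordering issue, here is a minimal sketch (the helper name `_resolve_assume_a` is hypothetical, not SciPy's actual code) of digesting `sym_pos` before `assume_a` is interpreted:

```python
def _resolve_assume_a(assume_a='gen', sym_pos=False):
    # Translate the legacy flag *before* assume_a is acted upon; the bug
    # described above is that solve() redefines assume_a from sym_pos
    # only after assume_a's meaning has already been digested.
    if sym_pos:
        assume_a = 'pos'  # symmetric positive definite
    return assume_a
```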
Reproducing code example:

This code has also been used to produce the figures below.
```python
from time import time as T

import numpy as np
import scipy.linalg as sl

la = sl.lapack

ITT = 25
N = np.arange(100, 1401, 50)


def performance():
    t = np.zeros([len(N), 4])
    for i, n in enumerate(N):
        print(n)
        A = np.random.rand(n, n) * 10.
        B = np.random.rand(n, 3) * 10.   # NRHS == 3
        C = np.random.rand(n, n) * 10.   # NRHS == N

        t0 = T()
        for _ in range(ITT):
            la.dgesvx(A, B)
        t[i, 0] = (T() - t0) / ITT

        t0 = T()
        for _ in range(ITT):
            la.dgesv(A, B)
        t[i, 1] = (T() - t0) / ITT

        t0 = T()
        for _ in range(ITT):
            la.dgesvx(A, C)
        t[i, 2] = (T() - t0) / ITT

        t0 = T()
        for _ in range(ITT):
            la.dgesv(A, C)
        t[i, 3] = (T() - t0) / ITT
    return t


def accuracy():
    acc = np.zeros([len(N), 4])
    for it in range(ITT):
        print(it)
        for i, n in enumerate(N):
            A = np.random.rand(n, n) * 10.
            B = np.random.rand(n, 3) * 10.
            C = np.random.rand(n, n) * 10.
            # Frobenius norm of the residual A @ x - b for each driver
            x = la.dgesvx(A, B)[7]
            acc[i, 0] += ((np.dot(A, x) - B) ** 2).sum() ** .5
            x = la.dgesv(A, B)[2]
            acc[i, 1] += ((np.dot(A, x) - B) ** 2).sum() ** .5
            x = la.dgesvx(A, C)[7]
            acc[i, 2] += ((np.dot(A, x) - C) ** 2).sum() ** .5
            x = la.dgesv(A, C)[2]
            acc[i, 3] += ((np.dot(A, x) - C) ** 2).sum() ** .5
    acc /= ITT
    return acc


t = performance()
acc = accuracy()
```
Error message:
There is no error message, only bad performance.
Scipy/Numpy/Python version information:

- scipy: 0.19.1
- numpy: 1.13.1
- OpenBLAS: 0.2.20
Here I show the timing ratio `T(gesvx) / T(gesv)`:

`NRHS == 3`: [timing plot omitted]

`NRHS == N`: [timing plot omitted]
As seen, the performance is really bad for `NRHS == N`: roughly 80 times slower for matrices with `N >= 1000`. I think this justifies a couple of changes to `solve`:
- Revert the default to the `gesv` routines, also given the fact that `gesv` was the default routine in previous `scipy` releases.
- Add an optional flag `refine=False` which selects the `gesvx` routines when `refine=True` (a minimal sketch of this dispatch follows below). Whether this is an appropriate name for the flag, I don't know; if you have another idea, let me know.
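For concreteness, a minimal sketch of the proposed dispatch, assuming the flag name `refine` (the wrapper `solve_refine` and its error handling are illustrative, not SciPy's actual API):

```python
import numpy as np
from scipy.linalg import lapack

def solve_refine(a, b, refine=False):
    """Solve a @ x = b, dispatching between the plain and expert drivers."""
    a = np.asarray_chkfinite(a)
    b = np.asarray_chkfinite(b)
    if refine:
        # Expert driver ?gesvx: equilibration, condition estimation and
        # iterative refinement -- more accurate, but much slower when
        # NRHS is large.
        out = lapack.dgesvx(a, b)
        x, info = out[7], out[8]
    else:
        # Plain driver ?gesv: LU factorization and back substitution only.
        lu, piv, x, info = lapack.dgesv(a, b)
    if info != 0:
        raise np.linalg.LinAlgError('LAPACK reported info=%d' % info)
    return x
```

With `refine=False` this restores the pre-#6775 defaults, while `refine=True` keeps the `gesvx` path available for users who need the extra accuracy.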
If you agree to this I would be happy to contribute a PR. I have implemented a copy of `solve` in one of my own projects to keep using some of the functionality available in scipy but not in numpy. You may see the re-defined `solve` here: https://github.com/zerothi/sisl/blob/c04ed44cbaf1ed0d485747045e4bc7cfcf9ecdfa/sisl/_numpy_scipy.py#L86
This also fixes the `sym_pos` flag, i.e. `sym_pos` takes precedence. If you want a PR and you want `assume_a` to have precedence, I can do that as well.
To complete the analysis I have also created an accuracy test comparing `gesv` and `gesvx`; the results can be seen below, where it is clear that `gesvx` is an order of magnitude more precise (assuming `np.dot` is numerically exact). However, I would say the Frobenius norm of the residual shows adequate precision for this problem even with `gesv`.
Yes, this leaves condition numbers and other analyses out of the `solve` routine. But currently I cannot use `solve`, and I suspect others will also be surprised by the inferior performance since 0.19.1 (or was it 0.19.0?). As you can see in my linked `solve`, I have added an example of how to decide whether `refine` is applicable for one's application.
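As a rough illustration of such a decision (a hypothetical heuristic, not the code in the linked project): solve with the fast driver first, and only fall back to the refined driver if the relative residual is too large:

```python
import numpy as np
from scipy.linalg import lapack

def solve_with_fallback(a, b, tol=1e-10):
    # Fast path: plain LU solve via ?gesv.
    lu, piv, x, info = lapack.dgesv(a, b)
    # Relative residual ||A x - b|| / ||b|| as a cheap accuracy check.
    resid = np.linalg.norm(a @ x - b) / np.linalg.norm(b)
    if info != 0 or resid > tol:
        # Accuracy path: ?gesvx with equilibration and iterative refinement.
        x = lapack.dgesvx(a, b)[7]
    return x
```

The threshold `tol` would of course be application dependent.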
The accuracy is shown in two plots for each case: one with the actual Frobenius norm, and one with the relative difference `gesv / gesvx`.

`NRHS == 3`: [Frobenius norm and `gesv / gesvx` ratio plots omitted]

`NRHS == N`: [Frobenius norm and `gesv / gesvx` ratio plots omitted]
@ilayn, sorry for bashing your PR; it is most valuable! I was just surprised when I updated scipy and my code ran 50 times slower!
About this issue
- State: closed
- Created 7 years ago
- Comments: 40 (34 by maintainers)
Commits related to this issue
- ENH: re-enabled old sv lapack routine as defaults This fixes the speed regression in #7847. Since the default change of sv to svx the solve routine suffered a huge performance penalty for anything bu... — committed to zerothi/scipy by zerothi 7 years ago
- ENH: packported #7879 to 1.0.x This fixes the speed regression in #7847. Since the default change of sv to svx the solve routine suffered a huge performance penalty for anything but low order NRHS. ... — committed to zerothi/scipy by zerothi 7 years ago
- BUG: linalg: fix speed regression in solve() This fixes the speed regression in gh-7847. Since the default change of sv to svx the solve routine suffered a huge performance penalty for anything but l... — committed to scipy/scipy by zerothi 7 years ago
- BUG: linalg: fix speed regression in solve() This fixes the speed regression in gh-7847. Since the default change of sv to svx the solve routine suffered a huge performance penalty for anything but l... — committed to rgommers/scipy by zerothi 7 years ago
Sounds great! I have a working solution which I will PR on Monday or Tuesday.