tensorflow: Extremely slow eigendecomposition compared to numpy/scipy.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.15.6
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): v2.3.0-rc2-23-gb36436b087 2.3.0
- Python version: 3.7.7
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: N/A
- GPU model and memory: N/A
I am using eigendecomposition in TensorFlow and find that it is extremely slow. This is on a Mac, so these are CPU computations, but I’ve also run this on a Linux box with a GPU and found the same thing. Here’s code comparing TensorFlow’s speed with numpy and scipy:
import numpy as np
import scipy as sp
import scipy.linalg  # make sp.linalg available; `import scipy` alone may not expose the submodule
import tensorflow as tf
from time import time
A = np.random.randn(400, 400)  # 400 x 400 real matrix
A_tf = tf.constant(A)
cur = time()
d, v = sp.linalg.eig(A)
print(f'sp: {time() - cur:4.2f} s')
cur = time()
d, v = np.linalg.eig(A)
print(f'np: {time() - cur:4.2f} s')
cur = time()
d, v = tf.linalg.eig(A_tf)
print(f'tf: {time() - cur:4.2f} s')
This gives the following output:
sp: 0.09 s
np: 0.08 s
tf: 5.04 s
Any ideas of what’s up here?
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 23 (12 by maintainers)
@refraction-ray This solution solved the problem. I achieved about a 20x speed-up on my MacBook for N=500. With tf.linalg.eig, 25 iterations of ADAM took ~300 s. With py_function wrapping np.linalg.eig, 25 iterations took ~14 s. I thought there might be a solution like this, but I couldn’t find any help online about how to use custom gradients with py_function. Thank you tremendously for your help here!!
This is ultimately due to the single-threaded Eigen implementation of the eig op, which could in principle be linked against multithreaded MKL but is not in the TensorFlow Bazel setup. There are many similar issues complaining about the speed, such as https://github.com/tensorflow/tensorflow/issues/7128 and https://github.com/tensorflow/tensorflow/issues/13222, and the problem can only be fully addressed by https://github.com/tensorflow/tensorflow/issues/34924, namely by supporting MKL linkage for Eigen when compiling TensorFlow. But as described in that issue, I am not fully clear how to make such a setup work.
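One way to convince yourself this is not just a thread-configuration problem is the minimal sketch below (an illustration, not from the original thread; it assumes a fresh Python process, since the threading options must be set before TensorFlow runs any op). If the op really is a single-threaded Eigen implementation, raising the intra-op thread count should not make tf.linalg.eig noticeably faster:
import numpy as np
import tensorflow as tf
from time import time
# Must be called before TensorFlow executes any op.
tf.config.threading.set_intra_op_parallelism_threads(8)
A_tf = tf.constant(np.random.randn(400, 400))
cur = time()
d, v = tf.linalg.eig(A_tf)
print(f'tf with 8 intra-op threads: {time() - cur:4.2f} s')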
For now there is a workaround using tf.py_function, which can use eig from numpy or scipy in the forward pass (ultimately provided by multithreaded MKL or OpenBLAS) while still enjoying automatic differentiation. See the demo below this comment (the gradient code part is directly copied from the tf codebase). Such an approach provides speed similar to numpy’s eig and is compatible with TF’s AD infrastructure: the 400x400 case with gradient calculation takes around 0.5 s, while a pure tf.linalg.eig forward pass with backpropagation needs 8.5 s. Of course, this is not a perfect solution, since py_function has many limitations (for example, it cannot be serialized), but I guess it is OK for most research cases where one just plays with small things in Python.
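A minimal sketch of the wrapping pattern (a reconstruction, not the exact demo from the thread): the forward pass calls numpy through tf.py_function, and tf.custom_gradient attaches a backward pass so the result works under a GradientTape. For brevity it assumes a real symmetric input and uses np.linalg.eigh together with the standard eigh gradient formula, rather than reproducing TF’s general eig gradient; the name np_eigh is an illustrative choice.
import numpy as np
import tensorflow as tf

@tf.custom_gradient
def np_eigh(a):
    # Forward pass in numpy, which calls multithreaded LAPACK (MKL/OpenBLAS).
    e, v = tf.py_function(
        func=lambda x: np.linalg.eigh(x.numpy()),
        inp=[a],
        Tout=[a.dtype, a.dtype],
    )
    # py_function loses static shape information; restore it from the input.
    e.set_shape(a.shape[:-1])
    v.set_shape(a.shape)

    def grad(grad_e, grad_v):
        # Outputs that do not contribute to the loss may arrive as None.
        if grad_e is None:
            grad_e = tf.zeros_like(e)
        if grad_v is None:
            grad_v = tf.zeros_like(v)
        # Standard eigh gradient: dA = V (diag(dE) + F * (V^T dV)) V^T,
        # with F_ij = 1 / (e_j - e_i) off the diagonal and 0 on it.
        f = e[..., tf.newaxis, :] - e[..., :, tf.newaxis]
        f = tf.math.divide_no_nan(tf.ones_like(f), f)  # exact zeros on the diagonal map to 0
        vt = tf.linalg.adjoint(v)
        mid = tf.linalg.diag(grad_e) + f * tf.matmul(vt, grad_v)
        grad_a = tf.matmul(v, tf.matmul(mid, vt))
        # Symmetrize, since the input is assumed symmetric.
        return 0.5 * (grad_a + tf.linalg.adjoint(grad_a))

    return (e, v), grad
Usage sketch (hypothetical loss, just to show that gradients flow back to the variable; the variable is symmetrized before the decomposition so the eigh assumption holds):
A_var = tf.Variable(np.random.randn(400, 400))
with tf.GradientTape() as tape:
    S = 0.5 * (A_var + tf.transpose(A_var))
    e, v = np_eigh(S)
    loss = tf.reduce_sum(e ** 2) + tf.reduce_sum(v ** 2)
grad = tape.gradient(loss, A_var)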