tensorflow: QR decomposition is slow
We are doing a bunch of QR decompositions in numpy. I did a preliminary investigation of moving them to TF, but the TF version is slow compared to numpy.
Below is a benchmark script that runs QR decomposition of a 4096x4096 matrix. It took 7.3 seconds in TF and 1.93 seconds in numpy MKL. Numpy MKL is the default numpy that comes with an Anaconda install.
version: HEAD from last week, built with --config=cuda --config=opt
cpu: 32 core Intel® Xeon® CPU E5-2630 v3 @ 2.40GHz
https://github.com/yaroslavvb/stuff/blob/master/tiny_runs/qr_test.py
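For readers who cannot open the link, here is a minimal sketch of how the numpy side of such a timing could be measured. It is not the linked qr_test.py; the helper name time_numpy_qr, the random seed, the float64 dtype, and the repeat count are all assumptions made for illustration.

```python
import time

import numpy as np


def time_numpy_qr(n, repeats=3):
    """Time np.linalg.qr on a random n x n matrix.

    Returns (best wall-clock seconds, max reconstruction error |QR - A|).
    Hypothetical helper for illustration, not the benchmark from the issue.
    """
    rng = np.random.default_rng(0)  # fixed seed so runs are comparable
    a = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        q, r = np.linalg.qr(a)
        best = min(best, time.perf_counter() - start)
    # Sanity check that the factorization reproduces the input.
    max_err = float(np.max(np.abs(q @ r - a)))
    return best, max_err
```

Calling time_numpy_qr(4096) would correspond to the matrix size quoted above; the absolute number depends heavily on which BLAS/LAPACK numpy is linked against.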
Note that pip install --upgrade $TF_BINARY_URL will overwrite MKL numpy with OpenBLAS numpy, which is actually slower than the TF version. To check which one you have, run np.__config__.show() and look for strings like mkl_intel_lp64. You can get the MKL version back by uninstalling numpy and running conda install numpy.
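The check described above can be scripted. The sketch below captures what np.__config__.show() prints and looks for the substring "mkl". The helper names are hypothetical, and the substring match is only a heuristic, since the exact layout of the config output varies between numpy versions.

```python
import contextlib
import io

import numpy as np


def numpy_build_config():
    """Return the text that np.__config__.show() prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.__config__.show()
    return buf.getvalue()


def looks_like_mkl(config_text):
    """Heuristic: MKL builds mention 'mkl' (e.g. mkl_intel_lp64) in the config."""
    return "mkl" in config_text.lower()


config = numpy_build_config()
if looks_like_mkl(config):
    print("numpy appears to be linked against MKL")
else:
    print("numpy does not appear to be linked against MKL (e.g. OpenBLAS)")
```

Running this before and after the pip upgrade would show whether the numpy build was swapped.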
About this issue
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 17 (17 by maintainers)
I’ll work on this!
Oh interesting! PS, I thought the latest MKL was 2017 update 2. At least that’s the version I get when I download the latest official Intel Distribution for Python from Intel.
You can get MKL version using this snippet
Can you also use SVD from MKL in TF? The MKL version seems to be pretty efficient and the TF version had some correctness issues – https://github.com/tensorflow/tensorflow/issues/8905
https://github.com/yaroslavvb/stuff/blob/master/svd_benchmark.py
@yaroslavvb, I installed both mkl_dnn 2018 and full MKL 2018, added
to
tensorflow/core/kernels/qr_op_impl.h, ran bazel build --config=opt --config=cuda --define=using_mkl=true //tensorflow/tools/pip_package:build_pip_package to build, and got