tensorflow: QR decomposition is slow

We are doing a bunch of QR decompositions in numpy. I did a preliminary investigation of moving them to TF, but the TF version is slow compared to numpy.

Below is a benchmark script that runs QR decomposition of a 4096x4096 matrix. It took 7.3 seconds in TF and 1.93 seconds in MKL numpy. MKL numpy is the default numpy that comes with Anaconda.

version: HEAD from last week, built with --config=cuda --config=opt

cpu: 32 core Intel® Xeon® CPU E5-2630 v3 @ 2.40GHz

benchmark script: https://github.com/yaroslavvb/stuff/blob/master/tiny_runs/qr_test.py

Note that pip install --upgrade $TF_BINARY_URL will replace MKL numpy with OpenBLAS numpy, which is actually slower than the TF version. To check which one you have, run np.__config__.show() and look for strings like mkl_intel_lp64. You can get the MKL version back by uninstalling numpy and running conda install numpy.
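That check can be scripted. The sketch below captures what np.__config__.show() prints and searches it for MKL library names. Only mkl_intel_lp64 comes from the note above; the other markers (mkl_rt, mkl-sdl) and the helper names are assumptions that may not match every numpy build.

```python
import contextlib
import io

import numpy as np

# mkl_intel_lp64 is the string mentioned above; the others are assumed
# markers for other MKL-linked builds.
MKL_MARKERS = ("mkl_intel_lp64", "mkl_rt", "mkl-sdl")


def looks_like_mkl(config_text):
    """Return True if the build-config text names an MKL library."""
    text = config_text.lower()
    # Match library names rather than the bare word "mkl": some builds print
    # section headers such as "blas_mkl_info: NOT AVAILABLE" without using MKL.
    return any(marker in text for marker in MKL_MARKERS)


def numpy_uses_mkl():
    """Capture the printed output of np.__config__.show() and inspect it."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.__config__.show()
    return looks_like_mkl(buf.getvalue())
```

A False result only means none of the assumed markers appeared, so it is worth reading the full output by hand before concluding that numpy is on OpenBLAS.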

@rmlarsen

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 17 (17 by maintainers)

Most upvoted comments

I’ll work on this!

Oh interesting! PS, I thought the latest MKL was 2017 update 2. At least that’s the version I get when I download the latest official Intel Distribution for Python from Intel.

You can get MKL version using this snippet

import ctypes
import numpy as np

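def mklVersion():
    # Zero-filled buffer that MKL writes its version string into
    ver = np.zeros(199, dtype=np.uint8)
    mkl = ctypes.cdll.LoadLibrary("libmkl_rt.so")
    mkl.MKL_Get_Version_String(ver.ctypes.data_as(ctypes.c_char_p), 198)
    # tostring() is deprecated in favor of tobytes(); decode the bytes to str
    return ver[ver != 0].tobytes().decode()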

mklVersion()

Can you also use SVD from MKL in TF? The MKL version seems to be pretty efficient and the TF version had some correctness issues – https://github.com/tensorflow/tensorflow/issues/8905

https://github.com/yaroslavvb/stuff/blob/master/svd_benchmark.py

@yaroslavvb, I installed both mkl_dnn 2018 and the full MKL 2018, then added

#ifdef INTEL_MKL
#define EIGEN_USE_MKL_ALL
#endif // INTEL_MKL

to tensorflow/core/kernels/qr_op_impl.h, built with bazel build --config=opt --config=cuda --define=using_mkl=true //tensorflow/tools/pip_package:build_pip_package, and got

TF QR on 4096 by 4096 matrix in 2.59 seconds
numpy QR on 4096 by 4096 matrix in 2.64 seconds