tensorflow: Incorrect results from tfprof
I was profiling a few matrix multiplications with tfprof but couldn’t make sense of the output. The per-op timing results are hard to believe to be true.
Code:
import tensorflow as tf
import tensorflow.contrib.tfprof as tfprof
dim = 6000
n = 5
with tf.device('/gpu:0'):
X, Y, Z, _X, _Y = [], [], [], [], []
for i in range(n):
X.append(tf.random_uniform([dim, dim], 0, 10, name='X' + str(i)))
Y.append(tf.random_uniform([dim, dim], 0, 10, name='Y' + str(i)))
_X.append(tf.placeholder(dtype=tf.float32, shape=[dim, dim]))
_Y.append(tf.placeholder(dtype=tf.float32, shape=[dim, dim]))
Z.append(tf.matmul(_X[i], _Y[i]))
W = tf.ones([dim, dim])
for i in range(n):
W = tf.matmul(W, Z[i])
sess = tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))
sess.run(tf.global_variables_initializer())
X_, Y_ = sess.run([X, Y])
run_metadata = tf.RunMetadata()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
W_ = sess.run(W,
{_i: i_ for _i, i_ in zip(_X + _Y, X_ + Y_)},
options=run_options,
run_metadata=run_metadata)
tfprof.model_analyzer.print_model_analysis(
tf.get_default_graph(),
run_meta=run_metadata,
tfprof_options=tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY)
Output:
_TFProfRoot (0B/1296.00MB, 0us/778.18ms)
MatMul (144.00MB/144.00MB, 162.51ms/162.51ms)
MatMul_1 (144.00MB/144.00MB, 145.03ms/145.03ms)
MatMul_2 (144.00MB/144.00MB, 145.07ms/145.07ms)
MatMul_3 (144.00MB/144.00MB, 162.69ms/162.69ms)
MatMul_4 (144.00MB/144.00MB, 162.57ms/162.57ms)
MatMul_5 (144.00MB/144.00MB, 241us/241us)
MatMul_6 (144.00MB/144.00MB, 21us/21us)
MatMul_7 (144.00MB/144.00MB, 16us/16us)
MatMul_8 (144.00MB/144.00MB, 10us/10us)
MatMul_9 (144.00MB/144.00MB, 19us/19us)
ones (144.00MB/144.00MB, 6us/6us)
Graph:

As shown in the profiling output, MatMul_[0-4] take 1,000 -10,000x longer than MatMul_[5-9]. However, they are all matrix multiplications of the same size!
Environment info
Operating System: Ubuntu 14.04.5
Installed version of CUDA and cuDNN:
-rw-r--r-- 1 root root 556000 Mar 1 12:58 /usr/local/cuda/lib64/libcudadevrt.a
lrwxrwxrwx 1 root root 16 Mar 1 12:58 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.8.0
lrwxrwxrwx 1 root root 19 Mar 1 12:58 /usr/local/cuda/lib64/libcudart.so.8.0 -> libcudart.so.8.0.61
-rwxr-xr-x 1 root root 415432 Mar 1 12:58 /usr/local/cuda/lib64/libcudart.so.8.0.61
-rw-r--r-- 1 root root 775162 Mar 1 12:58 /usr/local/cuda/lib64/libcudart_static.a
If installed from binary pip package, provide:
-
A link to the pip package you installed:
tensorflow_gpu-1.0.1-cp27-cp27mu-manylinux1_x86_64.whl -
The output from
python -c "import tensorflow; print(tensorflow.__version__)".1.0.1
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 19 (7 by maintainers)
https://github.com/tensorflow/tensorflow/commit/404db0358aacc0fc2319fc333df4cc5bc0fc404a
On Tue, Mar 28, 2017 at 10:50 AM, Bo Wang notifications@github.com wrote:
– Thanks Xin
A fix has been submitted, will be released.