tensorflow: Incorrect results from tfprof

I was profiling a few matrix multiplications with tfprof but couldn’t make sense of the output. The per-op timing results are hard to believe to be true.

Code:

import tensorflow as tf
import tensorflow.contrib.tfprof as tfprof

dim = 6000
n = 5
with tf.device('/gpu:0'):
    X, Y, Z, _X, _Y = [], [], [], [], []
    for i in range(n):
        X.append(tf.random_uniform([dim, dim], 0, 10, name='X' + str(i)))
        Y.append(tf.random_uniform([dim, dim], 0, 10, name='Y' + str(i)))
        _X.append(tf.placeholder(dtype=tf.float32, shape=[dim, dim]))
        _Y.append(tf.placeholder(dtype=tf.float32, shape=[dim, dim]))
        Z.append(tf.matmul(_X[i], _Y[i]))
    W = tf.ones([dim, dim])
    for i in range(n):
        W = tf.matmul(W, Z[i])

sess = tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))
sess.run(tf.global_variables_initializer())
X_, Y_ = sess.run([X, Y])

run_metadata = tf.RunMetadata()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
W_ = sess.run(W,
              {_i: i_ for _i, i_ in zip(_X + _Y, X_ + Y_)},
              options=run_options,
              run_metadata=run_metadata)

tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    run_meta=run_metadata,
    tfprof_options=tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY)

Output:

_TFProfRoot (0B/1296.00MB, 0us/778.18ms)
  MatMul (144.00MB/144.00MB, 162.51ms/162.51ms)
  MatMul_1 (144.00MB/144.00MB, 145.03ms/145.03ms)
  MatMul_2 (144.00MB/144.00MB, 145.07ms/145.07ms)
  MatMul_3 (144.00MB/144.00MB, 162.69ms/162.69ms)
  MatMul_4 (144.00MB/144.00MB, 162.57ms/162.57ms)
  MatMul_5 (144.00MB/144.00MB, 241us/241us)
  MatMul_6 (144.00MB/144.00MB, 21us/21us)
  MatMul_7 (144.00MB/144.00MB, 16us/16us)
  MatMul_8 (144.00MB/144.00MB, 10us/10us)
  MatMul_9 (144.00MB/144.00MB, 19us/19us)
  ones (144.00MB/144.00MB, 6us/6us)

Graph:

As shown in the profiling output, MatMul_[0-4] take 1,000 -10,000x longer than MatMul_[5-9]. However, they are all matrix multiplications of the same size!

Environment info

Operating System: Ubuntu 14.04.5

Installed version of CUDA and cuDNN:

-rw-r--r-- 1 root root 556000 Mar  1 12:58 /usr/local/cuda/lib64/libcudadevrt.a
lrwxrwxrwx 1 root root     16 Mar  1 12:58 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.8.0
lrwxrwxrwx 1 root root     19 Mar  1 12:58 /usr/local/cuda/lib64/libcudart.so.8.0 -> libcudart.so.8.0.61
-rwxr-xr-x 1 root root 415432 Mar  1 12:58 /usr/local/cuda/lib64/libcudart.so.8.0.61
-rw-r--r-- 1 root root 775162 Mar  1 12:58 /usr/local/cuda/lib64/libcudart_static.a

If installed from binary pip package, provide:

A link to the pip package you installed: tensorflow_gpu-1.0.1-cp27-cp27mu-manylinux1_x86_64.whl
The output from python -c "import tensorflow; print(tensorflow.__version__)". 1.0.1

About this issue

Original URL
State: closed
Created 7 years ago
Comments: 19 (7 by maintainers)

Most upvoted comments

https://github.com/tensorflow/tensorflow/commit/404db0358aacc0fc2319fc333df4cc5bc0fc404a

On Tue, Mar 28, 2017 at 10:50 AM, Bo Wang notifications@github.com wrote:

@panyx0718 https://github.com/panyx0718 Can you refer me to the commit that fixes this issue?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/issues/8569#issuecomment-289849860, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwQe5iansiCjJJZuQ10ljnSy4Va5D1Lks5rqUh5gaJpZM4MjKHs .

– Thanks Xin

panyx0718 on Mar 28, 2017

A fix has been submitted, will be released.

panyx0718 on Mar 24, 2017