tensorflow: hclhkbu/dlbench shows TensorFlow is slower than other frameworks

Based on https://github.com/tensorflow/tensorflow/issues/7065#issuecomment-276648478

A recent update of Benchmarking State-of-the-Art Deep Learning Software Tools (by @shyhuai @FreemanX @xiaowec, if I got it right) shows some performance issues. For example (see Table 7), AlexNet-R is significantly (~10 times) slower in TF than in other frameworks, and it's even slower on a GTX 980 than on a GTX 1080. Also, ResNet-50 is ~5.5 times faster in MXNet. Those are the most significant differences.

In addition, LSTM is around 3 times faster in CNTK, and ResNet-56 is twice as fast in MXNet.

The version used was TensorFlow 0.11 (commit 47dd089) with CUDA 8.0 and cuDNN 5.1.

cc @yaroslavvb @annarev

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 36 (33 by maintainers)

Most upvoted comments

Hi Randl,

Thank you for the information. We are working through the benchmark and are days away from publishing a performance guide. A couple of things stood out at a glance in the code they are using. Before I mention them, I want to stress that this code was published in the TensorFlow repo, so I am not shifting blame. Now, some things to look for:

  • Loading data with feed_dict, as in sess.run([train_op, average_op], feed_dict=feed_dict). This is almost the slowest possible approach, yet it is often used in examples.
  • Allowing preprocessing (of, say, images) to end up on the GPU, which happens whenever it is not explicitly placed on the CPU. Pinning preprocessing to the CPU can improve performance by 6x or more. (A sketch addressing both points follows this list.)
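A minimal sketch of what fixing both points might look like, using the queue-runner API that was idiomatic for the TF 0.11/1.0 versions discussed here. The file pattern data/*.jpg, the image size, and the one-layer stand-in model are illustrative assumptions, not the benchmark's actual code:

```python
import tensorflow as tf  # TF 1.x-era API (queue runners predate tf.data)

BATCH_SIZE = 64

# Pin the entire input pipeline to the CPU so JPEG decoding and resizing
# do not steal GPU cycles from the model.
with tf.device('/cpu:0'):
    filename_queue = tf.train.string_input_producer(
        tf.train.match_filenames_once('data/*.jpg'))  # assumed data layout
    reader = tf.WholeFileReader()
    _, raw = reader.read(filename_queue)
    image = tf.image.decode_jpeg(raw, channels=3)
    image = tf.image.resize_images(image, [224, 224])
    image = tf.image.per_image_standardization(image)
    # Background threads fill a queue, so the GPU is fed without a
    # Python-level feed_dict round trip on every step.
    images = tf.train.batch([image], batch_size=BATCH_SIZE,
                            num_threads=4, capacity=4 * BATCH_SIZE)

# One-layer stand-in for a real model; the point here is the input pipeline.
logits = tf.layers.conv2d(images, filters=8, kernel_size=3)
loss = tf.reduce_mean(logits)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    # match_filenames_once creates a local variable, so initialize both sets.
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for _ in range(100):
        sess.run(train_op)  # no feed_dict: data arrives via the queue
    coord.request_stop()
    coord.join(threads)
```

On TF 1.4+ the same structure can be expressed with tf.data (Dataset.map(...).batch(...).prefetch(...)), but the queue-runner version above matches what was current at the time of this issue.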

There are other small tweaks, but with just the two tricks above I suspect several of the benchmarks you listed would improve dramatically; using TF 1.0+ on top of those techniques would help even more.

I will leave this open until I can post some numbers and possibly get someone to post a PR to the benchmark project.

Thank you again for following up and opening a new issue.