tensorflow: hclhkbu/dlbench shows TensorFlow is slower than other frameworks

Based on https://github.com/tensorflow/tensorflow/issues/7065#issuecomment-276648478

A recent update of Benchmarking State-of-the-Art Deep Learning Software Tools (by @shyhuai @FreemanX @xiaowec, if I got it right) shows some performance issues. For example (see Table 7), AlexNet-R is significantly (~10 times) slower in TF than in other frameworks, and it's even slower on a GTX 980 than on a GTX 1080. Also, ResNet-50 is ~5.5 times faster in MXNet. Those are the most significant differences.

In addition, LSTM is around 3 times faster in CNTK, and ResNet-56 is twice as fast in MXNet.

The version used was TensorFlow 0.11 (commit 47dd089) with CUDA 8.0 and cuDNN 5.1.

cc @yaroslavvb @annarev

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 36 (33 by maintainers)

Most upvoted comments

Hi Randl,

Thank you for the information. We are working through the benchmark and are days away from publishing a performance guide. A couple of things stood out at a glance in the code they are using. Before I mention them, I want to stress that this code was published in the TensorFlow repo, so I am not shifting blame. Now, some things to look for:

  • Loading data with feed_dict, as in sess.run([train_op, average_op], feed_dict=feed_dict). This is almost the slowest possible approach, yet it is often used in examples.
  • Allowing preprocessing (of, say, images) to end up on the GPU, which happens whenever it is not explicitly placed on the CPU. Pinning preprocessing to the CPU can improve performance by 6x or more. (A sketch addressing both points follows this list.)
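A minimal sketch of what fixing both points might look like, using the queue-runner API that was idiomatic for the TF 0.11/1.0 versions discussed here. The file pattern data/*.jpg, the image size, and the one-layer stand-in model are illustrative assumptions, not the benchmark's actual code:

```python
import tensorflow as tf  # TF 1.x-era API (queue runners predate tf.data)

BATCH_SIZE = 64

# Pin the entire input pipeline to the CPU so JPEG decoding and resizing
# do not steal GPU cycles from the model.
with tf.device('/cpu:0'):
    filename_queue = tf.train.string_input_producer(
        tf.train.match_filenames_once('data/*.jpg'))  # assumed data layout
    reader = tf.WholeFileReader()
    _, raw = reader.read(filename_queue)
    image = tf.image.decode_jpeg(raw, channels=3)
    image = tf.image.resize_images(image, [224, 224])
    image = tf.image.per_image_standardization(image)
    # Background threads fill a queue, so the GPU is fed without a
    # Python-level feed_dict round trip on every step.
    images = tf.train.batch([image], batch_size=BATCH_SIZE,
                            num_threads=4, capacity=4 * BATCH_SIZE)

# One-layer stand-in for a real model; the point here is the input pipeline.
logits = tf.layers.conv2d(images, filters=8, kernel_size=3)
loss = tf.reduce_mean(logits)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    # match_filenames_once creates a local variable, so initialize both sets.
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for _ in range(100):
        sess.run(train_op)  # no feed_dict: data arrives via the queue
    coord.request_stop()
    coord.join(threads)
```

On TF 1.4+ the same structure can be expressed with tf.data (Dataset.map(...).batch(...).prefetch(...)), but the queue-runner version above matches what was current at the time of this issue.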

There are other small tweaks, but with just the two tricks above I suspect several of the benchmarks you listed would improve dramatically; using TF 1.0+ on top of those techniques would help even more.

I will leave this open until I can post some numbers and possibly get someone to post a PR to the benchmark project.

Thank you again for following up and opening a new issue.