tensorflow: sess.run(tf.global_variables_initializer()) is slow after installing 1.0.0.

The issue

sess.run(tf.global_variables_initializer()) has intermittently taken a long time since I installed 1.0.0. I believe this also happened on my Python 2 install of 0.12, but only after installing 1.0.0 for Python 3. This particular run was in Python 2, and the initializer took 32 seconds; see the bottom of this post (and the attached cProfile output) for more detail.

Strangely, the more times I run the same file, the faster it seems to run.
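
For reference, the kind of wall-clock timing behind that 32-second figure can be reproduced with a few lines (a sketch, not my exact script; it assumes a graph with variables has already been built):

import time
import tensorflow as tf

# ... build a graph that contains variables here ...

sess = tf.Session()
start = time.time()
sess.run(tf.global_variables_initializer())
print('global_variables_initializer() took %.2f seconds' % (time.time() - start))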

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

None.

Environment info

Operating System: Ubuntu 16.04

Installed version of CUDA and cuDNN (please attach the output of ls -l /path/to/cuda/lib/libcud*):

ls -l /usr/local/cuda/lib64/libcud*
-rw-r--r-- 1 root root   560184 Oct 27 16:26 /usr/local/cuda/lib64/libcudadevrt.a
lrwxrwxrwx 1 root root       16 Oct 27 16:26 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.8.0
lrwxrwxrwx 1 root root       19 Oct 27 16:26 /usr/local/cuda/lib64/libcudart.so.8.0 -> libcudart.so.8.0.27
-rwxr-xr-x 1 root root   394472 Oct 27 16:26 /usr/local/cuda/lib64/libcudart.so.8.0.27
-rw-r--r-- 1 root root   737516 Oct 27 16:26 /usr/local/cuda/lib64/libcudart_static.a
-rwxr-xr-x 1 root root 60696704 Oct 27 16:31 /usr/local/cuda/lib64/libcudnn.so
-rwxr-xr-x 1 root root 60696704 Oct 27 16:31 /usr/local/cuda/lib64/libcudnn.so.5
-rwxr-xr-x 1 root root 60696704 Oct 27 16:31 /usr/local/cuda/lib64/libcudnn.so.5.1.3
-rw-r--r-- 1 root root 59715990 Oct 27 16:31 /usr/local/cuda/lib64/libcudnn_static.a

If installed from binary pip package, provide:

  1. A link to the pip package you installed: sudo pip install tensorflow-gpu on 2/21/17
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)": 1.0.0

If possible, provide a minimal reproducible example (we usually don’t have time to read hundreds of lines of your code):

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
from collections import deque

import numpy as np
print('Importing tensorflow')
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

xavier = tf.contrib.layers.xavier_initializer
batch_norm = tf.contrib.layers.batch_norm

tf.set_random_seed(1)

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('data_dir', 'data/', 'Directory for storing data')
print('Reading data.')
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
initialization_variance = 0.1

print('Creating a session.')
sess = tf.Session()

def fully_connected_layer(inputs, output_size, scope, nonlinearity=True):
  with tf.variable_scope(scope):
    input_size = inputs.get_shape()[1].value

    weights = tf.get_variable('w', 
                              [input_size, output_size],
                              initializer=tf.constant_initializer(0.0))

    bias = tf.get_variable('b', 
                           [output_size],
                           initializer=tf.constant_initializer(0.0))

    multiplied = tf.matmul(inputs, weights)

    return tf.nn.elu(multiplied + bias) if nonlinearity else multiplied + bias

print('Setting up the model.')
input_images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.float32, [None, 10])  # one-hot labels
learning_rate_placeholder = tf.placeholder(tf.float32, name='learning-rate')

layer_a = fully_connected_layer(input_images, 50, 'a')

for i in range(5):
  layer_a = batch_norm(fully_connected_layer(layer_a, 50, str(i)))

layer_b = fully_connected_layer(layer_a, 10, 'b', nonlinearity=False)

predictions = tf.nn.softmax(layer_b)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(labels * tf.log(predictions), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(learning_rate_placeholder).minimize(cross_entropy)

print('Initializing variables.')
sess.run(tf.global_variables_initializer())

What other attempted solutions have you tried?

None.

Logs or other output that would be helpful

Some output from python2 -m cProfile -o out.profile 1-mnist-simple.py, covering the call to sess.run(tf.global_variables_initializer()). [screenshot of the cProfile output]

cprofile-output.profile.zip
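
For anyone inspecting the attached dump, Python's standard pstats module can sort it by cumulative time (a generic sketch; 'out.profile' is the filename from the cProfile command above):

import pstats

# Load the dump written by: python2 -m cProfile -o out.profile <script>
stats = pstats.Stats('out.profile')
# Sort by cumulative time to surface where the initializer run spends its time
stats.sort_stats('cumulative').print_stats(10)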

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 19 (8 by maintainers)

Most upvoted comments

Alas! I’m going to close, but with sadness. Let us know if a clearer picture ever emerges.

I had this same issue with my own Docker container, even while using the ‘latest-devel-gpu’ image as @afri mentioned.

However, @chrisranderson, your comment about OpenCV made me realize what the problem was. You’re right: compiling OpenCV with CUDA support caused the slow startup for me.

After recompiling OpenCV with -D WITH_CUDA=OFF, the problem is fixed and my models now start almost instantly.

EDIT: If you want a quick fix, you can also just try importing cv2 before importing tensorflow. That worked for me too.
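
For anyone trying that workaround, the change is just the import order at the top of the script (a sketch; assumes OpenCV’s Python bindings are installed):

# Importing cv2 first, purely for its side effects, so OpenCV's CUDA
# initialization happens before TensorFlow creates its session
import cv2
import tensorflow as tf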