tensorflow: Computer freeze when feeding a large numpy array as input in MNIST tutorial

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

None

Environment info

Intel i5, 8 GB RAM. Operating System: Ubuntu 14.04.5 (Linux 4.4.0-45-generic #66~14.04.1-Ubuntu SMP Wed Oct 19 15:05:38 UTC 2016 x86_64 GNU/Linux)

Installed version of CUDA and cuDNN: none (no CUDA or cuDNN; running on CPU)

If installed from binary pip package, provide:

  1. A link to the pip package you installed: TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp34-cp34m-linux_x86_64.whl
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)": 0.11.0rc2

If possible, provide a minimal reproducible example (We usually don’t have time to read hundreds of lines of your code)

Use the MNIST tutorial and replace train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0}) with train_accuracy = accuracy.eval(feed_dict={x: mnist.train.images, y_: mnist.train.labels, keep_prob: 1.0}), so that the accuracy is evaluated over the entire training set (see the sketch below).

I have reproduced this with another dataset, and the problem goes away when using a smaller number of examples. (Pastebin of the entire file with this modification: http://pastebin.com/THNqB4ws)
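For clarity, a minimal sketch of the two variants of that line, assuming the variable names from the standard MNIST deep-convnet tutorial (mnist, x, y_, keep_prob, accuracy, batch):

    # Original tutorial line: accuracy is evaluated on a small batch.
    train_accuracy = accuracy.eval(feed_dict={
        x: batch[0], y_: batch[1], keep_prob: 1.0})

    # Modified line that triggers the freeze: accuracy is evaluated on all
    # 60,000 training images in a single feed.
    train_accuracy = accuracy.eval(feed_dict={
        x: mnist.train.images, y_: mnist.train.labels, keep_prob: 1.0})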

What other attempted solutions have you tried?

None. There is no error message; the computer hangs the first time it executes accuracy.eval.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

Actually, if you look at the memory timeline you’ll see that the activations for h_conv1 are discarded as soon as h_conv1 + b_conv1 completes. This is because it is an eval pass, so they are not needed for derivatives. If the add could be done in place, this would essentially lower the peak usage from 12 GB to 6 GB, and that is perhaps something the upcoming XLA framework could do. However, it would be trickier to lower usage for the backward pass as well, since the h_conv1 activations are needed until much later: you would need an implementation of a fused conv + add op, and a corresponding gradient.
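For anyone who wants to repeat this kind of inspection, a minimal sketch of collecting a step trace with the tracing API available around TF 0.11 (the resulting JSON can be loaded in chrome://tracing; the session and feed variables are the tutorial's and are assumptions here):

    from tensorflow.python.client import timeline

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    # Run a single eval step with tracing enabled (use a small feed to avoid the hang).
    sess.run(accuracy,
             feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0},
             options=run_options,
             run_metadata=run_metadata)
    # Write a Chrome-trace JSON with per-op timing and allocation information.
    with open('timeline.json', 'w') as f:
        f.write(timeline.Timeline(run_metadata.step_stats)
                        .generate_chrome_trace_format())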

I took a closer look at your memory usage, and the main source is the activations in your first conv layer.

Your convolution has 32 filters, so the activations for the first conv layer take 4 x 28 x 28 x 32 bytes ≈ 100 KB per example. This is followed by a broadcasting add, which brings it to ~200 KB in total. To run this over the whole dataset you need 60,000 x ~200 KB ≈ 12 GB of RAM.
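A back-of-the-envelope check of those numbers (assuming float32 activations, 'SAME' padding, and the full 60,000-example training set):

    bytes_per_float = 4
    conv1 = bytes_per_float * 28 * 28 * 32   # ~100 KB of activations per example
    with_bias_add = 2 * conv1                # broadcast add allocates a second copy, ~200 KB
    total = 60000 * with_bias_add            # whole training set in one feed
    print(total / 1e9)                       # ~12 GB

Since the whole-training-set feed needs roughly 12 GB at peak while the machine above has 8 GB of RAM, this estimate is consistent with the observation that the hang goes away when using fewer examples.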