tensorflow: Computer freeze when feeding a large numpy array as input in MNIST tutorial
What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?
None
Environment info
Intel i5, 8 GB RAM. Operating system: Ubuntu 14.04.5 (Linux 4.4.0-45-generic #66~14.04.1-Ubuntu SMP Wed Oct 19 15:05:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux)
Installed version of CUDA and cuDNN: no CUDA or cuDNN; running on CPU
If installed from binary pip package, provide:
- A link to the pip package you installed: TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp34-cp34m-linux_x86_64.whl
- The output from python -c "import tensorflow; print(tensorflow.__version__)": 0.11.0rc2
If possible, provide a minimal reproducible example (We usually don’t have time to read hundreds of lines of your code)
Use the MNIST tutorial and replace
train_accuracy = accuracy.eval(feed_dict={ x:batch[0], y_: batch[1], keep_prob: 1.0})
with
train_accuracy = accuracy.eval(feed_dict={ x:mnist.train.images, y_: mnist.train.labels, keep_prob: 1.0})
so that the accuracy is evaluated over the entire training set. I've reproduced this with another dataset, and the problem goes away when using a smaller number of examples. (Pastebin of the entire file with this modification: http://pastebin.com/THNqB4ws)
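Not part of the original report, but to make "a smaller number of examples" concrete: a minimal sketch of chunked evaluation that avoids feeding all ~55k–60k images in a single `feed_dict`. The helper name `eval_in_chunks` and its parameters are hypothetical; `x`, `y_`, `keep_prob`, `accuracy`, and `mnist` refer to the tutorial's placeholders, op, and dataset object, and a default (interactive) session is assumed to be active.

```python
import numpy as np

def eval_in_chunks(accuracy_op, x_ph, y_ph, keep_prob_ph, images, labels,
                   chunk_size=1000):
    """Evaluate an accuracy op over a dataset in fixed-size chunks.

    Assumes a default session is active (the tutorial uses
    tf.InteractiveSession), so accuracy_op.eval() works directly.
    """
    scores = []
    for start in range(0, len(images), chunk_size):
        end = start + chunk_size
        scores.append(accuracy_op.eval(feed_dict={
            x_ph: images[start:end],
            y_ph: labels[start:end],
            keep_prob_ph: 1.0}))
    # Exact mean when chunk_size divides the dataset size evenly;
    # otherwise the smaller final chunk is slightly over-weighted.
    return float(np.mean(scores))

# Hypothetical usage with the tutorial's names:
# train_accuracy = eval_in_chunks(accuracy, x, y_, keep_prob,
#                                 mnist.train.images, mnist.train.labels)
```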
What other attempted solutions have you tried?
None; there’s no error message, and the computer hangs the first time it executes accuracy.eval.
About this issue
- State: closed
- Created 8 years ago
- Comments: 16 (7 by maintainers)
Actually, if you look at the memory timeline you’ll see that activations for `h_conv1` are discarded as soon as `h_conv1 + b_conv1` completes. This is because it’s an eval pass, so they are not needed for derivatives. If `add` could be done in place, this would essentially lower the peak usage from 12 GB to 6 GB, and that’s perhaps what the upcoming XLA framework could do. However, it would be trickier to lower usage for the backward pass as well, since `h_conv1` activations are needed until much later – you would need an implementation of a fused `conv + add` op, and a corresponding gradient.

I took a closer look at your memory usage, and the main source is the activations in your first conv layer. Your convolution has 32 filters, so the activations for the first conv layer take 4 x 28 x 28 x 32 ≈ 100 KB per example. This is followed by a broadcasting add, which makes it about 200 KB total. To run this over the whole dataset you need 60k x 200 KB ≈ 12 GB of RAM.
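To spell that arithmetic out (an illustrative sketch; the numbers follow the comment above: float32 = 4 bytes, 28x28 spatial output, 32 filters, ~60k training examples fed at once):

```python
bytes_per_float = 4
conv1_activations = bytes_per_float * 28 * 28 * 32  # 100,352 bytes ≈ 100 KB per example
with_bias_add = 2 * conv1_activations                # broadcasting add keeps a second copy ≈ 200 KB
total_bytes = 60000 * with_bias_add                  # whole training set in a single feed
print(total_bytes / 2 ** 30)                         # ≈ 11.2 GiB, i.e. roughly the 12 GB peak
```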