tensorflow: ResourceExhaustedError in CNN/MNIST example (with GPU)

(I'm using a GPU (GTX 980) with CUDA 7.0 and cuDNN v2 on Ubuntu 14.04.) I have gone through the MNIST tutorial: http://tensorflow.org/tutorials/mnist/pros/index.md

Everything was going well except for the last two lines:

print "test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

Executing these lines, I got an error:

ResourceExhaustedError: OOM when allocating tensor with shape dim { size: 10000 } dim { size: 18 } dim { size: 18 } dim { size: 32 }

I think the basic reason for this error is that the test data cannot be allocated on the GPU device. Is this a bug or not? Is there a good way to avoid this issue?

About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 68
  • Comments: 45 (5 by maintainers)

Most upvoted comments

If you don’t have enough memory on your GPU to fit the whole test data, you could feed it in small batches to the eval graph using feed_dict like the example does with the training data.

I am doing something similar to what @Shuto050505 did here, but I compute the mean accuracy over all batches of the test data. Replace the following line:

print("test accuracy %g"%accuracy.eval(feed_dict={ 
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

by this:

batch_size = 50
batch_num = int(mnist.test.num_examples / batch_size)
test_accuracy = 0
    
for i in range(batch_num):
    batch = mnist.test.next_batch(batch_size)
    test_accuracy += accuracy.eval(feed_dict={x: batch[0],
                                              y_: batch[1],
                                              keep_prob: 1.0})

test_accuracy /= batch_num
print("test accuracy %g"%test_accuracy)

With this I get the mean test accuracy: test accuracy 0.9922.

EDIT: Updated accuracy with corrected training process
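
One caveat about the averaging above: dividing the summed per-batch accuracies by batch_num is only exact when every batch has the same size (it does here, since 10000 is divisible by 50). If the test-set size were not a multiple of the batch size, weighting each batch by its size keeps the estimate exact. A minimal sketch of that variant, assuming the same x, y_, keep_prob and accuracy tensors from the tutorial:

batch_size = 50
num_examples = mnist.test.num_examples
weighted_sum = 0.0

for start in range(0, num_examples, batch_size):
    end = min(start + batch_size, num_examples)
    batch_acc = accuracy.eval(feed_dict={x: mnist.test.images[start:end],
                                         y_: mnist.test.labels[start:end],
                                         keep_prob: 1.0})
    weighted_sum += batch_acc * (end - start)  # weight by actual batch size

print("test accuracy %g" % (weighted_sum / num_examples))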

"Just reduce the batch size while feeding the test data to the GPU." batch_size=1.

Now what? 0.5? 0?

I have two 1080 Tis, and I'm still running out of memory.

I'm using a 2 GB 860M. mnist.test.images.shape is (10000, 784); limiting the test set to 7000 examples, I could make it work. Either evaluate on a small batch:

batch_tx, batch_ty = mnist.test.next_batch(10)
print("test accuracy %g"%accuracy.eval(feed_dict={x: batch_tx, y_: batch_ty, keep_prob: 1.0}))
-> test accuracy 0.992

or slice the test set:

test_image = mnist.test.images[0:7000, :]
test_label = mnist.test.labels[0:7000, :]
print("test accuracy %g"%accuracy.eval(feed_dict={x: test_image, y_: test_label, keep_prob: 1.0}))
-> test accuracy 0.992

FYI: I had the same error (32 GB RAM, Titan X 12 GB). Restarting the IPython notebook helped.

I have the same problem on this same example, with Ubuntu 14.04 and an Nvidia GTX 970. It hits 3.35 GB usage and then crashes.

@mtourne Thanks a lot for the fix. It works for my GPU (previously I also got the ResourceExhaustedError).

Another way, following @vrv's advice from this discussion, is to install TensorFlow from the latest source and configure your session to use the BFC allocator before running it, like this:

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'
with tf.Session(config=config) as s:

The original example from @vrv is here. My GPU has 6 GB, though. This allocator seems to allocate memory dynamically according to the GPU's memory, so it will probably work with cards that have less memory as well.
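
For completeness, a minimal sketch of how that configuration could wrap the evaluation from this thread, assuming the x, y_, keep_prob and accuracy tensors from the tutorial (if the full test set still does not fit, combine this with the batched evaluation shown earlier):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'

with tf.Session(config=config) as sess:
    # ... build the tutorial graph (x, y_, keep_prob, accuracy) and train here ...
    print("test accuracy %g" % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))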

I was trying the Deep MNIST example from the tutorial on the site. Using the entire test set throws "ran out of memory trying to allocate 78.1KiB" (so close!). Feeding the test images in batches works, but pushing it to batches of 5000, just so I could see where exactly the program breaks, gave me a warning that it ran out of memory. The strange thing is that, while the previous attempt without batching completely broke the program, this one didn't. I still got the result, but with a lot of warnings like:

W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 2.92GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

I am using a 2GB 960M and I can confirm that the BFC option is enabled, since there were related errors with chunks when it completely crashed the first time.

Same issue with the latest stable release of TensorFlow, on a Quadro 970M with 2 GB of memory.

According to the logging output, the BFC allocator was being used.

W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:211] Ran out of memory trying to allocate 29.91MiB.  See logs for memory state

29.9 MB doesn’t seem like an awful lot, but I’m assuming it has to be allocated contiguously.

I had a similar problem just now. A batch of images had to be analysed, and by mistake I had created a function that loaded the model into memory for every image to be recognised.

# Imports assumed here; the original snippet did not show them.
from keras.applications import vgg16 as VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

def predict(num_class, weights_path, img_path):
    # Mistake: this rebuilds the model and reloads the weights on every call.
    base_model = VGG16.VGG16(include_top=False, weights=None)
    x = base_model.output
    x = Dense(128)(x)
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(num_class, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=predictions)
    model.load_weights(weights_path)
    ....
    ....

Hopefully, it will help someone!
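
A sketch of the corrected pattern under the same assumptions (num_class, weights_path and img_paths are hypothetical stand-ins for your own data, and the imports above are reused): build the model and load its weights once, then reuse it for every image, so the weights are placed on the GPU only a single time.

from keras.preprocessing import image
import numpy as np

def build_model(num_class, weights_path):
    # Same layers as above, but constructed only once.
    base_model = VGG16.VGG16(include_top=False, weights=None)
    x = base_model.output
    x = Dense(128)(x)
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(num_class, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    model.load_weights(weights_path)
    return model

model = build_model(num_class, weights_path)   # weights go onto the GPU once

for img_path in img_paths:
    img = image.load_img(img_path, target_size=(224, 224))
    batch = np.expand_dims(image.img_to_array(img), axis=0)
    print(model.predict(batch))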

I am new to TensorFlow and machine learning. Recently I have been working on a model. My model looks like this:

  1. Character-level embedding vector -> embedding lookup -> LSTM1

  2. Word-level embedding vector -> embedding lookup -> LSTM2

  3. [LSTM1 + LSTM2] -> single-layer MLP -> softmax layer

  4. [LSTM1 + LSTM2] -> single-layer MLP -> WGAN discriminator

While working on this model I got the following error. I thought my batch was too big, so I tried reducing the batch size from 20 to 10, but it didn't help.

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24760,100] [[Node: chars/bidirectional_rnn/bw/bw/while/bw/lstm_cell/split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients_2/Add_3/y, chars/bidirectional_rnn/bw/bw/while/bw/lstm_cell/BiasAdd)]] [[Node: bi-lstm/bidirectional_rnn/bw/bw/stack/_167 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_636_bi-lstm/bidirectional_rnn/bw/bw/stack", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

A tensor with shape [24760, 100] means 2476000 * 32 / (1024 * 1024) = 75.*** MB of memory. I am running the code on a Titan X (11 GB) GPU. What could be going wrong? Why did this type of error occur?

Extra info: the size of LSTM1 is 100; for the bidirectional LSTM it becomes 200. The size of LSTM2 is 300; for the bidirectional LSTM it becomes 600.

Note: the error occurred after 32 epochs. My question is why the error appears after 32 epochs rather than in the initial epoch.
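
This is not confirmed to be the cause here, but one common reason an OOM appears only after many epochs is that new ops keep being created inside the training loop, so the graph (and GPU memory) grows a little every step. A quick way to check, assuming a standard TF 1.x training loop (num_epochs, batches and train_op are placeholders for your own code), is to finalize the graph before training; any accidental op creation then raises an error immediately instead of silently leaking memory:

import tensorflow as tf

# ... build the whole model here: embeddings, both LSTMs, the MLP,
#     the softmax/WGAN heads, the optimizer, etc. ...

init_op = tf.global_variables_initializer()
tf.get_default_graph().finalize()      # graph is now read-only

with tf.Session() as sess:
    sess.run(init_op)
    for epoch in range(num_epochs):
        for feed in batches:
            # Any new-op creation inside this loop now raises
            # "Graph is finalized and cannot be modified"
            # instead of growing GPU memory over the epochs.
            sess.run(train_op, feed_dict=feed)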