tensorflow: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes and No (described below)
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): tf-nightly-gpu (Dec 19, r1.13)
- TensorFlow version (use command below): 1.13.0-dev20181219
- Python version: 3.7.1
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1
- GPU model and memory: RTX 2070 8GB
Describe the current behavior
I’m running the CNN model on MNIST. When I’m running with the GPU, I am encountering
2018-12-20 20:09:13.644176: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I did some digging and realized that it is a memory issue (which shouldn’t be the case, as I have 32GB of RAM and 64GB of swap). I ran htop while running the model and had 20+GB free, which is more than enough to fit the 8GB vRAM mappings.
Using gpu_options.allow_growth = True gets the model to work properly, and setting os.environ['CUDA_VISIBLE_DEVICES'] = '-1' also works. This means that I AM facing a memory issue, but I don’t see how.
Also, using gpu_options.allow_growth = True does not fix the same issue when running the tensorflow/models/official/mnist/ model, which should behave similarly to my code.
Code to reproduce the issue
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import math
import time
# Killing optional CPU driver warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
tf.logging.set_verbosity(tf.logging.ERROR)
class Model:
    def __init__(self, image, label):
        """
        A Model class contains a computational graph that classifies images
        to predictions. Each of its methods builds part of the graph
        on Model initialization. Do not modify the constructor, as doing so
        would break the autograder. You may, however, add class variables
        to use in your graph-building, e.g. learning rate.
        image: the input image to the computational graph as a tensor
        label: the correct label of an image as a tensor
        prediction: the output prediction of the computational graph,
                    produced by self.forward_pass()
        optimize: the model's optimizing tensor produced by self.optimizer()
        loss: the model's loss produced by computing self.loss_function()
        accuracy: the model's prediction accuracy
        """
        self.image = image
        self.label = label
        # TO-DO: Add any class variables you want to use.
        self.prediction = self.forward_pass()
        self.loss = self.loss_function()
        self.optimize = self.optimizer()
        self.accuracy = self.accuracy_function()

    def forward_pass(self):
        """
        Predicts a label given an image using convolution layers
        :return: the prediction as a tensor
        """
        # tf.nn.conv2d runs through cuDNN on the GPU, which is where handle creation fails.
        filter_1 = tf.Variable(tf.truncated_normal([3, 3, 1, 8], stddev=0.1))
        conv_1 = tf.nn.conv2d(self.image, filter_1, [1, 1, 1, 1], "SAME")
        # 50 must match batch_sz in main()
        reshaped = tf.reshape(conv_1, shape=[50, -1])
        L1 = reshaped.shape[1].value
        L2 = 500
        W1 = tf.Variable(tf.random_normal([L1, L2], mean=0, stddev=0.01))
        b1 = tf.Variable(tf.random_normal([L2], mean=0, stddev=0.01))
        relu_1 = tf.nn.relu(tf.matmul(reshaped, W1) + b1)
        W2 = tf.Variable(tf.random_normal([L2, 10], mean=0, stddev=0.01))
        b2 = tf.Variable(tf.random_normal([10], mean=0, stddev=0.01))
        # Return raw logits: softmax_cross_entropy applies the softmax itself,
        # so a ReLU here would only clip negative scores to zero.
        logits = tf.matmul(relu_1, W2) + b2
        return logits

    def loss_function(self):
        """
        Calculates the model cross-entropy loss
        :return: the loss of the model as a tensor
        """
        loss = tf.losses.softmax_cross_entropy(onehot_labels=self.label, logits=self.prediction)
        return loss

    def optimizer(self):
        """
        Optimizes the model loss using plain gradient descent
        :return: the optimizer as a tensor
        """
        learning_rate = 0.1
        sgd = tf.train.GradientDescentOptimizer(learning_rate)
        train = sgd.minimize(self.loss)
        return train

    def accuracy_function(self):
        """
        Calculates the model's prediction accuracy by comparing
        predictions to correct labels – no need to modify this
        :return: the accuracy of the model as a tensor
        """
        correct_prediction = tf.equal(tf.argmax(self.prediction, 1),
                                      tf.argmax(self.label, 1))
        return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


def main():
    t_start = time.time()
    mnist = input_data.read_data_sets("data/mnist/", one_hot=True)
    batch_sz = 50
    batch = 2000
    inputs = tf.placeholder(shape=[batch_sz, 28, 28, 1], dtype=tf.float32)
    labels = tf.placeholder(shape=[batch_sz, 10], dtype=tf.float32)
    model = Model(inputs, labels)
    # allow_growth=True is the workaround that lets this script run on the RTX card
    session_config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=session_config)
    # sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for i in range(batch):
        next_image, next_label = mnist.train.next_batch(batch_sz)
        next_image = next_image.reshape((batch_sz, 28, 28, 1))
        sess.run(model.optimize, feed_dict={inputs: next_image, labels: next_label})
    acc, test_images, test_labels = 0, mnist.test.images, mnist.test.labels
    test_batch = math.ceil(len(test_images) / batch_sz)
    for i in range(test_batch):
        batch_images = test_images[i * batch_sz: (i + 1) * batch_sz]
        batch_images = batch_images.reshape((batch_sz, 28, 28, 1))
        batch_labels = test_labels[i * batch_sz: (i + 1) * batch_sz]
        acc += sess.run(model.accuracy, feed_dict={inputs: batch_images, labels: batch_labels})
    acc /= test_batch
    print(acc)
    print(time.time() - t_start, 'seconds')
    return


if __name__ == '__main__':
    main()
About this issue
- State: closed
- Created 6 years ago
Commits related to this issue
- TF by default steals all GPU memory for a device This is causing issues on an RTX 2080 with TF 1.13 and cuDNN 7.6 (with CUDA 10). The issue is documented here: https://github.com/tensorflow/tensorfl... — committed to mead-ml/mead-baseline by dpressel 5 years ago
- TF by default steals all GPU memory for a device (#311) * TF by default steals all GPU memory for a device This is causing issues on an RTX 2080 with TF 1.13 and cuDNN 7.6 (with CUDA 10). The is... — committed to mead-ml/mead-baseline by dpressel 5 years ago
- Work around https://github.com/tensorflow/tensorflow/issues/24496 — committed to mikegerber/calamari by mikegerber 5 years ago
- Fix bug for RTX 2080 Ti Fix mentioned in https://github.com/tensorflow/tensorflow/issues/24496 config = tf.ConfigProto() config.gpu_options.allow_growth = True tf.keras.backend.set_session(tf.Sessio... — committed to matteobarbieri/alpr-unconstrained by matteobarbieri 5 years ago
- Fixed error when creating cudnn handle that happens when running in a RTX series GPU, known error, fix Link: https://github.com/tensorflow/tensorflow/issues/24496 — committed to psicobloc/STORK by psicobloc 5 years ago
- Fixing cudnn error on RTX gpus by adding allow_growth=True https://github.com/tensorflow/tensorflow/issues/24496 — committed to ludwig-ai/ludwig by w4nderlust 5 years ago
- Updates to train script and added option to allow gpu growth Create output_dir if it does not exist Update scripts/train to new TrainSettings interface Add option gpu_allow_growth to TrainSettings ... — committed to OMMR4all/ommr4all-page-segmentation by crater2150 5 years ago
- Add option gpu_allow_growth to TrainSettings Uses the allow_growth tensorflow setting to work around CUDNN_STATUS_INTERNAL_ERROR See https://github.com/tensorflow/tensorflow/issues/24496 — committed to ocr-d-modul-2-segmentierung/page-segmentation by crater2150 5 years ago
- Turn on dynamic memory allocation on the GPU This allows GPU memory to dynamically grow. This is a workaround to fix this issue on RTX cards. https://github.com/tensorflow/tensorflow/issues/24496. Ho... — committed to mpowelson/point_cloud_segmentation-1 by mpowelson 5 years ago
- updated output mask to range between 0-1 Add FCN8 Node Fix bug when annotation fails Add xml tag to launch file Turn on dynamic memory allocation on the GPU This allows GPU memory to dynamically ... — committed to Jake-Janssen/point_cloud_segmentation by deleted user 5 years ago
- VLOG results from MinSystemMemory Seems generally useful and will help triage https://github.com/tensorflow/tensorflow/issues/24496. PiperOrigin-RevId: 287042036 Change-Id: I64134707ecc12a5d5470a8e5... — committed to tensorflow/tensorflow by deleted user 5 years ago
- Fixed Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR according to https://github.com/tensorflow/tensorflow/issues/24496 — committed to lorenz0890/TensorQuant by deleted user 4 years ago
- Add workaround for CUDNN_STATUS_INTERNAL_ERROR Fix "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" error when using the GPU version of TensorFlow. See https://github.com/tensorflow/tens... — committed to siphomateke/attention-ocr by siphomateke 4 years ago
- comment Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR https://github.com/tensorflow/tensorflow/issues/24496 — committed to rwth-i6/returnn by albertz 4 years ago
- comment Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR https://github.com/tensorflow/tensorflow/issues/24496 — committed to Spotlight0xff/returnn by albertz 4 years ago
- (tensorflow) Fix issues when running with cuda Solution from https://github.com/tensorflow/tensorflow/issues/24496 — committed to tue-robotics/image_recognition by MatthijsBurgh 3 years ago
I did try compiling from source, but ran into the same issue. I was finally able to fix my problem by setting config.gpu_options.allow_growth = True.
OK, made it work in tf-nightly-gpu-2.0-preview and an IPython notebook by adding this to my code:
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
I’ve been running into the same issue with the same GPU: “CUDNN_STATUS_INTERNAL_ERROR”.
RTX 2070 GPU, CUDA 10, cuDNN 7.4.2, Ubuntu 18.04, tf-nightly-gpu (r1.13, Jan 13), Python 3.6.7
Try to compile r1.13 from source. It would take a long time, but it should fix your problem. At least it fixed mine.
@ymodak It looks like this issue was closed prematurely. While there is a workaround for this issue, it involves changing application code. As a result, the example code does not work out of the box on RTX cards, and most recipes online will also need modification.
@ymodak This bug is not fixed. Arguably, using any sort of convnet should work in the default configuration. Either allow_growth should be true by default, it should be fixed so this works, or there should be a better error than CUDNN_STATUS_INTERNAL_ERROR.
How do you actually set allow_growth=true? I have tf-nightly-gpu-2.0-preview and tried:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, …)
but get this error:
AttributeError                            Traceback (most recent call last)
<ipython-input-14-b4f9929bf252> in <module>()
      1 import tensorflow as tf
----> 2 config = tf.ConfigProto()

AttributeError: module 'tensorflow' has no attribute 'ConfigProto'
How can I set allow_growth in tensorflow 2.0?
Dude, your solution saves my life.
I’ve been having the same issue (on an RTX 2060, Ubuntu 18.04, Python 3.6.7, CUDA 10.0.130, cuDNN 7.4.2, TensorFlow 1.13.0-rc0 from source). Thanks to @va-andrew’s suggestion I have it working with the allow_growth option set.
FWIW, in the course of searching for solutions, it seems that this issue is a common problem with the RTX series (although it might be a general problem with CUDA 10.0, since the new cards don’t support the older versions). It would be great if the defaults could be updated in the 1.13 release so that special options don’t need to be set for these cards.
@ymodak Can you please reference the PR that fixed this bug?
I have the same problem running on an RTX 2080 GPU, CUDA 10, cuDNN 7.4.2.
I tried the following TF versions: tf-nightly-gpu and a self-compiled version from master (060b6e32ad). I found out that it’s possible to set the following environment variables to get further debug info:
CUDNN_LOGINFO_DBG=1; CUDNN_LOGDEST_DBG=stdout
Then I get the following error:
I0117 14:11:24.441819 140433563125568 basic_session_run_hooks.py:594] Saving checkpoints for 0 into /tmp/mnist/model.ckpt.
2019-01-17 14:11:25.916269: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcublas.so.10.0 locally
I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.079184 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.
2019-01-17 14:11:26.079151: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally
I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.571897 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.
2019-01-17 14:11:26.571858: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.579375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.579803 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.
2019-01-17 14:11:26.585818: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.585850: W ./tensorflow/stream_executor/stream.h:2109] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[{{node Discriminator_1/Conv/Conv2D}}]]
  [[train/discriminator_train/train_op/control_dependency/_569]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 147, in main
    get_hooks_fn=tfgan.get_joint_train_hooks())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1200, in gan_train
    config=config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/training/python/training/training.py", line 546, in train
    loss = session.run(train_op, run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 693, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1188, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1287, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1272, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1336, in run
    feed_dict, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1362, in _call_hook_before_run
    request = hook.before_run(run_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1061, in before_run
    run_context.session.run(self._train_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 930, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[node Discriminator_1/Conv/Conv2D (defined at home/dj/projects/gan/tf_models/research/gan/mnist/networks.py:152) ]]
  [[train/discriminator_train/train_op/control_dependency/_569]]
Errors may have originated from an input operation. Input Source operations connected to node Discriminator_1/Conv/Conv2D: inputs/batch/n (defined at home/dj/projects/gan/tf_models/research/gan/mnist/data_provider.py:67)
Original stack trace for 'Discriminator_1/Conv/Conv2D':
  File "home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in <module>
    tf.app.run()
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 87, in main
    [FLAGS.batch_size, FLAGS.noise_dims]))
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 118, in gan_model
    discriminator_real_outputs = discriminator_fn(real_data, generator_inputs)
  File "home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 176, in unconditional_discriminator
    net = _discriminator_helper(img, False, None, weight_decay)
  File "home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 152, in _discriminator_helper
    net = layers.conv2d(img, 64, [4, 4], stride=2)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1155, in convolution2d
    conv_dims=2)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1058, in convolution
    outputs = layer.apply(inputs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1228, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 531, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 564, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 196, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 1578, in conv2d
    name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1040, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 501, in new_func
    return func(*args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()
Any ideas, anybody? I’m about to reinstall my complete environment 😦
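A minimal sketch of collecting the cuDNN API log described above. It assumes the two variables need to be in the environment before the cuDNN library is loaded, so it launches the training script as a child process with a modified environment instead of changing os.environ after TensorFlow is already imported. The helper names are illustrative, not part of TensorFlow or cuDNN.

```python
import os
import subprocess
import sys


def cudnn_debug_env(dest="stdout", base=None):
    """Return a copy of `base` (default: os.environ) with cuDNN API logging enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDNN_LOGINFO_DBG"] = "1"   # log every cuDNN API call
    env["CUDNN_LOGDEST_DBG"] = dest  # "stdout", "stderr", or a log file path
    return env


def run_with_cudnn_logging(script, dest="stdout"):
    """Run `script` with the current interpreter in a child process that has
    cuDNN logging enabled from the start; returns the CompletedProcess."""
    return subprocess.run([sys.executable, script], env=cudnn_debug_env(dest), check=False)
```

For example, run_with_cudnn_logging("train.py") would rerun a (hypothetical) train.py with each cuDNN call, such as the cudnnCreate() attempts above, logged to stdout. Building a fresh environment dict leaves the parent process untouched.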
It is legitimately a memory error. If you are using tf.keras, do the following at the top of your file:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))
I think we can stop posting the allow_growth fix now 😃
Hello @bm777
Following my investigation from a few months ago, here is a summary of how I understand the problem:
The problem is not the system memory, the problem is the GPU memory!
works because it does not use the GPU!
A few explanations:
TF has two modes of operation:
allow memory growth = false: In this case TF preallocates some memory for the system libraries using a rough guess of how much memory is needed. As you can read here https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-633953715 TF uses the formula
max(300MB, GPU-MEM * fac)
for this guess. For TF2.1 fac = 0.05, and for TF2.2, if I remember right, fac = 0.07. So you have 8GB, which gives 400MB of pre-allocated GPU memory under TF2.1 and 560MB under TF2.2. I have experimentally evaluated the necessary pre-allocated memory for a few GPUs and TF21 here: https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-637715002 and here https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-637715002
It turns out that for Conv2D operations I needed 520MB there; you would have less than that under TF21 but more under TF22. Unfortunately you don’t mention your TF version, but I assume you use TF2.1. If you use TF2.2 and it still fails, this might be because you use a different GPU. Anyway, the fact is that it fails. See below.
allow memory growth = true: TF does not use any pre-allocated memory and loads the libraries as they come. In the TF documentation this is declared as problematic due to potential memory fragmentation, and it is therefore off by default.
My take:
Given the large range of memory required by the libraries, which depends on the operations you perform as well as on the GPU you have, it seems very difficult to get mode allow memory growth = false right (see https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-637950411). The current solution, increasing the size of the pre-allocated memory (which was done for TF2.2), is problematic if your GPU is rather small. It blocks memory on the assumption that you will need all available libraries (BLAS, Conv, FFT, and I don’t know whether there are others). If you don’t use all of these, the pre-allocated memory is wasted, which in turn reduces the model size you can load for your application. On the other hand, I believe that the memory fragmentation problem can be prevented by creating models early, forcing the system libraries to load before training starts. This seems to be what happens in most cases anyway, so it seems beneficial, especially for GPUs with small memory and for training a single model, not to pre-allocate but to use allow memory growth = true.
Personally I use GPUs with memory ranging from 4GB to 11GB, and following the argument above I have set TF_FORCE_GPU_ALLOW_GROWTH=true for all of them. So far I have not had any problems with that.
As explained here, the new approach in TF 2.0 for setting config.gpu_options.allow_growth = True is:
With this code snippet and TF 2.0 RC1, the error no longer appears. However, due to the number of people that have a 20XX Nvidia GPU, I think that it would be a good idea to address this problem natively before the final version of TF 2.0 is released.
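For TF 2.x, the TensorFlow GPU guide enables memory growth per physical device with tf.config.experimental.set_memory_growth. A minimal sketch, assuming TF 2.0 or later; the helper name is illustrative. It loops over every GPU because, per the guide, memory growth needs to be the same across GPUs, and it has to run before any GPU has been initialized.

```python
def enable_gpu_memory_growth():
    """Turn on memory growth for every visible GPU (TF 2.x).

    Returns the number of GPUs configured. Raises RuntimeError if a GPU has
    already been initialized, so call it before building or running any model.
    """
    import tensorflow as tf  # local import keeps the helper self-contained

    gpus = tf.config.experimental.list_physical_devices("GPU")
    # Memory growth needs to be the same across GPUs, so set it on all of them.
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call enable_gpu_memory_growth() at the top of the entry script, before any model is built. A later comment in this thread reports that the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true helped where this call alone did not.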
Is blanket allow growth a solution?
It is turned off by default for a reason; see https://www.tensorflow.org/guide/using_gpu#allowing_gpu_memory_growth
In my program, memory management is important.
I would like to limit the amount of GPU memory used by TF, because in my graphics application the GPU memory will be used for other things, and confining TF to a limited space is important to prevent out-of-memory errors.
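If the goal is a hard cap rather than on-demand growth, TF 1.x (and tf.compat.v1 in TF 2.x) exposes GPUOptions.per_process_gpu_memory_fraction. A minimal sketch, assuming you know the card’s total memory (for example from nvidia-smi) and want to turn a megabyte budget into that fraction; both helper names are illustrative.

```python
def gpu_memory_fraction(limit_mb, total_mb):
    """Convert a memory budget in MB into a fraction of the card, capped at 1.0."""
    if limit_mb <= 0 or total_mb <= 0:
        raise ValueError("limit_mb and total_mb must both be positive")
    return min(1.0, limit_mb / total_mb)


def limited_session_config(limit_mb, total_mb):
    """Build a TF 1.x-style ConfigProto that caps this process's GPU memory."""
    # Imported lazily so the arithmetic above can be used without TensorFlow.
    import tensorflow as tf  # recent TF 1.x and TF 2.x both provide tf.compat.v1

    gpu_options = tf.compat.v1.GPUOptions(
        per_process_gpu_memory_fraction=gpu_memory_fraction(limit_mb, total_mb))
    return tf.compat.v1.ConfigProto(gpu_options=gpu_options)
```

For example, tf.compat.v1.Session(config=limited_session_config(3000, 8000)) would ask TensorFlow to stay within about 3000MB of an 8000MB card, leaving the rest for the graphics application.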
Same issue; with gpu_options.allow_growth = True the issue is fixed.
I’ve also faced such a problem, which was solved by adding an environment variable TF_FORCE_GPU_ALLOW_GROWTH=true.
The configuration is the following: Windows 10, TensorFlow compiled from source (r2.0), Bazel 0.26.1, C++ compiler MSVC 2017, CUDA 10, cuDNN 7.6.5.
Same problem here.
And as others have reported, setting allow_growth=True allows things to run.
I got the same problem on Ubuntu 20.04 with a GeForce RTX 2060 SUPER. A NN with dense layers works well, but with CNN layers I’m getting
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
Adding tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)
makes no difference to the error. I followed the installation according to https://www.tensorflow.org/install/gpu and nvidia-smi
shows: Driver Version: 440.64.00 CUDA Version: 10.2
My conda env has:
In a conda env with tf 1.15 I am getting the same error. It would be great if this could be fixed.
Update
After using
export TF_FORCE_GPU_ALLOW_GROWTH=true
it all works. I was under the impression that tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)
would do the same thing, but that’s not the case. I think this should be clearly stated on the TensorFlow GPU support webpage.
This one works! Thank you guys!
So you can do the patch without touching the code just by altering your runtime environment.
Another way to enable this option is to set the environmental variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.
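A minimal sketch of applying that runtime-environment patch from Python itself, assuming the variables only need to be set before TensorFlow initializes the GPU (setting them before import tensorflow is the simplest way to guarantee that). The helper name is illustrative; hide_gpus=True reproduces the CUDA_VISIBLE_DEVICES='-1' CPU fallback from the original report.

```python
import os


def force_gpu_allow_growth(hide_gpus=False):
    """Set TensorFlow's GPU environment switches. Call before `import tensorflow`."""
    # Ask TensorFlow to allocate GPU memory on demand instead of reserving it up front.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    if hide_gpus:
        # Hide every CUDA device so TensorFlow runs on the CPU only.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

At the very top of an entry script this would be force_gpu_allow_growth() followed by import tensorflow as tf. From a shell, export TF_FORCE_GPU_ALLOW_GROWTH=true before launching Python has the same effect without any code change.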
Interestingly, during my struggles, I got a message from a red ‘no entry’ sign in my menu bar that said ‘error broken count you have unmet dependencies’. I ran the software updater, and it wants to remove libcudnn7-dev and libcudnn7-doc as well as upgrade 57 other Linux libraries.
EDIT: After reboot the model seems to train successfully using this:
or this:
Memory utilization on the GPU is <700 MB with batch size 16 and ~1 GB with batch size 256 (which trains 3x faster).
Did you insert:
at the top of your entry code?
the code that worked for me:
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
Hello everyone! I solved a similar problem by limiting memory growth, and you can try it too.
You can find the code in the section Limit memory growth.
(This is my first comment on GitHub.)
When you use TensorFlow 2.0, you can use
tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)
This code goes after import tensorflow as tf but before the rest of your code.
For anyone else finding this after upgrading to TensorFlow 2.0, the API and the code are slightly different.
Ubuntu 18, TensorFlow 2.0, tensorflow-gpu 2.0, GeForce RTX 2070
Updated code for this system.
I also ran into this problem with tensorflow-gpu 2.0 installed from Anaconda Cloud:
RTX 2070S, tensorflow-gpu 2.0.0, CUDA 10.0.13, cuDNN 7.6.5
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
I am working in C++ under Windows.
Adding the allow growth option results in an OOM error.
Without this line of code the model runs fine on the same machine with the same card.
With OOM error
Without OOM error
So trying to solve this problem by setting allow growth results in a segfault.
Just upgrade to TensorFlow 2.3 with CUDA 11 and cuDNN 8.0. It magically solved all my problems, and I don’t even need the workaround with config.gpu_options.allow_growth = True now.
This problem seems related to my RTX 2080. On my desktop with a GTX 1080 everything seems OK; then I used conda to clone the environment to my RTX 2080 notebook, where I use tensorflow 2.0.0-gpu. Once the application code uses Conv2D, LSTM or GRU, this trouble comes. Before, I used the following code to solve this problem:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        # Memory growth must be set before GPUs have been initialized
but for the past several days, the above method no longer works.
I’m having the same issue as @clementpoiret with TF 2.0 installed via conda. By using the allow_growth flag the issue disappears, but that also makes the training very, very slow, slower than what I had on TF 1.x… Eager first, huh?
I think I found a better workaround than config.gpu_options.allow_growth = True.
For my setup (RTX 2070, docker image tensorflow:1.15.0-gpu-py3), setting the config as shown below avoids the CUDNN_STATUS_INTERNAL_ERROR while still allocating the whole GPU memory. This is very useful for large models that would not fit into memory in allow_growth mode but just fit when the whole memory is allocated.
To allocate the whole memory on RTX:
config.gpu_options.per_process_gpu_memory_fraction = 1.0
This solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070)
Same issue with RTX 2070
RTX 2070 here. I was getting this error, but now running with TF_FORCE_GPU_ALLOW_GROWTH=true (which, as other commenters have pointed out, fixes it for them) changes the error message to an out-of-memory error (even though I’ve got plenty of memory):
But my GPU has 8GB, and only about 250MB were in use before I started the process. So I don’t understand why it can’t allocate 3.87GB. (Lowering the batch size had no effect; the weights HDF5 file is less than 200MB.)
I had this same issue with an RTX 2080. Then the following code worked for me.
Thanks everyone
I fixed it with this:
Is there a fix for this issue with TensorFlow 2 and Python 3?
I have a: RTX 2080
I am getting this message:
Hello @roebel
Me too, I had been wondering about the memory allocation errors. This is clear to me now, and GPU memory looks good.
In the past, I tested many options to pre-allocate memory 😢:
Personally, I use a GPU with 6GB of memory. And thank you, @roebel, for this new tip,
TF_FORCE_GPU_ALLOW_GROWTH=true
to force my GPU to allocate memory on demand 😊.
I had a similar issue before. Limiting GPU memory manually helped: https://github.com/tensorflow/tensorflow/issues/25160#issuecomment-643703167
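One way to limit GPU memory manually in TF 2.0 is to set a virtual device configuration with a fixed memory_limit (in MB). The minimal sketch below assumes you already know the card's total memory. The 8192 MB in the usage example is an assumed value for an 8GB card, and both helper names are hypothetical:

```python
def memory_limit_mb(fraction, total_mb):
    """Convert a per_process_gpu_memory_fraction-style value into megabytes."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def limit_first_gpu(limit_mb):
    """Cap TensorFlow at limit_mb MB on the first GPU (TF 2.0 API).

    Must run before the GPU is initialized. The slice is pre-allocated,
    so also setting memory growth is unnecessary.
    """
    # Imported lazily so memory_limit_mb() is usable without TensorFlow.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return False
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mb)],
    )
    return True
```

For example, limit_first_gpu(memory_limit_mb(0.9, 8192)) would cap TensorFlow at 7372 MB on the first GPU.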
Just wanted to chime in and say that the problem is still there.
My specs: Ubuntu 20.04, NVIDIA RTX 2070, NVIDIA driver 440.64, tensorflow-gpu 2.0.1 (installed through conda, which automatically installs cudatoolkit and cuDNN in the same env), cudatoolkit 10.1.243, cuDNN 7.6.5.
The problem is solved by
tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)
However, this seems more like a workaround than an actual fix, and a lot of people have 20XX cards these days. There should probably be an update that addresses this issue.
Update: since I’m dual-booting, I checked on Windows as well, and the problem persists there (Windows 10, NVIDIA driver 445.87; everything else is the same).
I can confirm that building from source after changing the magic number 0.05 to 0.1 seems to fix the issue (at least for 1.15.2)!
I also get this error working in the tensorflow 1.15.0-py3-gpu Docker image (Ubuntu 18.04) with two Titan V GPUs (@sanjoy), not RTXs. However, this error only seems to occur on my GPU0, which has Xorg and gnome-shell using its memory, while GPU1 only has python using GPU memory and does not throw this error. The error is also unfortunately intermittent: sometimes I can remove the Docker container, recreate it with the same settings and the same code, and the error goes away. Or not.
I was able to fix it using the Keras backend interface with:
Following is my nvidia-smi output for both GPUs:
Same issue with an RTX 2080; I spent two days recompiling and bug hunting until I found this fix (the allow_growth=true setting fixed it).
You made my day
Unfortunately, I need to run code that only supports TensorFlow 1.x.
One should probably do this only if memory growth is off. Otherwise, you will always need about 580MB on the 2080 even if you don’t need all the operators.
I ran a few more tests concerning the minimum system memory requirements for running combinations of the three operations from my test case. I compare only the 1080 and 2080 cards. You won’t find conv2d alone because it initializes BLAS in any case. The results:
One can see that on the 2080, CUDA requires an overhead for each operation, and that this overhead increases when using more libraries. In most cases the overhead is <100MB, but it becomes >220MB once Conv2D is involved…
If @samhodge has contacts at NVIDIA, I would personally find it interesting to hear whether this is intended.
@roebel
I have struggled with this in my C++ application for a number of iterations.
What it came down to in the end was the following.
Only run models on the GPU when enough memory is available to run the model.
The amount of memory that the model will require is quantifiable.
So you need to set a GPU memory percentage that will fit that model.
Then you also need to know exactly how much memory is available on the card before allocating it. This is subject to race conditions, because you don’t know what else on the operating system is using CUDA memory at the same time.
But race conditions aside, you also need to measure the free memory.
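The procedure described so far could be sketched in Python as follows, with the stated caveat that the check is racy. This sketch queries free memory through nvidia-smi rather than calling the CUDA API directly. The 512 MiB headroom for cuDNN and other CUDA libraries is an assumed value, not a measured one, and the helper names are hypothetical:

```python
import subprocess


def parse_free_total(csv_line):
    """Parse one 'free, total' line (MiB) from nvidia-smi CSV output."""
    free_mib, total_mib = (int(field) for field in csv_line.strip().split(","))
    return free_mib, total_mib


def query_free_total(gpu_index=0):
    """Ask nvidia-smi how much memory is free and total on one GPU, in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.free,memory.total",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_free_total(result.stdout)


def memory_fraction_for_model(required_mib, free_mib, total_mib, headroom_mib=512):
    """Return the fraction of total memory to give TensorFlow, or None.

    None means the model plus headroom does not fit in the memory that is
    free right now; another process may still grab memory after this check.
    """
    if required_mib + headroom_mib > free_mib:
        return None
    return required_mib / total_mib
```

The returned fraction would then go into per_process_gpu_memory_fraction. A None result means falling back to the CPU or waiting.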
This is done with cudaMemGetInfo, which itself uses memory. So, provided you have enough memory to run cudaMemGetInfo once to measure, you need to make sure that enough memory is free both to fit the model and to run cudaMemGetInfo one more time. Then, and only then, can you allocate the right percentage of the available VRAM on that card for running the model.
Anyway, the takeaway from my random babbling is that cudaMemGetInfo is required to poll the amount of memory available to allocate, and it also uses some of that available memory itself.
Maybe the amount of memory used by cudaMemGetInfo is somehow different on a Turing-based card compared to a Pascal-based card. I can get someone from NVIDIA to have a look if you wish.
@samhodge @sanjoy @odinsbane
Finally, I have been able to run the patched library on the RTX 2080 cards. As expected, the patched version does not pass. Here again is the script:
And here is the matrix of available memory reported from gpu_device.cc, the default value of min_system_memory as selected in gpu_device.cc, and the minimum value of min_system_memory I need to select for the script not to abort:
So while the 1050 and 1080 run the script with about the same memory size, the RTX 2080 requires nearly twice as much memory. This does not sound good to me.
Any suggestions on what to try to get this to a comparable value?
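To see why changing the 0.05 magic number matters, here is a rough Python model of the MinSystemMemory heuristic in gpu_device.cc. The thresholds (200 MiB when less than 2 GiB is available, otherwise max(300 MiB, fraction * available)) are an assumption about the TF 1.x code and have not been checked against the source. Both function names are hypothetical:

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def min_system_memory(available_bytes, fraction=0.05):
    """Bytes left unallocated by TensorFlow for CUDA libraries and the system.

    Assumed heuristic: 200 MiB when less than 2 GiB is available,
    otherwise max(300 MiB, fraction * available).
    """
    if available_bytes < 2 * GIB:
        return 200 * MIB
    return max(300 * MIB, int(available_bytes * fraction))


def tf_allocatable(available_bytes, fraction=0.05):
    """Bytes TensorFlow would claim up front when memory growth is off (same assumption)."""
    return available_bytes - min_system_memory(available_bytes, fraction)


for frac in (0.05, 0.1):
    reserved = min_system_memory(8 * GIB, frac)
    print(f"fraction={frac}: reserved {reserved / MIB:.0f} MiB, "
          f"TensorFlow takes {tf_allocatable(8 * GIB, frac) / MIB:.0f} MiB")
```

Under this model, with 8 GiB available, 0.05 reserves about 410 MiB and 0.1 about 819 MiB. Once the reserve is above the 300 MiB floor, doubling the fraction doubles the memory left for cuDNN.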
@roebel I don’t recall what triggered the problem for you.
See this: https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-480549043
That is why I thought it was memory related. This issue has not affected me for some time, nor the users of my software on a variety of platforms.
OS: Ubuntu 18.04 LTS
Driver version: 435.21
CUDA: cudatoolkit 10.1
cuDNN: cudnn-7.6.5-cuda10.1_0
I used Anaconda to install TensorFlow;
cudatoolkit and cuDNN were installed automatically by Anaconda as part of that install.
I have the same problem. The error:
I’m sharing this code to make it available faster to both TensorFlow and Keras users (source: here).
I got the same problem with the following configuration: TensorFlow installed from (source or binary): r1.13.1, r1.13.2, r1.14; Python version: 3.6.1; CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1; GPU model and memory: RTX 2070 8GB.
I solved this problem with: TensorFlow installed from (source or binary): r1.12.0; Python version: 3.6.9; GCC/Compiler version: 4.8; CUDA/cuDNN version: CUDA 9.0 with cuDNN 7.1.4; GPU model and memory: RTX 2070 8GB. Hope this helps.
I had the same problem and
allow_growth = True
was the solution. BUT, for TensorFlow 2, in order to do that you need to add the following lines:
import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)
Thanks to user @opcecco in this issue: https://github.com/tensorflow/tensorflow/issues/25446
Yeah, I solved this problem the same way. Thanks!!
We are facing related issues.
System specifications
The error is triggered when I try to use LSTM, GRU, RNN etc.
Actual error
2019-12-23 16:09:00.912238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-23 16:09:01.408990: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-23 16:09:01.409043: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cudnn_rnn_ops.cc:1491 : Unknown: Fail to find the dnn implementation.
  File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/recurrent_v2.py", line 961, in call
    **cudnn_lstm_kwargs)
  File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/recurrent_v2.py", line 1174, in cudnn_lstm
    rnn_mode='lstm')
  File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 109, in cudnn_rnn
    ctx=_ctx)
  File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 198, in cudnn_rnn_eager_fallback
    attrs=_attrs, ctx=_ctx, name=name)
  File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation. [Op:CudnnRNN]
Apparent problem
It seems all my memory is eaten up pretty fast. The problem seems to come up only in GPU mode; the same code works fine on the CPU.
Trials
Both attempts produce the same error.
Any ideas?
Hi,
I cannot reproduce this on my machine so I’ll need some help root-causing this. Do we have someone here who can reproduce the problem and is willing to do some hands-on debugging?
As a starting point, I’d like to understand why MinSystemMemory does not preserve enough memory for cuDNN. If someone with a setup that reproduces this issue can add some logging (as a local patch) to find out the amount of memory returned by MinSystemMemory, that would be great. And does increasing the magic 0.05 number in MinSystemMemory help the situation?
@clementpoiret: Please note that the tf.config.experimental.set_memory_growth call is unnecessary, since tf.config.experimental.set_virtual_device_configuration overrides that flag: it slices up the GPU memory and pre-allocates the allocated memory.
@synapse8 I don’t see anything equivalent in TensorFlow 2.0’s documentation. Is there any way to do so with tf.config.experimental?
Edit: I’m gonna try to set memory this way, to see if it’s solving the issue:
This way we can conveniently just call
setup_gpus(True, .9)
I tried using that for TensorFlow 2.0:
It fixes the cuDNN error on my RTX 2080, but training is only as fast as on the 1050 Ti in my laptop! While training a CNN:
Adding
didn’t solve the issue: without allow_growth I get the cuDNN error, and in any case my RTX is only using something like 3GB of memory.
Any ideas?
I tried
but cuDNN is still throwing an error.
I tested building tf-2.0.0-beta1 from source with CUDA 10.1 and cuDNN 7.6.2.4, and the error doesn’t manifest.
You can find docker images for building a tf-gpu package and a tf-base package here: https://github.com/edowson/docker-tensorflow
The Anaconda channel doesn’t have
cudnn==7.6.2
at the time of writing this comment.
@Hayashi-Yudai
What were the exact commands you added to your code? Try the following instead if they’re different:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))
@robzor92 I doubt the 1050 Ti’s problem is the small VRAM size. The RTX cards encounter this even on basic CNN MNIST models. I doubt it’s that NVIDIA’s tweaking of VRAM allocation on RTX cards somehow messed things up.
I ran into this issue as well, and was able to solve it by using @va-andrew’s solution; specifically, I used @colinsteidtmann’s implementation, since I use some of the tensorflow.keras functions in my code. I spent a long time trying to debug this problem, so thank you both for your contributions.
EDIT: I was just looking at tensorflow documentation (https://www.tensorflow.org/guide/using_gpu), and you can also tell it to allow memory growth by setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. It also says that this configuration is platform specific, so YMMV (works for me with Ubuntu 18.04).
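Here is a minimal sketch of the environment-variable route. The variable only takes effect if it is set before TensorFlow initializes the GPU, so the safest place to set it is before importing TensorFlow. The helper name and the warning are additions for illustration, not part of TensorFlow:

```python
import os
import sys
import warnings


def force_gpu_allow_growth():
    """Ask TensorFlow to allocate GPU memory on demand via the environment."""
    if "tensorflow" in sys.modules:
        # Too late if a GPU has already been initialized in this process.
        warnings.warn(
            "TensorFlow is already imported; TF_FORCE_GPU_ALLOW_GROWTH may "
            "be ignored if a GPU has already been initialized."
        )
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


force_gpu_allow_growth()
# Import TensorFlow only after this point, e.g.:
# import tensorflow as tf
```

From a shell, the equivalent is TF_FORCE_GPU_ALLOW_GROWTH=true python train.py, where train.py is a placeholder for your script.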
For reference, I am running: Ubuntu 18.04.2 LTS, Gigabyte GeForce RTX 2080 Turbo, NVIDIA driver 430.26, CUDA 10.0.130, cuDNN 7.4.2.24, tensorflow-gpu 1.13.1, python 3.6. I run tensorflow from within a virtual environment, using spyder 3.3.4.
I have a 2nd computer with the exact same hardware, and I set it up following the same set of instructions, used the same files to do the install, and had this issue on that machine as well. No surprise there.
I have a 3rd computer with the exact same hardware, except that it has a 2080 Ti instead of the 2080, and I set it up following the same set of instructions, and again used the same files to do the install. But this time, there was no issue.
So, I’m led to believe it’s not related to some conflict of CUDA, cuDNN, and driver version; it’s not an incorrectly done installation, etc. Rather, it’s related to the model of video card; I’ve only seen mention of this issue with RTX 2060, 2070, and 2080.
Fortunately, it’s not a big inconvenience to use the workaround.
The descriptions of the problems you are seeing make me believe that (a particular version of) cuDNN tries to allocate GPU memory when creating the handle. If TensorFlow already took all the memory (either because config.gpu_options.allow_growth = False, or because per_process_gpu_memory_fraction is close to 1.0), there is no memory left for cuDNN to allocate.
You could confirm this by running TensorFlow through nvprof and generating an API trace to inspect the failing cuMemAlloc call.
Issue #6698 seems to discuss the same problem. Some people noticed that they had accidentally used a cuDNN release that doesn’t match their CUDA version. Could you please verify that you are using cuDNN for CUDA 10 when running with CUDA 10?
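For conda installs, the CUDA version that a cuDNN build targets is part of the package build string, so this check can be scripted. An example is the cudnn-7.6.5-cuda10.1_0 package mentioned earlier in this thread. The minimal sketch below assumes conda's cudaX.Y build tag and requires an exact major.minor match, which may be stricter than necessary. The function names are hypothetical:

```python
import re


def cuda_version_of_cudnn_build(build):
    """Return (major, minor) of the CUDA version in a conda cuDNN build string.

    Accepts e.g. "cudnn-7.6.5-cuda10.1_0" or just the build "cuda10.1_0";
    returns None when no "cudaX.Y" tag is present.
    """
    match = re.search(r"cuda(\d+)\.(\d+)", build)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cudnn_matches_cudatoolkit(cudnn_build, cudatoolkit_version):
    """True if the cuDNN build targets the same CUDA major.minor as the toolkit."""
    major, minor = (int(part) for part in cudatoolkit_version.split(".")[:2])
    return cuda_version_of_cudnn_build(cudnn_build) == (major, minor)
```

The build string can be read from the Build column of conda list cudnn, and the toolkit version from conda list cudatoolkit.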
Still having the same issue here but “config.gpu_options.allow_growth = True” doesn’t fix the problem. Happens on both TF-gpu 1.14.1 and TF-gpu 2.0. RTX1070, CUDA 10.0, Ubuntu 18.04, Nvidia driver 430.09.
Running into the same issue on a GTX 1050 using tensorflow-gpu 1.13.1 from pip with CUDA 10.0/cuDNN 7.4.2.24/Nvidia driver 410/Ubuntu 16.04.