tensorflow: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes and No (described below)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): tf-nightly-gpu (Dec 19, r1.13)
  • TensorFlow version (use command below): 1.13.0-dev20181219
  • Python version: 3.7.1
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1
  • GPU model and memory: RTX 2070 8GB

Describe the current behavior

I'm running a CNN model on MNIST. When I run it on the GPU, I encounter:

2018-12-20 20:09:13.644176: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I did some digging and realized that it is a memory issue (which shouldn't be the case, as I have 32GB of RAM and 64GB of swap). I ran htop while running the model and had 20+GB free, which is more than enough to fit the 8GB of vRAM mappings.

Using gpu_options.allow_growth = True gets the model to work properly, and setting os.environ['CUDA_VISIBLE_DEVICES'] = '-1' also works. This means that I AM facing a memory issue, but I don't see how.

Also, using gpu_options.allow_growth = True does not fix the same issue when trying to run the tensorflow/models/official/mnist/ model, which should behave similarly to my code.
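For reference, a minimal sketch of the two workarounds described above, written against the TF 1.x API used in this report (the exact placement in the script is my own arrangement):

import os
import tensorflow as tf

# Workaround 1: let TensorFlow grow GPU memory on demand instead of
# pre-allocating almost all of it (the gpu_options.allow_growth fix).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Workaround 2: hide the GPU entirely so TensorFlow falls back to the CPU.
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'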

Code to reproduce the issue

import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import math
import time
# Killing optional CPU driver warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
tf.logging.set_verbosity(tf.logging.ERROR)


class Model:

    def __init__(self, image, label):
        """
        A Model class contains a computational graph that classifies images
        to predictions. Each of its methods builds part of the graph
        on Model initialization. Do not modify the constructor, as doing so
        would break the autograder. You may, however, add class variables
        to use in your graph-building, e.g. a learning rate.

        image: the input image to the computational graph as a tensor
        label: the correct label of an image as a tensor
        prediction: the output prediction of the computational graph,
                    produced by self.forward_pass()
        optimize: the model's optimizing tensor produced by self.optimizer()
        loss: the model's loss produced by computing self.loss_function()
        accuracy: the model's prediction accuracy
        """
        self.image = image
        self.label = label

        # TO-DO: Add any class variables you want to use.

        self.prediction = self.forward_pass()
        self.loss = self.loss_function()
        self.optimize = self.optimizer()
        self.accuracy = self.accuracy_function()

    def forward_pass(self):
        """
        Predicts a label given an image using convolution layers

        :return: the prediction as a tensor
        """
        filter_1 = tf.Variable(tf.truncated_normal([3, 3, 1, 8], stddev=0.1))
        conv_1 = tf.nn.conv2d(self.image, filter_1, [1, 1, 1, 1], "SAME")

        reshaped = tf.reshape(conv_1, shape=[50, -1])

        L1 = reshaped.shape[1].value
        L2 = 500
        W1 = tf.Variable(tf.random_normal([L1, L2], mean=0, stddev=0.01))
        b1 = tf.Variable(tf.random_normal([L2], mean=0, stddev=0.01))
        relu_1 = tf.nn.relu(tf.matmul(reshaped, W1) + b1)

        W2 = tf.Variable(tf.random_normal([L2, 10], mean=0, stddev=0.01))
        b2 = tf.Variable(tf.random_normal([10], mean=0, stddev=0.01))
        logits = tf.nn.relu(tf.matmul(relu_1, W2) + b2)
        return logits

    def loss_function(self):
        """
        Calculates the model cross-entropy loss

        :return: the loss of the model as a tensor
        """
        loss = tf.losses.softmax_cross_entropy(onehot_labels=self.label, logits=self.prediction)
        return loss

    def optimizer(self):
        """
        Optimizes the model loss using an Adam Optimizer

        :return: the optimizer as a tensor
        """
        learning_rate = 0.1
        sgd = tf.train.GradientDescentOptimizer(learning_rate)
        train = sgd.minimize(self.loss)
        return train

    def accuracy_function(self):
        """
        Calculates the model's prediction accuracy by comparing
        predictions to correct labels – no need to modify this

        :return: the accuracy of the model as a tensor
        """
        correct_prediction = tf.equal(tf.argmax(self.prediction, 1),
                                      tf.argmax(self.label, 1))
        return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


def main():
    t_start = time.time()

    mnist = input_data.read_data_sets("data/mnist/", one_hot=True)
    batch_sz = 50
    batch = 2000

    inputs = tf.placeholder(shape=[batch_sz, 28, 28, 1], dtype=tf.float32)
    labels = tf.placeholder(shape=[batch_sz, 10], dtype=tf.float32)

    model = Model(inputs, labels)

    session_config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=session_config)

    # sess = tf.Session()

    sess.run(tf.global_variables_initializer())
    for i in range(batch):
        next_image, next_label = mnist.train.next_batch(batch_sz)
        next_image = next_image.reshape((batch_sz, 28, 28, 1))
        sess.run(model.optimize, feed_dict={inputs: next_image, labels: next_label})

    acc, test_images, test_labels = 0, mnist.test.images, mnist.test.labels
    test_batch = math.ceil(len(test_images) / batch_sz)
    for i in range(test_batch):
        batch_images = test_images[i * batch_sz: (i + 1) * batch_sz]
        batch_images = batch_images.reshape((batch_sz, 28, 28, 1))
        batch_labels = test_labels[i * batch_sz: (i + 1) * batch_sz]
        acc += sess.run(model.accuracy, feed_dict={inputs: batch_images, labels: batch_labels})
    acc /= test_batch
    print(acc)

    print(time.time() - t_start, 'seconds')

    return


if __name__ == '__main__':
    main()

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 62
  • Comments: 186 (20 by maintainers)


Most upvoted comments

I did try compiling from source, but ran into the same issue. What finally fixed my problem was setting config.gpu_options.allow_growth = True.

OK, I made it work in tf-nightly-gpu-2.0-preview and an IPython notebook by adding this to my code:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

I’ve been running into the same issue with the same GPU: “CUDNN_STATUS_INTERNAL_ERROR”.

  • RTX 2070 GPU
  • CUDA 10
  • cuDNN 7.4.2
  • Ubuntu 18.04
  • tf-nightly-gpu (r1.13, Jan 13)
  • Python 3.6.7

2019-01-15 05:01:03.503415: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcublas.so.10.0 locally
2019-01-15 05:01:03.752563: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally
2019-01-15 05:01:04.905618: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-15 05:01:04.908147: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-15 05:01:04.908191: W tensorflow/core/framework/op_kernel.cc:1412] OP_REQUIRES failed at conv_ops_fused.cc:801 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Try to compile r1.13 from source. It would take a long time, but it should fix your problem. At least it fixed mine.

@ymodak It looks like this issue was closed prematurely. While there is a workaround for this issue, it involves changing application code. As a result the example code does not work out of the box on RTX cards, and most recipes online will also need modification.

@ymodak This bug is not fixed. Arguably, using any sort of convnet should work in the default configuration. Either allow_growth should be true by default, it should be fixed so this works, or there should be a better error than CUDNN_STATUS_INTERNAL_ERROR.

How do you actually set allow_growth=true? I have tf-nightly-gpu-2.0-preview and tried:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, …)

but get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-14-b4f9929bf252> in <module>()
      1 import tensorflow as tf
----> 2 config = tf.ConfigProto()

AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

How can I set allow_growth in tensorflow 2.0?

I got the same problem on Ubuntu 20.04 with a GeForce RTX 2060 SUPER. A NN with dense layers works well, but with CNN layers I'm getting Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. Adding tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True) makes no difference to the error. I followed the installation according to https://www.tensorflow.org/install/gpu, and nvidia-smi shows Driver Version: 440.64.00 and CUDA Version: 10.2. My conda env has:

cudatoolkit               10.1.243             h6bb024c_0  
cudnn                     7.6.5                cuda10.1_0  
tensorflow-gpu            2.1.0                h0d30ee6_0

In a conda env with tf 1.15 I am getting the same error. It would be great if this could be fixed.

Update

After using export TF_FORCE_GPU_ALLOW_GROWTH=true it all works. I was under the impression that tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True) would do the same thing, but that's not the case. I think this should be clearly stated on the TensorFlow GPU support webpage.

Dude, your solution saves my life.

I’ve been having the same issue (on an RTX 2060, Ubuntu 18.04, Python 3.6.7, CUDA 10.0.130, cuDNN 7.4.2, Tensorflow 1.13.0-rc0 from source). Thanks to @va-andrew’s suggestion I have it working with the allow_growth option set.

FWIW, in the course of searching for solutions to this it seems that this issue is a common problem with the RTX series (although it might be a general problem with CUDA 10.0, since the new cards don’t support the older versions). It would be great if the defaults could get updated in the release of 1.13 so that special options don’t need to be set for these cards.

@ymodak Can you please reference the PR that fixed this bug?

I have the same problem running on

  • RTX 2080 GPU
  • CUDA 10
  • cuDNN 7.4.2

I tried the following TF versions: tf-nightly-gpu and a self-compiled version from master (060b6e32ad). I found out that it's possible to set the following environment variables to get further debug info:

CUDNN_LOGINFO_DBG=1; CUDNN_LOGDEST_DBG=stdout

Then I get the following error:

I0117 14:11:24.441819 140433563125568 basic_session_run_hooks.py:594] Saving checkpoints for 0 into /tmp/mnist/model.ckpt.
2019-01-17 14:11:25.916269: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcublas.so.10.0 locally

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.079184 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.079151: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.571897 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.571858: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.579375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.579803 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.585818: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.585850: W ./tensorflow/stream_executor/stream.h:2109] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node Discriminator_1/Conv/Conv2D}}]]
	 [[train/discriminator_train/train_op/control_dependency/_569]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in <module> tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv))
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 147, in main get_hooks_fn=tfgan.get_joint_train_hooks())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1200, in gan_train config=config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/training/python/training/training.py", line 546, in train loss = session.run(train_op, run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 693, in run run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1188, in run run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1287, in run raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1272, in run return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1336, in run feed_dict, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1362, in _call_hook_before_run request = hook.before_run(run_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1061, in before_run run_context.session.run(self._train_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 930, in run run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1153, in _run feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node Discriminator_1/Conv/Conv2D (defined at home/dj/projects/gan/tf_models/research/gan/mnist/networks.py:152) ]]
	 [[train/discriminator_train/train_op/control_dependency/_569]]

Errors may have originated from an input operation.
Input Source operations connected to node Discriminator_1/Conv/Conv2D:
 inputs/batch/n (defined at home/dj/projects/gan/tf_models/research/gan/mnist/data_provider.py:67)

Original stack trace for 'Discriminator_1/Conv/Conv2D':
  File "home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in <module> tf.app.run()
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv))
  File "home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 87, in main [FLAGS.batch_size, FLAGS.noise_dims]))
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 118, in gan_model discriminator_real_outputs = discriminator_fn(real_data, generator_inputs)
  File "home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 176, in unconditional_discriminator net = _discriminator_helper(img, False, None, weight_decay)
  File "home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 152, in _discriminator_helper net = layers.conv2d(img, 64, [4, 4], stride=2)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args return func(*args, **current_args)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1155, in convolution2d conv_dims=2)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args return func(*args, **current_args)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1058, in convolution outputs = layer.apply(inputs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1228, in apply return self.call(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 531, in call outputs = super(Layer, self).call(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 564, in call outputs = self.call(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 196, in call outputs = self._convolution_op(inputs, self.kernel)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in call return self.conv_op(inp, filter)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in call return self.call(inp, filter)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in call name=self.name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 1578, in conv2d name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1040, in conv2d data_format=data_format, dilations=dilations, name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 501, in new_func return func(*args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in init self._traceback = tf_stack.extract_stack()

Any ideas, somebody? I am just about to reinstall my complete environment 😦

It is legitimately a memory error. If using tf.keras, then do the following at the top of your file:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

I think we can stop posting the allow_growth fix now 😃

Hello @bm777

Following my investigation from a few months ago, I'll summarize how I understand the problem.

GPU model and memory: RTX 2070 8GB … which shouldn’t be the case as I have 32GB of RAM and 64GB of

The problem is not the system memory, the problem is the GPU memory!

os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

works because it does not use the GPU!

A few explanations:

TF has two modes of operation:

  1. allow memory growth = false: In this case TF preallocates some memory for the system libraries using a rough guess of
    how much memory is needed. As you can read here https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-633953715 TF uses the formula max(300MB, GPU-MEM * fac) for this guess. For TF2.1, fac = 0.05; for TF2.2, if I remember right, it is fac = 0.07. So with 8GB this gives 400MB of pre-allocated GPU memory under TF2.1 and 560MB under TF2.2 (see the short sketch after this list).

    I have experimentally evaluated the necessary pre-allocated memory for a few GPUs and TF21 here: https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-637715002

    It turns out that for Conv2D operations I needed 520MB there; you would have less than that under TF2.1 but more under TF2.2. Unfortunately you don't mention your TF version, but I assume you use TF2.1. If you use TF2.2 and it still fails, this might be because you use a different GPU. Anyway, the fact is it fails. See below.

  2. allow memory growth = true: TF does not use any pre-allocated memory and loads the libraries as they come. In the TF documentation this is declared as problematic due to potential memory fragmentation and is therefore off by default.
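A small sketch of that pre-allocation guess, purely to illustrate the arithmetic above (the fractions 0.05 and 0.07 are the values as described in this comment, not authoritative constants):

def min_system_memory_guess(gpu_mem_bytes, fac):
    # Rough guess described above: max(300MB, GPU-MEM * fac)
    return max(300 * 1024 ** 2, int(gpu_mem_bytes * fac))

eight_gb = 8 * 1024 ** 3
print(min_system_memory_guess(eight_gb, 0.05) / 1024 ** 2)  # ~410 MiB, i.e. the ~400MB figure quoted for TF2.1
print(min_system_memory_guess(eight_gb, 0.07) / 1024 ** 2)  # ~573 MiB, i.e. the ~560MB figure quoted for TF2.2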

My take:

Given the large range of required memory for the libraries, which depends on the operations you perform as well as on the GPU you have, it seems very difficult to get the allow memory growth = false mode right (see https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-637950411). The current solution, increasing the size of the pre-allocated memory, which was done for TF2.2, is problematic if your GPU is rather small. It blocks memory from use on the assumption that you will need all available libraries (BLAS, Conv, FFT, and I don't know whether there are others). If you don't use all of these, the pre-allocated memory is wasted, in turn reducing the model size you can load for your application.

On the other hand, I believe that the memory fragmentation problem can be prevented by creating models early, forcing the system libraries to load before starting the training. This seems to be what happens in most cases anyway, and it therefore seems beneficial, especially for GPUs with small memory and especially for training a single model, to not pre-allocate but to use allow memory growth = true.

Personally I use GPUs with memory ranging from 4GB to 11GB, and following the argument above I have set TF_FORCE_GPU_ALLOW_GROWTH=true for all of them. For the moment I have not had any problems with that.
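If you prefer to set that variable from Python rather than the shell, a minimal sketch (my own arrangement; the only important point is that the variable is set before TensorFlow initializes the GPU):

import os

# Must be set before TensorFlow creates its GPU devices.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # imported after setting the variable on purpose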

As explained here, the new approach in TF 2.0 for setting config.gpu_options.allow_growth = True is:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Currently, memory growth needs to be the same across GPUs
  try:
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
  except RuntimeError as e:
    print(e)

With this code snippet and TF 2.0 RC1, the error no longer appears. However, due to the number of people that have a 20XX Nvidia GPU, I think that it would be a good idea to address this problem natively before the final version of TF 2.0 is released.

Is blanket allow_growth a solution?

It is turned off by default for a reason; see https://www.tensorflow.org/guide/using_gpu#allowing_gpu_memory_growth

In my program memory management is important.

I would like to limit the amount of GPU memory used by TF, because in my graphics application the GPU memory will be used for other things, and confining TF to a limited space is important to prevent out-of-memory errors.
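A minimal sketch of what capping TensorFlow's share of the GPU can look like, using the two knobs that come up elsewhere in this thread (the TF 1.x per_process_gpu_memory_fraction option and the TF 2.x virtual-device memory_limit); the 0.3 fraction and 2048 MB limit are placeholder values:

import tensorflow as tf

# TF 1.x: let TF use at most ~30% of the GPU memory.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
sess = tf.Session(config=config)

# TF 2.x equivalent: cap TF to a fixed memory_limit (in MB) via a virtual device.
# gpus = tf.config.experimental.list_physical_devices('GPU')
# tf.config.experimental.set_virtual_device_configuration(
#     gpus[0],
#     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])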

Same issue; with gpu_options.allow_growth = True the issue is fixed.

I’ve also faced such a problem, which was solved by adding an environment variable TF_FORCE_GPU_ALLOW_GROWTH=true.

The configuration is the following:

  • Windows 10
  • Tensorflow compiled from source r2.0
  • Bazel: 0.26.1
  • C++ compiler: MSVC 2017
  • CUDA: 10
  • cuDNN: 7.6.5

Same problem here.

  • RTX 2070
  • Ubuntu 18.04
  • CudNN 7.4.2 (but I have tried compiling with other older versions with no luck)
  • Tensorflow 1.13.0-dev20190125 (also tried Tensorflow 1.12 compiled with Cuda 10)

And as others have reported, setting allow_growth=TRUE allows things to run.

This one works! Thank you guys!

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.log_device_placement = True
sess = tf.Session(config=config)
set_session(sess)

So you can do the patch without touching the code just by altering your runtime environment.

Another way to enable this option is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform-specific.

Interestingly, during my struggles, I got a message from a red 'no entry' sign in my menubar that said 'error broken count you have unmet dependenceis'. I ran a software update, and it wants to remove libcudnn7-dev and libcudnn7-doc as well as upgrade 57 other libraries having to do with Linux.

EDIT: After reboot the model seems to train successfully using this:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

or this:

import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)

memory utilization on the gpu is <700 MB with batch size 16 and ~1 gigabyte with batch size 256 (which trains 3x faster)

I also met this problem (Anaconda cloud install of tensorflow-gpu 2.0):

  • RTX 2070S
  • tensorflow-gpu 2.0.0
  • CUDA 10.0.13
  • cuDNN 7.6.5

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Did you insert:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

at the top of your entry code?

The code that worked for me:

import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)

Hello everyone! I have solved a similar problem by limiting memory growth, and you can try it.

You can find the code in the section "Limit memory growth".

(This is my first comment in GitHub)

config.gpu_options.allow_growth = True

When you use TensorFlow 2.0, you can use tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True). This code goes after import tensorflow as tf but before your own code.

For anyone else finding this after upgrading to tensorflow 2.0, the API and the code are slightly different.

  • Ubuntu 18
  • Tensorflow 2.0
  • Tensorflow-gpu 2.0
  • GeForce RTX 2070

Updated code for this system.

import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)

I am working in C++ under Windows

Adding the allow growth option results in an OOM error.

Without this line of code the model runs fine on the same machine with the same card.

With OOM error

options.config.mutable_gpu_options()->set_allow_growth(true);
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(fraction);

Without OOM error

//options.config.mutable_gpu_options()->set_allow_growth(true);
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(fraction);

So trying to solve this problem by setting allow growth results in a segfault.

Just upgrade to Tensorflow 2.3 with CUDA 11 and cudnn 8.0. It magically solved all my problems and I don’t even need the workaround with config.gpu_options.allow_growth = True now.

This problem seems related to my RTX 2080. I have a desktop GTX 1080 and everything seems OK there, but when I use conda to clone the conda environment onto my RTX 2080 notebook (I use tensorflow-gpu 2.0.0), this trouble comes as soon as the application code uses Conv2D, LSTM, or GRU. Before, I used the following code to solve this problem:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

but since several days ago, the above method does not work any more.

I’m having the same issue as @clementpoiret with TF 2.0 installed via conda. By using the allow_growth flag the issue disappears but that also makes the training very very slow, slower than what I had on TF 1.x… Eager first uh?

I think I found a better workaround than the config.gpu_options.allow_growth = True.

For my setup (RTX 2070, docker image tensorflow:1.15.0-gpu-py3), setting config as shown below avoids the CUDNN_STATUS_INTERNAL_ERROR while still allocating the whole GPU memory. This is very useful for large models that would not fit into memory in allow_growth mode but just fits when the whole memory is allocated.

To allocate the whole memory on RTX:

config.gpu_options.per_process_gpu_memory_fraction = 1.0
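In context, a minimal sketch of that workaround as a full TF 1.15-style session setup (my own framing around the single option above):

import tensorflow as tf

config = tf.ConfigProto()
# Pre-allocate the entire GPU memory up front instead of using allow_growth;
# per the comment above, this avoided CUDNN_STATUS_INTERNAL_ERROR on an RTX 2070.
config.gpu_options.per_process_gpu_memory_fraction = 1.0
sess = tf.Session(config=config)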

This solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070)

physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Same issue with RTX 2070

RTX 2070 here. Was getting this error, but now running with TF_FORCE_GPU_ALLOW_GROWTH=true (as other commenters have pointed out, fixes it for them) changes the error message to an out of memory error (even though I’ve got plenty of memory):

2020-10-17 16:35:11.717658: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 3.87G (4159818752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

But my GPU has 8GB and only about 250MB were in use before I started the process. So I don’t understand, why can’t it allocate 3.87GB? (lowering batch size had no effect; the weights hdf5 file is less than 200MB)

I had this same issue with RTX 2080. Then following code worked for me.

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Thanks everyone

I fixed it with this:

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
sess.as_default()

Is there a fix for this issue with tensorflow 2 and python3 ???

I have a: RTX 2080

I am getting this message:


2020-08-20 12:38:27.172496: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-08-20 12:38:27.177708: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/anantha/Desktop/RaspiCar/car.py", line 85, in <module>
    tnet.train(x, y)
  File "/home/anantha/Desktop/RaspiCar/car.py", line 65, in train
    self.model.fit(x, y, epochs=epochs)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node sequential/conv2d/Conv2D (defined at /Desktop/RaspiCar/car.py:65) ]] [Op:__inference_train_function_951]

Function call stack:
train_function

Hello @roebel

Me too, I was thinking the issue was an error in memory allocation. This is clear to me now, and now the GPU memory looks good.

In the past, I tested many options to pre-allocate memory 😢:

gpus = tf.config.experimental.list_physical_devices('GPU')
try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5044)])
    """process...."""
except Exception as e:
    raise e

Personally I use a GPU with 6GB of memory. And thank you @roebel for pointing me to this new option, TF_FORCE_GPU_ALLOW_GROWTH=true, to force my GPU allocation 😊.

I had a similar issue before. limiting GPU memory manually helped. https://github.com/tensorflow/tensorflow/issues/25160#issuecomment-643703167

Just wanted to chime in and say that the problem is still there;

My specs:

  • Ubuntu 20.04
  • NVIDIA RTX 2070
  • Nvidia driver 440.64
  • Tensorflow-gpu 2.0.1 (installed through conda, which automatically installs cudatoolkit and cuDNN in the same env)
  • cudatoolkit 10.1.243
  • cudnn 7.6.5

Problem is solved by tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)

However this seems more like a work-around than an actual fix, and a lot of people have 20XX cards these days. Probably there should be an update in which this issue is addressed.

Update: since I'm dual-booting, I tried to check on Windows as well. The problem persists there (Windows 10, Nvidia driver 445.87); other than that everything is similar.

@odinsbane you’ll have to build TensorFlow from source to do what I suggest below.

First step is to add LOG(INFO) or std::cerr lines to MinSystemMemory to print out available_memory and the return value from MinSystemMemory. Does available_memory agree with what nvidia-smi prints? How much memory are we leaving for the system?

Secondly, does increasing the 0.05 magic number to, say, 0.07 help at all?

Can confirm that building from source and changing the 0.05 magic number to 0.1 seems to fix the issue (at least for 1.15.2)!

I also get this error working in the tensorflow 1.15.0-py3-gpu Docker image (Ubuntu 18.04) with two Titan V GPUs (@sanjoy) - not RTXs. However, this error only seems to occur for me on my GPU0, which has Xorg and gnome-shell using GPU0 memory, while GPU1 only has python using GPU memory and does not throw this error. The error is also unfortunately intermittent – sometimes I will be able to remove the docker container, recreate it with the same settings and same code, and then the error will go away. Or not.

I was able to fix it using the Keras backend interface with:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
allow_growth_session = tf.Session(config=config)
tf.keras.backend.set_session(allow_growth_session)

Following is my nvidia-smi on both GPUs

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:01:00.0  On |                  N/A |
| 46%   63C    P2    51W / 250W |   7936MiB / 12065MiB |     31%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             Off  | 00000000:02:00.0 Off |                  N/A |
| 52%   70C    P2   131W / 250W |  12014MiB / 12066MiB |     60%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1564      G   /usr/lib/xorg/Xorg                            56MiB |
|    0      1607      G   /usr/bin/gnome-shell                          58MiB |
|    0      2428      G   /usr/lib/xorg/Xorg                           442MiB |
|    0      2574      G   /usr/bin/gnome-shell                         289MiB |
|    0      3292      G   ...p/pycharm-professional/167/jbr/bin/java    12MiB |
|    0      6794      G   anki                                          60MiB |
|    0     10336      G   /usr/lib/firefox/firefox                       6MiB |
|    0     16986      C   python                                      6981MiB |
|    1      4057      C   python                                     12001MiB |
+-----------------------------------------------------------------------------+

Same issue with an RTX2080, spent two days recompiling and bug hunting until I found this fix. (the allow_growth=true thing fixed it)

You made my day

unfortunately, I need to run code that only supports tensorflow 1.X

Probably one should do this only if allow memory growth is off. Otherwise you will always need about 580MB for the 2080 even if you don’t need all the operators.

I made a few more tests concerning the minimum system memory requirements for running combinations of the three operations from my test case. I compare only the 1080 and 2080 cards. You don't find Conv2D alone because it initializes BLAS in any case. The outcome:

GPU    MatMul   STFT     Conv2D+MatMul   MatMul+STFT   MatMul+STFT+Conv2D
1080   140MB    130MB    290MB           170MB         320MB
2080   190MB    190MB    520MB           250MB         580MB

One can see that on the 2080 CUDA requires an overhead for each operation, and that this overhead increases when using more libraries. In most cases the overhead is <100MB, but it becomes >220MB once Conv2D is involved…

If @samhodge has contacts at NVIDIA, I would personally find it interesting to hear whether this is intended.

@roebel

I have struggled with this in my C++ application for a number of iterations.

What it came down to in the end was the following.

Only run models on the GPU when enough memory is available to run the model.

So the amount of memory that the model will require is quantifiable.

So you need to have a GPU memory as a percentage which will fit that model.

Then you also need to know about how much memory is available on the card exactly before allocating the memory, which is subject to race conditions, because you don’t know what else is using CUDA memory at the same time on the operating system.

But the race condition aside, you also need to measure the memory free.

This is done by using cudaMemInfo, which in itself uses memory.

So, provided you have enough memory to run cudaMemInfo once to measure, and you make sure that enough memory is free to fit the model and to run cudaMemInfo one more time, then and only then can you allocate enough of the percentage of available VRAM on that card to run the model.

Anyway, the take-home from my random babbling is that cudaMemInfo is required to poll the amount of memory available to allocate, which in itself also uses some of that available memory.

Maybe the amount of memory used by cudaMemInfo is somehow different on a Turing-based card compared to a Pascal-based card. I can get someone from NVIDIA to have a look if you wish.
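The C++ details are specific to that application, but the general idea (query free GPU memory first, and only run on the GPU if the expected requirement fits) can be sketched in Python with nvidia-smi, along the lines of the subprocess snippet further down in this thread; the model_requirement_mb value is a made-up placeholder:

import subprocess

def gpu_free_memory_mb(index=0):
    """Return the free memory (in MB) that nvidia-smi reports for one GPU."""
    out = subprocess.check_output([
        "nvidia-smi", "--query-gpu=memory.free",
        "--format=csv,nounits,noheader"
    ]).decode("utf-8")
    return int(out.strip().split("\n")[index])

model_requirement_mb = 2000  # placeholder: the measured requirement of your model
if gpu_free_memory_mb(0) > model_requirement_mb:
    pass  # safe (modulo the race conditions noted above) to run on the GPU
else:
    pass  # fall back to the CPU, or wait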

@samhodge @sanjoy @odinsbane

Finally I have been able to run the patched library on the RTX 2080 cards. As expected, the patched version does not pass. Here again is the script:

import tensorflow as tf
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32), filters=tf.zeros((2,2,20,20), dtype=tf.float32), strides=(1,1,1,1), padding="VALID")

And here is the matrix of available memory reported by gpu_device.cc, the default value of min_system_memory as selected in gpu_device.cc, and the minimum value of min_system_memory I need to select for the script not to abort:

Card      AvailMem       Def MinSysMem   Required MinSysMem
1050 TI   4163764224     314572800       325058560
1080 TI   11567431680    578371584       335544320
2080 TI   11381964800    569098240       618659840

So while the 1050 and 1080 run the script with about the same memory size, the RTX 2080 requires nearly twice as much memory. This does not sound good to me.

Any suggestions on what to try to get this to a comparable value?

@roebel I did not recall what triggered the problem for you.

see this https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-480549043

Which is why I thought it was memory related. This issue has not affected me for some time, nor the users of my software, on a variety of platforms.

OS: ubuntu 18.04 lts

Driver Version: 435.21

CUDA: cudatoolkit 10.1

CUDNN: cudnn-7.6.5-cuda10.1_0

I used Anaconda to install TensorFlow:

conda create -n tf-gpu tensorflow-gpu

The cudatoolkit and cudnn are auto-installed by Anaconda through the command above.

I have the same question. The error:

coreClock: 1.5315GHz coreCount: 3 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 44.76GiB/s
2020-05-12 17:58:44.119679: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-12 17:58:44.119694: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-12 17:58:44.119707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-12 17:58:44.119719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-12 17:58:44.119732: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-12 17:58:44.119744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-12 17:58:44.119756: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-12 17:58:44.119819: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.120069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.120277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-12 17:58:44.120308: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-12 17:58:44.174976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-12 17:58:44.175003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-05-12 17:58:44.175012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-05-12 17:58:44.175136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.175392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.175624: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.175844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1439 MB memory) -> physical GPU (device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-05-12 17:58:44.177113: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abc3d20b80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-12 17:58:44.177129: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce MX150, Compute Capability 6.1
2020-05-12 17:58:44.177749: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 376320000 exceeds 10% of system memory.
2020-05-12 17:58:44.787493: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 376320000 exceeds 10% of system memory.
WARNING:tensorflow:Layer my_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

2020-05-12 17:58:45.311821: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-12 17:58:45.467966: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-12 17:58:45.904025: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-05-12 17:58:45.913861: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-05-12 17:58:45.913978: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node my_model/conv2d/Conv2D}}]]

This code is shared to make it more quickly available for both TensorFlow and Keras users (source from here):

# Tensorflow
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)


#And for Keras
from keras.callbacks import ModelCheckpoint
from keras.models import Model, load_model, save_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking, TimeDistributed, LSTM, Conv1D
from keras.layers import GRU, Bidirectional, BatchNormalization, Reshape
from keras.optimizers import Adam
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
sess = tf.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras

I got the same problem with the following configuration:

  • TensorFlow installed from (source or binary): r1.13.1, r1.13.2, r1.14
  • Python version: 3.6.1
  • CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1
  • GPU model and memory: RTX 2070 8GB

I solved this problem with:

  • TensorFlow installed from (source or binary): r1.12.0
  • Python version: 3.6.9
  • GCC/Compiler version: 4.8
  • CUDA/cuDNN version: CUDA 9.0 with cuDNN 7.1.4
  • GPU model and memory: RTX 2070 8GB

Hope this is helpful to you.

I had the same problem and allow_growth = True was the solution. BUT, for TensorFlow 2, in order to do that you need to add the following lines:

gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)

Thanks to user @opcecco in this issue: https://github.com/tensorflow/tensorflow/issues/25446

I also ran into this problem (Anaconda cloud install of tensorflow-gpu 2.0): RTX 2070S, tensorflow-gpu 2.0.0, CUDA 10.0.13, cuDNN 7.6.5. Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR. Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Did you insert:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

at the top of your entry code?

Yeah, I solved this problem this way. Thanks!!

We are facing related issues.

System specifications

  • Ubuntu 18.04.3 LTS
  • RTX 2070
  • python 3.7.1
  • tf-gpu 2.0.0
  • V10.0.130 CUDA
  • libcudnn7 7.6.2

The error is triggered when I try to use LSTM, GRU, RNN etc.

Actual error

2019-12-23 16:09:00.912238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-23 16:09:01.408990: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-23 16:09:01.409043: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cudnn_rnn_ops.cc:1491 : Unknown: Fail to find the dnn implementation.

File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/recurrent_v2.py", line 961, in call **cudnn_lstm_kwargs) File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/recurrent_v2.py", line 1174, in cudnn_lstm rnn_mode='lstm') File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 109, in cudnn_rnn ctx=_ctx) File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 198, in cudnn_rnn_eager_fallback attrs=_attrs, ctx=_ctx, name=name) File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute six.raise_from(core._status_to_exception(e.code, message), None) File "<string>", line 3, in raise_from tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation. [Op:CudnnRNN]

Apparent problem

It seems all my memory is eaten up pretty fast. The problem seems to come up only in GPU mode; the same code works fine with the CPU.

Trials

  • allow memory growth
  • create virtual device with limited memory

Both tries produce the same error.

Any ideas?

Hi,

I cannot reproduce this on my machine so I’ll need some help root-causing this. Do we have someone here who can reproduce the problem and is willing to do some hands-on debugging?

As a starting point I’d like to understand why MinSystemMemory does not preserve enough memory for cuDNN. If someone with a setup that reproduces this issue can add some logging (as a local patch) to discover out the amount of memory returned by MinSystemMemory that would be great. And does increasing the magic 0.05 number in MinSystemMemory help the situation?

@clementpoiret: Please note that the tf.config.experimental.set_memory_growth call is unnecessary since tf.config.experimental.set_virtual_device_configuration overrides that flag since it slices up the GPU memory and pre-allocates the allocated memory.
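In other words (a minimal sketch of what that note implies; the 5044 MB limit is just the value used in the earlier comment, and only the virtual-device configuration is kept):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
# With a virtual device configuration the GPU memory is pre-sliced, so a
# separate set_memory_growth call would be redundant here.
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5044)])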

@synapse8 I don't see anything equivalent in tensorflow 2.0's documentation; is there any way to do so with tf.config.experimental?

Edit: I'm going to try to set memory this way, to see if it solves the issue:

import subprocess
import tensorflow as tf


def get_gpus_memory():
    """Get the max gpu memory.

    Returns
    -------
    usage: list
        Returns a list of total memory for each gpus.
    """
    result = subprocess.check_output([
        "nvidia-smi", "--query-gpu=memory.total",
        "--format=csv,nounits,noheader"
    ]).decode("utf-8")

    gpus_memory = [int(x) for x in result.strip().split("\n")]
    return gpus_memory


def setup_gpus(allow_growth=True, memory_fraction=.9):
    """Setup GPUs.
    
    Parameters:
    allow_growth (Boolean)
    memory_fraction (Float): Set maximum memory usage, with 1 using
        maximum memory
    """
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for i, gpu in enumerate(gpus):
                memory = get_gpus_memory()[i]

                tf.config.experimental.set_memory_growth(gpu, allow_growth)

                # Setting the memory limit (in MiB) to total * fraction
                tf.config.experimental.set_virtual_device_configuration(
                    gpu, [
                        tf.config.experimental.VirtualDeviceConfiguration(
                            memory_limit=int(memory * memory_fraction))
                    ])

                logical_gpus = tf.config.experimental.list_logical_devices(
                    "GPU")
                print(len(gpus), "Physical GPUs,", len(logical_gpus),
                      "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)

This way we can conveniently just call setup_gpus(True, .9)

I tried using that for tensorflow 2.0:

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.compat.v1.Session(config=config)

It fixes the cuDNN error on my RTX 2080, but training is only as fast as the 1050 Ti in my laptop! While training a CNN:

Tue Nov 12 19:22:35 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:2D:00.0 Off |                  N/A |
|  0%   37C    P2    75W / 265W |   2904MiB /  7979MiB |     27%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1026      G   /usr/lib/Xorg                                200MiB |
|    0      6420      G   cinnamon                                      43MiB |
|    0     21073      C   /home/clementpoiret/anaconda3/bin/python    2647MiB |
+-----------------------------------------------------------------------------+

Adding

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=7000)])

didn’t solve the issue: without allow_growth I’m getting the cuDNN error, and in any case my RTX is only using something like 3 GB of memory.

Any idea?

I tried

    gpus = tf.config.experimental.list_physical_devices('GPU')
    tf.config.experimental.set_memory_growth(gpus[0], True)
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=7900)])

but cuDNN is still throwing the error.

I tested building tf-2.0.0-beta1 from source with CUDA 10.1 and cuDNN 7.6.2.4, and the error doesn’t manifest.

You can find docker images for building a tf-gpu package and a tf-base package here: https://github.com/edowson/docker-tensorflow

The anaconda channel doesn’t have cudnn==7.6.2 at the time of writing this comment.

@Hayashi-Yudai

I tried config.gpu_options.allow_growth = True, but it does not solve this error.

What were the exact commands you added to your code? Try the following instead if it’s different …

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

@robzor92 I doubt the 1050 Ti’s problem is its small VRAM size; the RTX cards encounter this even on the basic CNN MNIST models. I doubt it’s NVIDIA’s tweaking of VRAM allocation on RTX cards that somehow messed things up.

I ran into this issue as well, and was able to solve it by using @va-andrew 's solution, and specifically, I used @colinsteidtmann 's implementation, since I use some of the tensorflow.keras functions in my code. I spent a long time trying to debug this problem, so thank you both for your contributions.

EDIT: I was just looking at tensorflow documentation (https://www.tensorflow.org/guide/using_gpu), and you can also tell it to allow memory growth by setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. It also says that this configuration is platform specific, so YMMV (works for me with Ubuntu 18.04).
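
For example, a minimal sketch of setting that variable from Python (it must be set before TensorFlow initializes the GPU; setting it in the shell before launching works just as well):

import os
# Must be set before TensorFlow creates the GPU device
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf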

For reference, I am running: Ubuntu 18.04.2 LTS, Gigabyte GeForce RTX 2080 Turbo, NVIDIA driver 430.26, CUDA 10.0.130, cuDNN 7.4.2.24, tensorflow-gpu 1.13.1, python 3.6. I run tensorflow from within a virtual environment, using spyder 3.3.4.

I have a 2nd computer with the exact same hardware, and I set it up following the same set of instructions, used the same files to do the install, and had this issue on that machine as well. No surprise there.

I have a 3rd computer with the exact same hardware, except that it has a 2080 Ti instead of the 2080, and I set it up following the same set of instructions, and again used the same files to do the install. But this time, there was no issue.

So, I’m led to believe it’s not related to some conflict of CUDA, cuDNN, and driver version; it’s not an incorrectly done installation, etc. Rather, it’s related to the model of video card; I’ve only seen mention of this issue with RTX 2060, 2070, and 2080.

Fortunately, it’s not a big inconvenience to use the workaround.

The descriptions of the problems you are seeing make me believe that (a particular version of) cuDNN tries to allocate GPU memory when creating the handle. If TensorFlow has already taken all the memory (either because config.gpu_options.allow_growth = false, or because per_process_gpu_memory_fraction is close to 1.0), there is no memory left for cuDNN to allocate.

You could confirm this by running TensorFlow through nvprof and generating an API trace to inspect the failing cuMemAlloc call.
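
If that hypothesis holds, explicitly leaving headroom should also work around it; a minimal TF 1.x sketch (use tf.compat.v1 in 2.x; the 0.8 fraction is an arbitrary example value):

import tensorflow as tf

config = tf.ConfigProto()
# Leave roughly 20% of the card unallocated so cuDNN can create its handle
config.gpu_options.per_process_gpu_memory_fraction = 0.8
session = tf.Session(config=config)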

Issue #6698 seems to discuss the same problem. Some people noticed that they had accidentally used a cuDNN release that doesn’t match their CUDA version. Could you please verify that you are using cuDNN for CUDA 10 when running with CUDA 10?
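
One way to check which CUDA/cuDNN versions a binary TensorFlow package was built against, assuming a recent TF 2.x (2.3+) where tf.sysconfig.get_build_info() is available:

import tensorflow as tf

# Reports the CUDA and cuDNN versions the wheel was compiled against
build_info = tf.sysconfig.get_build_info()
print(build_info.get("cuda_version"), build_info.get("cudnn_version"))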

Still having the same issue here but “config.gpu_options.allow_growth = True” doesn’t fix the problem. Happens on both TF-gpu 1.14.1 and TF-gpu 2.0. RTX1070, CUDA 10.0, Ubuntu 18.04, Nvidia driver 430.09.

Running into the same issue on a GTX 1050 using tensorflow-gpu 1.13.1 from pip with CUDA 10.0/cuDNN 7.4.2.24/Nvidia driver 410/Ubuntu 16.04.