tensorflow: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice.

Issue Type

Bug

Source

binary

Tensorflow Version

2.11.0

Custom Code

Yes

OS Platform and Distribution

Linux Ubuntu 22.04.1

Mobile device

No response

Python version

3.9.15

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

CUDA 11.2, cuDNN 8.1.0

GPU model and memory

Nvidia GTX 1060 6gb

Current Behaviour?

I was installing TensorFlow according to this guide, https://www.tensorflow.org/install/pip, and ran into the error. I am running a fresh install of Ubuntu. I have tried `export XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda` to no avail.
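
For reference, a sketch of the kind of workaround attempted (the find command is an assumption about where the conda package puts libdevice; the flag must point at a directory that actually contains nvvm/libdevice):

# locate libdevice inside the active conda environment
find "$CONDA_PREFIX" -name 'libdevice.10.bc' 2>/dev/null
# point XLA at a directory expected to contain nvvm/libdevice
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda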

Standalone code to reproduce the issue

import tensorflow as tf

# Load and normalize MNIST (x_train/y_train were assumed by the original snippet;
# the full script later in this thread confirms this is the MNIST example)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)  # fails here

Relevant log output

Epoch 1/5
2022-11-24 23:30:47.064919: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7f4ce3255ca0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-11-24 23:30:47.064946: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): NVIDIA GeForce GTX 1060 6GB, Compute Capability 6.1
2022-11-24 23:30:47.068586: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-11-24 23:30:47.086148: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2022-11-24 23:30:47.087159: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2022-11-24 23:30:47.087339: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2022-11-24 23:30:47.087434: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2022-11-24 23:30:47.106008: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2022-11-24 23:30:47.106292: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2022-11-24 23:30:47.125456: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2022-11-24 23:30:47.125753: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2022-11-24 23:30:47.144359: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2022-11-24 23:30:47.144670: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Cell In [4], line 1
----> 1 model.fit(x_train, y_train, epochs=5)

File ~/miniconda3/envs/tf/lib/python3.9/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall_2' defined at (most recent call last):
    File "/home/nathan/miniconda3/envs/tf/lib/python3.9/runpy.py", line 197, in _run_module_as_main
...
    File "/home/nathan/miniconda3/envs/tf/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_2'
libdevice not found at ./libdevice.10.bc
	 [[{{node StatefulPartitionedCall_2}}]] [Op:__inference_train_function_766]

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 20
  • Comments: 33 (2 by maintainers)

Most upvoted comments

Hello all,

I just want to reiterate the issue and give a possibly new solution. I’m having the problem described in this thread when following the step-by-step guide, and I’ll try to lay everything down here to make it clear and reproducible.

System Info

  • OS: Ubuntu 20.04.5 LTS
  • GPU: GeForce RTX 2080 SUPER
  • Driver Version: 510.108.03

The system does not have CUDA installed through any other means, such as apt, as the goal is to install it using conda as described in the installation guide.

Installation Procedure

Taken directly from the pip installation guide page.

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create --name tf_2-11_gpu python=3.9 --yes
conda activate tf_2-11_gpu

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0 --yes

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

pip install --upgrade pip
pip install tensorflow==2.11

When I run the verification scripts from the installation guide, the GPU is detected and they run successfully.

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Output:

2023-01-27 14:04:05.322921: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-27 14:04:05.855462: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:04:05.855515: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:04:05.855522: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-27 14:04:06.270401: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:04:06.274057: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:04:06.274215: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Output:

2023-01-27 14:05:52.832973: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-27 14:05:53.348636: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:05:53.348685: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:05:53.348691: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-27 14:05:53.768505: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:53.772137: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:53.772292: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:53.772640: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-27 14:05:53.773207: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:53.773348: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:53.773475: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:54.134784: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:54.134963: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:54.135099: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:05:54.135229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6124 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:0a:00.0, compute capability: 7.5
tf.Tensor(967.4906, shape=(), dtype=float32)

Issue

The issue arises when trying to train a model. The following script, taken from the overview page, has the issue.

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Output:

2023-01-27 14:09:50.588913: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-27 14:09:51.089217: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:09:51.089266: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:09:51.089272: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-27 14:09:51.715836: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:51.719437: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:51.719598: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:51.719933: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-27 14:09:51.720504: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:51.720646: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:51.720772: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:52.082548: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:52.082724: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:52.082858: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:09:52.082983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6091 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:0a:00.0, compute capability: 7.5
Epoch 1/5
2023-01-27 14:09:52.886120: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x1d082920 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-01-27 14:09:52.886149: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): NVIDIA GeForce RTX 2080 SUPER, Compute Capability 7.5
2023-01-27 14:09:52.889303: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-01-27 14:09:52.902596: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-01-27 14:09:52.903255: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-27 14:09:52.903360: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2023-01-27 14:09:52.903435: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-01-27 14:09:52.917154: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-27 14:09:52.917312: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-01-27 14:09:52.931297: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-27 14:09:52.931464: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-01-27 14:09:52.945130: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-27 14:09:52.945294: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "/home/wheatley/Documents/code/loops/temp/minimal_tf_test.py", line 18, in <module>
    model.fit(x_train, y_train, epochs=5)
  File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall_2' defined at (most recent call last):
    File "/home/wheatley/Documents/code/loops/temp/minimal_tf_test.py", line 18, in <module>
      model.fit(x_train, y_train, epochs=5)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_2'
libdevice not found at ./libdevice.10.bc
	 [[{{node StatefulPartitionedCall_2}}]] [Op:__inference_train_function_766]

Solutions so Far

As others have stated, going back to Tensorflow 2.10 avoids the issue. If I run the exact same installation process described above, but run pip install tensorflow==2.10 instead, the sample training script runs on my GPU without issue. That is a viable solution for the time being, but of course not ideal.
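
A sketch of that downgrade, assuming the exact same conda environment and commands as above:

# identical setup, but installing TF 2.10 instead of 2.11
pip install tensorflow==2.10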

When trying to fix the 2.11 installation by the methods described in this thread I had the same issue as @frankcaoyun. I set XLA_FLAGS with the following script.

mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice/
cp -p $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib

This fixes the previous issue, but a new one arises when trying to run the training script.

Output:

2023-01-27 14:19:06.913716: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-27 14:19:07.416502: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:19:07.416550: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/wheatley/miniconda3/envs/tf_2-11_gpu/lib/
2023-01-27 14:19:07.416556: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-27 14:19:08.032085: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.035708: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.035866: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.036200: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-27 14:19:08.036731: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.036876: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.037003: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.385892: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.386072: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.386207: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-27 14:19:08.386333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6119 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:0a:00.0, compute capability: 7.5
Epoch 1/5
2023-01-27 14:19:09.216551: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x1ce31ee0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-01-27 14:19:09.216578: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): NVIDIA GeForce RTX 2080 SUPER, Compute Capability 7.5
2023-01-27 14:19:09.219677: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-01-27 14:19:09.271627: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-27 14:19:09.272324: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-27 14:19:09.272334: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-01-27 14:19:09.272961: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-27 14:19:09.273004: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted (core dumped)

The standout lines are "disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable" and "Couldn't invoke ptxas --version". I tried setting MLIR_CRASH_REPRODUCER_DIRECTORY, and the output remains the same, just without the disabling-MLIR-crash-reproducer line.

Possible Fix

The rest of the errors are all related to ptxas. I’m unsure of what the issue is exactly, as I’m not familiar with ptxas. But if I try to run ptxas --version then indeed the output is Command 'ptxas' not found. Now, that also happens in my environment with Tensorflow 2.10, but the code still runs fine on the GPU.

Regardless, I attempted to fix the issue by installing ptxas. This post mentions the conda-forge package cudatoolkit-dev, so I tried installing ptxas by running:

conda install -c conda-forge cudatoolkit-dev=11.2 --yes

And sure enough, now when I run ptxas --version I get:

ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:21_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

And now the training script runs on my GPU!

This is just a quick fix rather than an ideal one. I'm not entirely sure why version 2.11 seems to need ptxas installed while version 2.10 does fine without it, but apparently this fixes the issue.

I think it is important for this issue to be properly fixed, whether by changing Tensorflow 2.11's code, the conda packages, or the installation guide. Following the step-by-step guide on the website should install version 2.11 without much issue, just as it does for version 2.10.


EDIT:

Minor update. Instead of running:

conda install -c conda-forge cudatoolkit-dev=11.2 --yes

I ran:

conda install -c nvidia cuda-nvcc --yes

And it also fixed the issue. Now ptxas --version returns:

ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:43:29_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

And the sample code runs on my GPU.

Hello all,

It seems like our solution made it to the official installation guide!

Now, at the bottom of the step-by-step instructions for Linux, there is the following section:

Ubuntu 22.04

In Ubuntu 22.04, you may encounter the following error:

Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice.
...
Couldn't invoke ptxas --version
...
InternalError: libdevice not found at ./libdevice.10.bc [Op:__some_op]

To fix this error, you will need to run the following commands.

# Install NVCC
conda install -c nvidia cuda-nvcc=11.3.58
# Configure the XLA cuda directory
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Copy libdevice file to the required path
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
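
After applying those commands, a quick verification sketch (assuming the environment is named tf, as in the guide; re-activating it makes env_vars.sh take effect, and the python one-liner is the GPU check from the install guide):

conda deactivate && conda activate tf
ptxas --version
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"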

Of course, we can’t know for sure that the solution on the official page came from or was inspired by this thread, but I would certainly like to think so.

Fixed it for me:

  1. Add the Nvidia key (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-wsl).
  2. Run sudo apt-get install cuda-minimal-build-11-8; it's only a 45MB download. (See the sketch below.)
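
A sketch of both steps on WSL Ubuntu (the keyring URL is an assumption based on Nvidia's network-repo instructions; check the linked guide for your distro):

# 1. add Nvidia's repository key (WSL-Ubuntu network repo)
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
# 2. install only the minimal CUDA build tools (ptxas, libdevice)
sudo apt-get install cuda-minimal-build-11-8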

My setup is an RTX 3060 and WSL2 with the default Ubuntu 22.04.2 LTS subsystem freshly installed, following the installation guide on the TensorFlow page.

Hi @NSalberg, @epetrovski, @chrsunwil,

Inspired by a discussion at TF-Forum, you may be able to resolve the issue by following these steps.

1. Create an nvvm/libdevice folder inside the Conda environment's lib folder.

2. Copy the libdevice.10.bc file into that nvvm/libdevice directory. You may find this file in your system at paths like the ones below:

miniconda3/pkgs/cudatoolkit-11.2.2-hbe64b41_10/info/recipe/NVIDIA_EULA:libdevice.10.bc
miniconda3/pkgs/cudatoolkit-11.2.2-hbe64b41_10/info/files:lib/libdevice.10.bc
miniconda3/pkgs/cudatoolkit-11.2.2-hbe64b41_10/info/licenses/NVIDIA_EULA:libdevice.10.bc

3. Use the command: export XLA_FLAGS=–xla_gpu_cuda_data_dir=/home/miniconda3/envs/lib. The path /home/miniconda3/envs/lib may be different for you; it should be the absolute path of the lib folder inside miniconda3, such as miniconda3/envs/lib or miniconda3/lib. (A consolidated sketch follows below.)
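
A consolidated sketch of the three steps (note the flag needs a double dash, as the next reply points out; paths assume a Miniconda environment named tf, so substitute your own):

# create the directory layout XLA expects and copy libdevice into it
mkdir -p ~/miniconda3/envs/tf/lib/nvvm/libdevice
cp ~/miniconda3/envs/tf/lib/libdevice.10.bc ~/miniconda3/envs/tf/lib/nvvm/libdevice/
# point XLA at the lib folder (double dash!)
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$HOME/miniconda3/envs/tf/lib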

Please try this and let us know if it works.

Thank you!

Hi @SuryanarayanaY,

There is a typo in your post: "–xla_gpu_cuda_data_dir" should be "--xla_gpu_cuda_data_dir", with a double dash instead of a single dash. Someone copy-pasting the code will not get it to work.

Despite that, after correcting the typo, I followed the steps and encountered the errors below:

2023-01-18 17:23:08.736800: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-18 17:23:08.737224: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-18 17:23:08.737250: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-01-18 17:23:08.737871: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-18 17:23:08.737950: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

My Linux machine is freshly installed with Ubuntu 22.04.1 LTS, with tensorflow=2.11.0, cudatoolkit=11.2, and cudnn=8.1.0 (following the latest official installation guide).

I found out that I was having this issue because tensorflow/keras > 2.10 requires the cuda-compiler package to be installed for fitting models. Running apt-get install cuda-compiler-11-8 creates the required libdevice directory in ${CUDA_DIR}/nvvm/libdevice.

However, you do not have to install the entire cuda-toolkit package, which is enormous.
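
A sketch combining that install with a check that libdevice landed where XLA looks (the /usr/local/cuda-11.8 path is an assumption about where the package installs its files):

sudo apt-get install cuda-compiler-11-8
# confirm the libdevice directory now exists
ls /usr/local/cuda-11.8/nvvm/libdevice/libdevice.10.bc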

I started experiencing this issue in a Docker container. The Docker image had not changed (i.e., my CUDA setup had not changed), but I noticed that TensorFlow's Keras dependency was updated to v. 2.11. I've locked Keras to v. 2.10 and now everything works again.

Hi all, we recently noticed that the latest Ubuntu OS installs the CUDA library at /usr/lib/cuda, but TensorFlow expects it to be at /usr/local/cuda, as per the Conda installation instructions that have worked so far. The command whereis cuda will confirm the location of the CUDA library. The workaround is a symlink, using the command sudo ln -s /usr/lib/cuda /usr/local/cuda.

From the error log attached to this ticket, I observed the lines below, where the problem seems to be related to the CUDA path.

Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda

I am pretty confident that the symlink mentioned above will work for this case.

@frankcaoyun Can you try the proposed workaround and confirm whether it works for you?
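
The proposed workaround as a sketch:

# confirm where the CUDA library actually lives
whereis cuda
# if it is under /usr/lib/cuda, link it to the path XLA searches
sudo ln -s /usr/lib/cuda /usr/local/cuda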

Hi @mohantym.

I’m actually having the same (at least very similar) issue on a fresh install of Ubuntu 22.04 running on metal.

Using tensorflow 2.11

I followed the guide here and can confirm that I did use conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0.

Tensorflow can find my gpu, but it has issues once it tries to train.

I’m using a GTX 3060 with the 520 drivers.

I was able to fix my issue by downgrading tensorflow to 2.10

Let me know if any other information could be helpful.

I’m pretty sure this is due to the fact that tensorflow 2.11 requires keras >= 2.11, whereas tensorflow 2.10 requires keras >= 2.10. This issue seems to be due to keras v. 2.11.
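
A sketch of that kind of pin (the version specifiers are illustrative; match them to your own dependency file):

pip install "tensorflow==2.10.*" "keras==2.10.*"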

Hi @eduardoscsouza, I installed the Nvidia driver from the .run file and then followed the same instructions you followed (as per the official documentation), and the model is able to train; model.evaluate also completes. Please refer to the log below.

(tf2.11) suryanarayanay@surya-ubuntu-22-04:~$ python 58681_rev.py 
2023-01-30 13:51:07.645676: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-30 13:51:08.254836: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-30 13:51:10.710146: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/suryanarayanay/miniconda3/envs/tf2.11/lib/
2023-01-30 13:51:10.710298: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/suryanarayanay/miniconda3/envs/tf2.11/lib/
2023-01-30 13:51:10.710316: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2.11.0
2023-01-30 13:51:17.810662: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-30 13:51:20.192374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38235 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:00:04.0, compute capability: 8.0
2023-01-30 13:51:20.195699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 38235 MB memory:  -> device: 1, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:00:05.0, compute capability: 8.0
Epoch 1/5
2023-01-30 13:51:23.967284: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:630] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-01-30 13:51:24.022911: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7ef93d1c8850 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-01-30 13:51:24.022950: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): NVIDIA A100-SXM4-40GB, Compute Capability 8.0
2023-01-30 13:51:24.022959: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (1): NVIDIA A100-SXM4-40GB, Compute Capability 8.0
2023-01-30 13:51:24.071871: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-01-30 13:51:24.458929: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-30 13:51:24.604518: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
1875/1875 [==============================] - 7s 2ms/step - loss: 0.2919 - accuracy: 0.9159 
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1422 - accuracy: 0.9573
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1084 - accuracy: 0.9668
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0888 - accuracy: 0.9734
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0751 - accuracy: 0.9758
313/313 [==============================] - 1s 2ms/step - loss: 0.0796 - accuracy: 0.9750
(tf2.11) suryanarayanay@surya-ubuntu-22-04:~$ 

Could you please confirm how you installed the Nvidia driver and what type of file it was (.deb, .run, etc.)? Also, can you share the output of the whereis cuda command?

Please provide this info for further investigation of your case. Thanks!

Hi @SuryanarayanaY,

To answer your questions, I installed the driver using Ubuntu’s “Additional Drivers” utility. The CLI route would be sudo ubuntu-drivers install 525. I upgraded to the 525 version of the driver to see if it would fix the issue, but it remains the same.

And the output of whereis cuda is:

cuda:

But I think your line of inquiry is misguided. My driver installation method seems to work, as I haven’t had any other issues: I can run TF 2.10 and PyTorch on the GPU with no problem, and I can successfully run nvidia-smi and get proper output. The driver installation step should really only need to install the driver itself, as that is the only component that NEEDS to be outside the virtual environment. Packages such as CUDA should be handled by Conda and lie entirely inside the virtual environment, which seems to be the intention of the installation guide’s conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0 step.

The issue seems to me to be with TF 2.11 and Conda’s CUDA installation. What I think may be happening is that you do not experience the issue because your method of installing the driver also installs CUDA or ptxas system-wide. The Conda environment would still be defective and incomplete, but TF would use the system-wide CUDA or ptxas package and work, which rather defeats the purpose of a virtual environment.

I also noted that your output doesn’t have the Couldn't invoke ptxas --version error, which indicates that your installation procedure installed ptxas properly; the lack of this package really seems to be at the core of the issue.
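
A quick sketch to test that hypothesis, checking whether ptxas resolves from inside the conda environment or from a system-wide install:

conda activate tf_2-11_gpu
# check which ptxas, if any, is on the PATH
which ptxas && ptxas --version || echo "ptxas not found in PATH"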

Hi @SuryanarayanaY,

Regarding "the latest Ubuntu OS installs the CUDA library at /usr/lib/cuda": may I confirm which source of installation you are referring to? Is it from sudo apt-get, conda as per the TensorFlow installation guide, or the .deb or .run from the Nvidia CUDA toolkit archive repo? This will help me test the solution, as I actually can’t find any CUDA folder under the /usr/lib directory.

On a fresh Ubuntu 22.04 installation, if I only install the Nvidia display driver, followed by installing the CUDA toolkit and cuDNN as per the TensorFlow installation guide, I am able to do model inference on the GPU, but not training. It throws the CUDA directory error that you have attached.

What works for me at this moment: I found this post worked for me consistently (and I believe it should work for others). I’m able to run both inference and training on the GPU, without the conda installation of cudatoolkit and cudnn.

Let me know what else I can do to help troubleshoot the issue. Thanks.

I had the same error. Setting up XLA_FLAGS made the error go away but did not actually fix the problem. Installing cuda-nvcc in the conda environment fixed it.

My setup for people coming to this thread in the future: Linux Mint, the version based on Ubuntu 22.04, Miniconda, tensorflow 2.13.1, python 3.11, CUDA 11.8, NVIDIA GeForce RTX 2060 SUPER, nvidia driver version: 535.129.03.
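
For reference, the fix is the same command given earlier in this thread (a sketch, assuming the target conda environment is active):

conda install -c nvidia cuda-nvcc --yes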

I am having the same issue as you are @timlac . I am on Pop!_OS 22.04 LTS. I hope someone figures out what is actually going on.

Edit: After doing a lot of research, I was finally able to solve my issue. I followed the instructions from the official page for the initial setup. After that it was able to detect that I had GPU, but whenever I wanted to train a model, it would error out:

    Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice.
    ...
    Couldn't invoke ptxas --version
    ...
    InternalError: libdevice not found at ./libdevice.10.bc [Op:__some_op]

Which I fixed by following the Ubuntu 22.04 section of the official page. However, at that point I could only run the models from the shell; whenever I tried to run a model (a .py or .ipynb file) from PyCharm, it again showed the previously mentioned errors: 1. PyCharm not detecting the GPU, which I fixed by adding an environment variable LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/{your_user}/miniconda3/envs/tf/lib/:/home/{your_user}/miniconda3/envs/tf/lib/python3.9/site-packages/nvidia/cudnn/lib (this fix is also mentioned in the previous posts). 2. Couldn't invoke ptxas --version, which I fixed by adding another environment variable in the run configuration: XLA_FLAGS=--xla_gpu_cuda_data_dir=/home/{your_user}/miniconda3/envs/tf/lib/.

The above two environment variables fixed the issue for PyCharm. At least for me 😃
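
The same two variables as shell exports, for anyone hitting this outside PyCharm (a sketch; substitute your own user and environment names):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/{your_user}/miniconda3/envs/tf/lib/:/home/{your_user}/miniconda3/envs/tf/lib/python3.9/site-packages/nvidia/cudnn/lib
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/home/{your_user}/miniconda3/envs/tf/lib/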

I fixed the issue like this:

  1. Determine where libdevice is located using find / -name "libdevice.10.bc" 2>/dev/null; in my case it was in /opt/env/lib/python3.10/site-packages/triton/third_party/cuda/lib
  2. Check where cuda is installed with find / -name "cuda" 2>/dev/null; e.g. in my case it's located in /usr/local/cuda
  3. Create a symlink to put the missing libdevice library inside cuda: ln -s /opt/env/lib/python3.10/site-packages/triton/third_party/cuda/lib /usr/local/cuda

I am using:

keras==2.12.0
autokeras==1.0.20
Tensorflow==2.12.0

*I am using singularity containers for building this with cuda 11.8

Hi @SuryanarayanaY,

Thank you very much for the proposed solution. It’s quite difficult for me to test this out since my issue occurred in a production Docker setup, but I hope someone else will…

However, I could not help noticing that the solution requires elements from the cuda-toolkit package. Do you know if the intention is that tf.keras will require cuda-toolkit going forward? This was not required previously and cuda-toolkit is not even included in the official nvidia cuda runtime docker image, just the cuda runtime libraries.

Could you confirm that you have installed the cuda files (11.2/8.1) through Conda in a new environment as per this instruction, and let us know.

I can confirm I installed the cuda files through those instructions.

Anyways, I was able to get TensorFlow running with my gpu by installing it in a new conda environment using this command

conda create -n tf-gpu tensorflow-gpu

albeit not the latest version.

Hi @NSalberg!

It looks like a CUDA setup issue. I could not replicate it in the Colab environment. Could you confirm that you have installed the cuda files (11.2/8.1) through Conda in a new environment as per this instruction, and let us know.

# create a new environment with conda (the env name "tf" matches the paths above)
conda create --name tf python=3.9
# activate the environment
conda activate tf
# install cuda files
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
# install tensorflow through pip
pip install tensorflow

Thank you!