tensorflow: Docker with GPU 2.3rc0 CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
It seems that the Docker image tensorflow/tensorflow:2.3.0rc0-gpu won’t work with my GPU, while the image tensorflow/tensorflow:2.2.0rc0-gpu works fine.
In other words, the solution to the present issue was to “downgrade” to tensorflow/tensorflow:2.2.0rc0-gpu. The image tensorflow/tensorflow:2.3.0rc0-gpu also works fine with CPU only.
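Since the 2.3.0rc0 image reportedly works in CPU-only mode, one possible stopgap is to hide the GPU from TensorFlow inside the container. The sketch below is an assumption rather than something tested in this report: it relies on the standard CUDA_VISIBLE_DEVICES environment variable, which has to be set before TensorFlow is imported.

```python
import os

# Hide all CUDA devices from this process. This must happen before
# TensorFlow is imported, otherwise the GPU may already be initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

try:
    import tensorflow as tf
except ImportError:
    # TensorFlow is not installed in this environment; nothing more to show.
    tf = None

if tf is not None:
    # With the GPU hidden, this list is expected to be empty and
    # operations fall back to the CPU.
    print(tf.config.list_physical_devices("GPU"))
```

This only avoids the failing GPU initialization; it does not make the GPU usable with the 2.3.0rc0 image.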
System information
- Ubuntu 20.04
- TensorFlow through Docker
- TensorFlow version (use command below):
- GPU model and memory: Geforce GTX 960M, coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
- GPU drivers: 440.100
How to reproduce
> docker run -it --rm --gpus all --entrypoint bash tensorflow/tensorflow:2.3.0rc0-gpu
> python
>>> import tensorflow as tf
>>> inputs = tf.keras.layers.Input(shape=(None,), name="input")
>>> embedded = tf.keras.layers.Embedding(100, 16)(inputs)
Full stack trace:
2020-07-06 18:46:55.604377: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-06 18:46:55.608404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.608911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-07-06 18:46:55.608943: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-06 18:46:55.610544: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-06 18:46:55.611696: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-06 18:46:55.611988: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-06 18:46:55.613589: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-06 18:46:55.614478: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-06 18:46:55.618025: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-07-06 18:46:55.618159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.618734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.619206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-07-06 18:46:55.619480: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-07-06 18:46:55.643133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2693910000 Hz
2020-07-06 18:46:55.643781: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44161a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-06 18:46:55.643809: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-06 18:46:55.725002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.725324: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44aa610 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-06 18:46:55.725349: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
2020-07-06 18:46:55.725532: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.725767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-07-06 18:46:55.725796: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-06 18:46:55.725828: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-06 18:46:55.725854: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-06 18:46:55.725882: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-06 18:46:55.725908: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-06 18:46:55.725938: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-06 18:46:55.725988: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-07-06 18:46:55.726091: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.726485: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.726724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-07-06 18:46:55.726756: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 926, in __call__
input_list)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1098, in _functional_construction_call
self._maybe_build(inputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2643, in _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/tf_utils.py", line 323, in wrapper
output_shape = fn(instance, input_shape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/embeddings.py", line 135, in build
if (context.executing_eagerly() and context.context().num_gpus() and
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1082, in num_gpus
self.ensure_initialized()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 539, in ensure_initialized
context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
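The detail in this log that matters for the discussion is the computeCapability field (5.0 for the GTX 960M). As a convenience, here is a minimal sketch that pulls that field out of a saved log. The helper name find_compute_capabilities is hypothetical, and the pattern assumes the log format shown above.

```python
import re

# Matches the "name: ... computeCapability: X.Y" line that TensorFlow prints
# in its "Found device N with properties" log block (format assumed from
# the log above).
_CC_PATTERN = re.compile(
    r"name:\s*(?P<name>.+?)\s+computeCapability:\s*(?P<major>\d+)\.(?P<minor>\d+)"
)


def find_compute_capabilities(log_text):
    """Return a sorted list of unique (device name, (major, minor)) pairs."""
    found = set()
    for match in _CC_PATTERN.finditer(log_text):
        capability = (int(match.group("major")), int(match.group("minor")))
        found.add((match.group("name"), capability))
    return sorted(found)


sample = (
    "pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0\n"
    "coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB\n"
)
print(find_compute_capabilities(sample))
```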
About this issue
- State: closed
- Created 4 years ago
- Comments: 58 (18 by maintainers)
Same issue here with an Nvidia 3090, Ubuntu 20.04, CUDA 10.1, cuDNN 7.6, and Nvidia GPU driver 455.
Apologies for adding more activity to this issue @av8ramit but we wanted to find out if there was going to be a point release of the TensorFlow C library v2.3 that has been patched with the correct CUDA capabilities? I only ask because v2.3 is the current stable version, it works with the standard CUDA version in Ubuntu 20.04, and when installing tensorflow through python for training with keras it also uses the same version.
@navganti PTAL here https://github.com/tensorflow/tensorflow/issues/41892#issuecomment-667452483.
We removed PTX for all but sm_70 from TF builds in cf1b6b3dfe9ba82e805fddf7f4462b2d92fe550a. We never shipped with kernels for sm_50, only sm_52. Apparently the driver was able to compile PTX for sm_52 to sm_50, even though it’s not officially supported.
If you want to run on an sm_50 card, it would be best to build TF from source.
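To make the compatibility reasoning concrete, here is a minimal sketch of the documented CUDA rules: native code built for sm_XY runs on devices with the same major version and a minor version of at least Y, and PTX for compute_XY can be JIT-compiled by the driver for any device with compute capability of at least X.Y. The function name can_run and the target lists are illustrative assumptions, not the actual TensorFlow build configuration.

```python
def can_run(device_cc, sass_targets, ptx_targets):
    """Return True if a binary built for the given targets is expected to
    run on a device with compute capability `device_cc` = (major, minor)."""
    dev_major, dev_minor = device_cc
    # Native machine code (sm_XY) runs on devices of the same major version
    # whose minor version is at least Y.
    for major, minor in sass_targets:
        if major == dev_major and minor <= dev_minor:
            return True
    # PTX (compute_XY) can be JIT-compiled by the driver for any device
    # whose compute capability is at least X.Y.
    for target in ptx_targets:
        if tuple(target) <= tuple(device_cc):
            return True
    return False


# Illustrative target lists only (assumptions for this example).
sass = [(5, 2), (6, 0), (7, 0)]
ptx_before = [(5, 2), (7, 0)]  # hypothetical: PTX still shipped for sm_52
ptx_after = [(7, 0)]           # PTX kept only for compute_70

print(can_run((5, 0), sass, ptx_after))   # GTX 960M, compute capability 5.0
print(can_run((5, 0), sass, ptx_before))  # same card, with sm_52 PTX present
print(can_run((7, 5), sass, ptx_after))   # a 7.x card
```

Under these rules a 5.0 card is not covered in either configuration, which matches the remark that JIT-compiling sm_52 PTX for sm_50 worked in practice without being officially supported.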
Apologies. It seems our CI uploaded the wrong package under the new name after we refactored parts of the CI. I think it should be fixed now; can you give it a try, please?
Looping in the release manager. @geetachavan1, would we be able to patch the fix for libtensorflow and release new binaries with the correct CUDA capabilities? Happy to help get this done internally.
Glad I could help a little, @motrek. You should be able to link to python3.8 if you have the package libpython3.8-dev installed. For linking to the _pywrap_tensorflow library I just created a symlink to it in /usr/local/lib and then ran ldconfig, at which point it can be linked and also found at runtime.
There’s a slightly cleaner solution to setting the “allow growth” option by including the experimental header and then using the TF_CreateConfig helper. Use the session options as normal.
No, the driver will be able to JIT compute_70 and use it for any compute capabilities 7.x. The startup may be slow, but it will work.