tensorflow: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid?
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): v2.3.0-rc2-23-gb36436b087 2.3.0
- Python version: 3.8.2
- CUDA/cuDNN version: CUDA 10.1 / cuDNN 7.6.5
- GPU model and memory: Nvidia GTX 750 Ti, 2 GB (1997 MiB reported by nvidia-smi)
- Exact command to reproduce:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
# This step raises the error posted below (copied from the terminal).
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])
Describe the problem
I’ve recently installed Ubuntu 20.04 LTS, which comes with Python 3.8, so I installed nvidia-cuda-toolkit and the NVIDIA drivers, and I can confirm they are working fine.
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
$ nvidia-smi
Mon Aug 3 02:56:11 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   38C    P0     1W /  38W |    245MiB /  1997MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       979      G   /usr/lib/xorg/Xorg                            20MiB |
+-----------------------------------------------------------------------------+
Now, when I try to build a small Sequential model, I get an error that says InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
I don’t know what is causing the issue. My Ubuntu installation is new, and I have installed everything correctly.
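Not proposed in this report, but a common stopgap while waiting for a build that contains code for this GPU is to hide the GPU so TensorFlow falls back to the CPU. CUDA reads the CUDA_VISIBLE_DEVICES environment variable when its runtime initializes, so it has to be set before TensorFlow touches the GPU (safest: before the first import of tensorflow). A minimal sketch; the helper name force_cpu_only is hypothetical.

```python
import os


def force_cpu_only() -> None:
    """Hide all CUDA devices from this process (hypothetical helper).

    Must run before TensorFlow initializes CUDA: the CUDA runtime reads
    CUDA_VISIBLE_DEVICES only once, at initialization time.
    """
    # An empty value means "no visible CUDA devices".
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


if __name__ == "__main__":
    force_cpu_only()
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        from tensorflow import keras

        # Same model as in the report; it should now be built on the CPU.
        model = keras.Sequential([
            keras.layers.Flatten(input_shape=(28, 28)),
            keras.layers.Dense(128, activation="relu"),
            keras.layers.Dense(10),
        ])
        model.summary()
        # Lists the GPUs TensorFlow can see after hiding them.
        print(tf.config.list_physical_devices("GPU"))
```

This trades speed for being able to keep working; it does not fix the missing GPU kernels.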
Source code / logs
2020-08-03 02:48:40.720575: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-03 02:48:40.750630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 750 Ti computeCapability: 5.0
coreClock: 1.137GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 80.47GiB/s
2020-08-03 02:48:40.750735: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-03 02:48:40.791690: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-03 02:48:40.815993: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-03 02:48:40.821924: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-03 02:48:40.863910: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-03 02:48:40.870559: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-03 02:48:40.945916: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-08-03 02:48:40.947130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-03 02:48:40.979471: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3311130000 Hz
2020-08-03 02:48:40.980123: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4aa6700 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-03 02:48:40.980190: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-03 02:48:41.121266: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x49375f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-03 02:48:41.121357: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 750 Ti, Compute Capability 5.0
2020-08-03 02:48:41.122574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 750 Ti computeCapability: 5.0
coreClock: 1.137GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 80.47GiB/s
2020-08-03 02:48:41.122676: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-03 02:48:41.122762: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-03 02:48:41.122830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-03 02:48:41.122898: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-03 02:48:41.122963: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-03 02:48:41.123029: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-03 02:48:41.123145: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-08-03 02:48:41.124618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-03 02:48:41.124716: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
<ipython-input-4-ac4dc71cdd20> in <module>
----> 1 model = keras.Sequential([
2 keras.layers.Flatten(input_shape=(28, 28)),
3 keras.layers.Dense(128, activation='relu'),
4 keras.layers.Dense(10)
5 ])
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py in __init__(self, layers, name)
114 """
115 # Skip the init in FunctionalModel since model doesn't have input/output yet
--> 116 super(functional.Functional, self).__init__( # pylint: disable=bad-super-call
117 name=name, autocast=False)
118 self.supports_masking = True
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in __init__(self, *args, **kwargs)
306 self._steps_per_execution = None
307
--> 308 self._init_batch_counters()
309 self._base_model_initialized = True
310 _keras_api_gauge.get_cell('model').set(True)
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in _init_batch_counters(self)
315 # `evaluate`, and `predict`.
316 agg = variables.VariableAggregationV2.ONLY_FIRST_REPLICA
--> 317 self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
318 self._test_counter = variables.Variable(0, dtype='int64', aggregation=agg)
319 self._predict_counter = variables.Variable(
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
260 return cls._variable_v1_call(*args, **kwargs)
261 elif cls is Variable:
--> 262 return cls._variable_v2_call(*args, **kwargs)
263 else:
264 return super(VariableMetaclass, cls).__call__(*args, **kwargs)
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py in _variable_v2_call(cls, initial_value, trainable, validate_shape, caching_device, name, variable_def, dtype, import_scope, constraint, synchronization, aggregation, shape)
242 if aggregation is None:
243 aggregation = VariableAggregation.NONE
--> 244 return previous_getter(
245 initial_value=initial_value,
246 trainable=trainable,
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py in <lambda>(**kws)
235 shape=None):
236 """Call on Variable class. Useful to force the signature."""
--> 237 previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
238 for _, getter in ops.get_default_graph()._variable_creator_stack: # pylint: disable=protected-access
239 previous_getter = _make_getter(getter, previous_getter)
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py in default_variable_creator_v2(next_creator, **kwargs)
2631 shape = kwargs.get("shape", None)
2632
-> 2633 return resource_variable_ops.ResourceVariable(
2634 initial_value=initial_value,
2635 trainable=trainable,
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
262 return cls._variable_v2_call(*args, **kwargs)
263 else:
--> 264 return super(VariableMetaclass, cls).__call__(*args, **kwargs)
265
266
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py in __init__(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, variable_def, import_scope, constraint, distribute_strategy, synchronization, aggregation, shape)
1505 self._init_from_proto(variable_def, import_scope=import_scope)
1506 else:
-> 1507 self._init_from_args(
1508 initial_value=initial_value,
1509 trainable=trainable,
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py in _init_from_args(self, initial_value, trainable, collections, caching_device, name, dtype, constraint, synchronization, aggregation, distribute_strategy, shape)
1648 with ops.get_default_graph()._attr_scope({"_class": attr}):
1649 with ops.name_scope("Initializer"), device_context_manager(None):
-> 1650 initial_value = ops.convert_to_tensor(
1651 initial_value() if init_from_fn else initial_value,
1652 name="initial_value", dtype=dtype)
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
1497
1498 if ret is None:
-> 1499 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1500
1501 if ret is NotImplemented:
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py in _default_conversion_function(***failed resolving arguments***)
50 def _default_conversion_function(value, dtype, name, as_ref):
51 del as_ref # Unused.
---> 52 return constant_op.constant(value, dtype, name=name)
53
54
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
261 ValueError: if called on a symbolic tensor.
262 """
--> 263 return _constant_impl(value, dtype, shape, name, verify_shape=False,
264 allow_broadcast=True)
265
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
273 with trace.Trace("tf.constant"):
274 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 275 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
276
277 g = ops.get_default_graph()
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
298 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
299 """Implementation of eager constant."""
--> 300 t = convert_to_eager_tensor(value, ctx, dtype)
301 if shape is None:
302 return t
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
95 except AttributeError:
96 dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 97 ctx.ensure_initialized()
98 return ops.EagerTensor(value, ctx.device_name, dtype)
99
/mnt/Work/work_env/lib/python3.8/site-packages/tensorflow/python/eager/context.py in ensure_initialized(self)
537 if self._use_tfrt is not None:
538 pywrap_tfe.TFE_ContextOptionsSetTfrt(opts, self._use_tfrt)
--> 539 context_handle = pywrap_tfe.TFE_NewContext(opts)
540 finally:
541 pywrap_tfe.TFE_DeleteContextOptions(opts)
InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
About this issue
- State: closed
- Created 4 years ago
- Reactions: 25
- Comments: 41 (11 by maintainers)
Yep, my read is that some summer intern thought it was a good idea to not support old hardware anymore to reduce the pip binary file size and to make some metrics like tensorflow average startup time go down (TF2 is a lot slower than TF1 on those old GPUs).
In https://github.com/tensorflow/tensorflow/releases/tag/v2.3.0
Now it’s September, holidays should be over even though work from home is probably still in place, and a lot of people will update and discover that it screws them over one way or another. Hopefully someone at TensorFlow will take the helm and turn the flag back on. It’s incredibly short-sighted to make a backward-incompatible change that prevents a significant fraction of users from using TensorFlow at all.
Staying with an old TensorFlow version is a no-go because you can’t use the latest algorithms, as @elvis1020 is showing.
I understand that some operations may benefit from newer compute capabilities, but that shouldn’t prevent glorified matrix multiplications from running.
I use my laptop as my development machine because the machines with powerful GPUs are already busy. If I can’t have the same version of TensorFlow in development and production, it’s a deal breaker for using TensorFlow.
Also, my old laptops are reused as robot brains/passive monitoring tools, so if I can’t run TensorFlow on them they become useless: another deal breaker.
For the record, this is critical for me, and I will migrate all my code to torch within a month if nothing changes.
TF 2.3 doesn’t work with my laptop’s GPU, a “GeForce GTX 960M”, which has compute capability 5.0. TF 2.2 works, though. I’m not compiling from source. Guess it’s time to move to torch.
Adding the following:
Solved the problem for me
Solved this error on an RTX 3070 with the following specs: CUDA 11.0, cuDNN 8, TensorFlow 2.4.
I started having this same issue today. Getting the same error using Keras with an Nvidia RTX 2080 Super on Ubuntu 20.04.
Hi everyone,
TF nightly starting from the latest nightly build should now work on GPUs with compute capability 5.0 like GeForce GTX 960M. Please give it a try and let us know how it goes.
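To check whether a particular build (a nightly or a source build) can actually initialize CUDA on your GPU, a small smoke test that runs one op on /GPU:0 and reports the error is enough. A minimal sketch; the helper name gpu_smoke_test is hypothetical.

```python
from typing import Tuple


def gpu_smoke_test() -> Tuple[bool, str]:
    """Run a tiny matmul on /GPU:0 and report (success, message)."""
    try:
        import tensorflow as tf
    except ImportError:
        return False, "TensorFlow is not installed"
    if not tf.config.list_physical_devices("GPU"):
        return False, "no GPU visible to TensorFlow"
    try:
        with tf.device("/GPU:0"):
            x = tf.ones((2, 2))
            y = tf.matmul(x, x)
        return True, f"TF {tf.__version__}: result computed on {y.device}"
    except tf.errors.OpError as err:
        # "device kernel image is invalid" surfaces as an InternalError,
        # which is a subclass of tf.errors.OpError.
        return False, f"TF {tf.__version__}: {err}"


if __name__ == "__main__":
    print(gpu_smoke_test())
```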
@abhipn Here are the installation instructions on the TF website. Thanks!
I just closed the terminal and opened it again, and it worked for me.
Hello, I am facing the same issue. I am running TF 2.2.0 in an NVIDIA container, which comes with CUDA 11 by default, but the GPU is not being recognized in my code. Now it gives me the following error (when it tries to load mtcnn): tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
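For the "out of memory" variant of this error, one commonly suggested mitigation (not confirmed anywhere in this thread) is to enable memory growth, so TensorFlow allocates GPU memory on demand instead of reserving most of it up front. tf.config.experimental.set_memory_growth must be called before the GPU is initialized, and it cannot help if another process already holds the memory. A minimal sketch; the helper name enable_memory_growth is hypothetical.

```python
from typing import List


def enable_memory_growth() -> List[str]:
    """Turn on on-demand GPU memory allocation (hypothetical helper).

    Returns the names of the GPUs that were configured; an empty list if
    TensorFlow is not installed or sees no GPU.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    configured = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            # Raises RuntimeError if this GPU has already been initialized.
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu.name)
        except RuntimeError as err:
            print(f"Could not set memory growth on {gpu.name}: {err}")
    return configured


if __name__ == "__main__":
    # Call this at program start, before building any model.
    print(enable_memory_growth())
```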
@unrealwill if you’re running on linux you can use TF docker images to avoid having to deal with CUDA and cuDNN versions.
I have a Quadro M1200 which has compute capability 5 and I am still getting this same error.
Hi @abhipn,
You can enter the compute capability when the configure script says:
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]:
(Enter 5.0, and possibly others if needed.) As for compile time, I don’t think it will build quickly (although targeting just 5.0 should cut down on the compile time); I recommend building it on a beefy GCP VM for a quicker turnaround.
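Instead of answering that prompt interactively, TensorFlow’s configure script also reads the TF_CUDA_COMPUTE_CAPABILITIES (and TF_NEED_CUDA) environment variables. The sketch below only validates a capability list against the ">= 3.5" rule quoted in the prompt and passes it to ./configure; the helper names are hypothetical, it assumes you run it from the root of a TensorFlow source checkout, and configure may still ask about settings not supplied this way.

```python
import os
import subprocess
from typing import Dict, List

# Lower bound quoted by the configure prompt ("only supports compute capabilities >= 3.5").
MIN_CAPABILITY = (3, 5)


def parse_capabilities(spec: str) -> List[str]:
    """Validate a comma-separated list such as "5.0,7.0" (hypothetical helper)."""
    caps = []
    for item in spec.split(","):
        # Raises ValueError for malformed entries such as "5" or "5.x".
        major, minor = (int(part) for part in item.strip().split("."))
        if (major, minor) < MIN_CAPABILITY:
            raise ValueError(f"compute capability {item.strip()} is below 3.5")
        caps.append(f"{major}.{minor}")
    return caps


def configure_env(spec: str) -> Dict[str, str]:
    """Environment for running ./configure with the given capabilities."""
    env = dict(os.environ)
    env["TF_NEED_CUDA"] = "1"
    env["TF_CUDA_COMPUTE_CAPABILITIES"] = ",".join(parse_capabilities(spec))
    return env


if __name__ == "__main__":
    # Assumption: the current directory is a TensorFlow source checkout.
    if os.path.isfile("configure"):
        subprocess.run(["./configure"], env=configure_env("5.0"), check=True)
    else:
        print("Run this from the root of a TensorFlow source checkout.")
```

Targeting only "5.0" keeps the build smallest, in line with the advice above about compile time.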
@abhipn we no longer ship PTX for some older GPUs, which includes the one you’re using. The easiest solution for you is probably to build the pip package from source with support for your GPU (sm_50) included. See https://www.tensorflow.org/install/gpu#hardware_requirements.
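This explanation follows from CUDA’s compatibility rules: a precompiled cubin for sm_X.Y runs only on GPUs with the same major version X and a minor version >= Y, while PTX for compute_X.Y can be JIT-compiled by the driver for any GPU with capability >= X.Y. The sketch below encodes just those two rules; the target lists in the usage lines are illustrative assumptions, not the actual contents of any TensorFlow wheel.

```python
from typing import Iterable, Tuple

# A compute capability as (major, minor), e.g. (5, 0) for the GTX 750 Ti.
Capability = Tuple[int, int]


def can_run(device: Capability,
            sass: Iterable[Capability],
            ptx: Iterable[Capability]) -> bool:
    """Return True if a binary with these GPU targets can run on `device`.

    sass: architectures with precompiled cubins (sm_XY).
    ptx:  architectures whose PTX was embedded (compute_XY).
    """
    major, minor = device
    # A cubin for sm_X.Y runs only on devices with the same major version X
    # and a minor version >= Y.
    if any(s_major == major and s_minor <= minor for s_major, s_minor in sass):
        return True
    # PTX for compute_X.Y can be JIT-compiled for any device whose
    # capability is >= X.Y (tuples compare lexicographically).
    return any(target <= device for target in ptx)


if __name__ == "__main__":
    gtx_750_ti = (5, 0)  # computeCapability: 5.0 in the log above
    # Hypothetical prebuilt binary: cubins for 3.5, 5.2, 6.0, 7.0; PTX only for 7.0.
    print(can_run(gtx_750_ti, sass=[(3, 5), (5, 2), (6, 0), (7, 0)], ptx=[(7, 0)]))  # False
    # Hypothetical source build that adds sm_50, as suggested above.
    print(can_run(gtx_750_ti, sass=[(5, 0)], ptx=[]))  # True
```

The first case fails because an sm_52 cubin cannot run on a 5.0 device and the only PTX targets a newer architecture, which matches the "device kernel image is invalid" symptom.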
@abhipn Looks like the GPU was detected, but some libraries (like libcudart.so.10.1) are loaded from CUDA 10.1 and some others (for example libcublas.so.10, libcusolver.so.10, etc.) are loaded from CUDA 10.0. Please check your error trace. Can you please uninstall the CUDA drivers (10.1 and 10.0), uninstall TF, restart, install CUDA 10.1 and finally install TF? Please let me know how it progresses. Thanks!
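The log lines only show sonames such as libcublas.so.10, not the directories they were loaded from. On Linux, one way to check which CUDA installation each library actually came from (an approach assumed here, not taken from the thread) is to read /proc/self/maps after TensorFlow has tried to initialize the GPU. A minimal sketch; the helper names are hypothetical.

```python
import os
import re
from typing import Dict

# Matches absolute paths whose file name starts with "libcu", e.g.
# /usr/local/cuda-10.1/lib64/libcudart.so.10.1 or .../libcudnn.so.7
_LIB_RE = re.compile(r"(/\S*/(libcu\w*\.so[\w.]*))")


def cuda_libs_from_maps(maps_text: str) -> Dict[str, str]:
    """Map each loaded CUDA library file name to the directory it came from."""
    found = {}
    for line in maps_text.splitlines():
        match = _LIB_RE.search(line)
        if match:
            path, name = match.group(1), match.group(2)
            found[name] = path[: -len(name) - 1]
    return found


def main() -> None:
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
        return
    try:
        # Creating a tensor should force CUDA initialization (and library
        # loading), even if it then fails with the InternalError above.
        tf.ones((1,))
    except tf.errors.OpError as err:
        print(f"GPU initialization failed: {err}")
    if not os.path.exists("/proc/self/maps"):
        print("/proc/self/maps is only available on Linux")
        return
    with open("/proc/self/maps") as maps:
        libs = cuda_libs_from_maps(maps.read())
    for name, directory in sorted(libs.items()):
        print(f"{name:30} {directory}")


if __name__ == "__main__":
    main()
```

If the printed directories mix, say, cuda-10.0 and cuda-10.1, the installation is mixed in the way described above.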
I recently purchased a 3090 and built tensorflow==2.3.0 from source with CUDA 11.1 and cuDNN 8.0.4, and I don’t have any issues. If you don’t want to build from source, maybe you can try the TensorFlow 2.4.0 nightly and see if it works?
Running the above command today after a fresh docker pull makes the bug disappear.
Thanks
Addendum: I couldn’t help but notice that the time is not my local time, and my usual trick
docker -e TZ=Europe/Paris ...
doesn’t seem to have an effect.
My TensorFlow was working very well. I updated all the packages in my Python environment, and after that I get the same issue. What should I do to fix this?
I have the same issue today and can’t find anything that solves it. Have you solved it yet?