tensorflow: failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
Issue Type
Bug
Source
binary
Tensorflow Version
v2.9.0-18-gd8ce9f9c301 2.9.1
Custom Code
No
OS Platform and Distribution
Linux Ubuntu 20.04.4 LTS
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current Behaviour?
I have a dynamic keras.Model named symbol_net. When executing the forward computation (calling its call method), it sometimes crashes as follows if there’s a Dense layer in the model.
I have searched the Internet and tried many solutions, including combinations of them, such as:
import tensorflow as tf # type: ignore
from tensorflow import keras
from keras import layers # type: ignore
from keras import backend as K
physical_devices = tf.config.list_physical_devices("GPU")
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.333
session = tf.compat.v1.Session(config=config)
K.set_session(session)
None of them work. I have a GPU with 12 GiB of memory. It is a multi-user machine, but when I ran the code about 12000 MiB were still free, so that should be enough. My model is quite small, like this, and won’t take much memory.
2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:42.546307: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:42.546320: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
outputs= (shape=(2, 2, 2, 2) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
ic(net(*input_list))
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).
Graph execution error:
Detected at node 'dense/Tensordot/MatMul' defined at (most recent call last):
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
ic(net(*input_list))
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
return super().__call__(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
for inst, inps, outs, op, node_id in self.instructions.data:
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
outputs = inst(*input_tensors)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 224, in call
outputs = tf.tensordot(inputs, self.kernel, [[rank - 1], [0]])
Node: 'dense/Tensordot/MatMul'
Failed initializing math mode
[[{{node dense/Tensordot/MatMul}}]] [Op:__inference_call_146]
Call arguments received by layer "symbol_net" (type SymbolNet):
• args=('tf.Tensor(shape=(2, 2, 2, 2), dtype=float32)', 'tf.Tensor(shape=(1, 1, 1, 1), dtype=float32)')
• kwargs={'training': 'None'}
Standalone code to reproduce the issue
Currently my code is large. Sorry.
Relevant log output
2022-08-21 23:09:55.580410: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601460: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.602081: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-21 23:09:55.603250: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915925: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.068318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068654: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068904: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.183640: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184001: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184083: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:57.669085: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:57.669107: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:57.669119: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
outputs= (shape=(1, 1) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
ic(net(*input_list))
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).
Graph execution error:
Detected at node 'dense/MatMul' defined at (most recent call last):
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
ic(net(*input_list))
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
return super().__call__(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
for inst, inps, outs, op, node_id in self.instructions.data:
File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
outputs = inst(*input_tensors)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 221, in call
outputs = tf.matmul(a=inputs, b=self.kernel)
Node: 'dense/MatMul'
Failed initializing math mode
[[{{node dense/MatMul}}]] [Op:__inference_call_156]
Call arguments received by layer "symbol_net" (type SymbolNet):
• args=('tf.Tensor(shape=(2, 2, 2, 1), dtype=float32)', 'tf.Tensor(shape=(1,), dtype=float32)')
• kwargs={'training': 'None'}
About this issue
- State: open
- Created 2 years ago
- Reactions: 3
- Comments: 17 (3 by maintainers)
Actually, just importing tensorflow before I import torchaudio fixed the problem! It makes me a little worried about other possible compatibility issues between torchaudio and tensorflow though.
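A minimal sketch of the import-order workaround described above. The helper name `import_in_order` is hypothetical and only illustrates the idea; why the order matters is not established in this thread, so treat it as a workaround rather than a fix.

```python
import importlib
import importlib.util


def import_in_order(names):
    """Import top-level modules in the given order, skipping missing ones.

    Hypothetical helper: returns the names that were actually imported.
    """
    imported = []
    for name in names:
        if importlib.util.find_spec(name) is None:
            continue  # module is not installed in this environment
        importlib.import_module(name)
        imported.append(name)
    return imported


# Order reported to work in this thread: TensorFlow first, then torchaudio.
# import_in_order(["tensorflow", "torchaudio"])
```

In an ordinary script it should be enough to place `import tensorflow` on a line above `import torchaudio`; the helper only makes the ordering explicit.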
On Sun, Aug 28, 2022 at 7:58 AM Colin @.***> wrote:
@jhuus Could you try
tf.config.experimental.enable_tensor_float_32_execution(False)? I think it only sacrifices a little performance, but it lets you use torch and tensorflow at the same time, so for now you don’t need to wait for this issue to be fixed.
@sushreebarsa Hi! I am wondering whether it would be better to output a friendlier error message for this assertion error. Only logging
is quite confusing. If it’s OK, I would like to add some extra information here, like:
@Co1lin Thank you for the update! Please move this issue to closed status if it is resolved for you. Thank you!
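The TensorFloat-32 workaround suggested to @jhuus earlier in this thread can be sketched as follows. The wrapper name `disable_tf32` is hypothetical, and the sketch assumes a TensorFlow version that provides the `tf.config.experimental` TF32 switches. It avoids the failing math-mode call by not requesting TF32 and does not address the underlying cause.

```python
import importlib.util


def disable_tf32():
    """Turn off TensorFloat-32 execution if TensorFlow is installed.

    Hypothetical wrapper: returns the resulting TF32 flag (False on success),
    or None when TensorFlow is not available in this environment.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # Ask TensorFlow to use full float32 matmuls instead of TF32.
    tf.config.experimental.enable_tensor_float_32_execution(False)
    return tf.config.experimental.tensor_float_32_execution_enabled()


print("TF32 enabled:", disable_tf32())
```

Calling this once at program start, before building or calling the model, should be sufficient.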