tensorflow: failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE

Issue Type

Bug

Source

binary

Tensorflow Version

v2.9.0-18-gd8ce9f9c301 2.9.1

Custom Code

OS Platform and Distribution

Linux Ubuntu 20.04.4 LTS

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

I have a dynamic keras.Model named symbol_net. When executing the forward computation (calling its call method), it sometimes crashes as shown below if the model contains a Dense layer.

I have searched the Internet and tried many solutions, including combinations of them, such as

import tensorflow as tf  # type: ignore
from tensorflow import keras
from keras import layers  # type: ignore
from keras import backend as K

# TF2-style setting: let the first GPU allocate memory on demand.
physical_devices = tf.config.list_physical_devices("GPU")
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

# TF1-style setting: a compat.v1 session with on-demand growth,
# capped at roughly one third of the GPU memory.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.333
session = tf.compat.v1.Session(config=config)
K.set_session(session)

But none of them work. I have a GPU with 12 GiB of memory. The machine is shared by multiple users, but about 12000 MiB was still free while I was running the code, so that should be enough. My model is also quite small and won’t take much memory.

2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:42.546307: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:42.546320: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
	outputs= (shape=(2, 2, 2, 2) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
  File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
    ic(net(*input_list))
  File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).

Graph execution error:

Detected at node 'dense/Tensordot/MatMul' defined at (most recent call last):
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
      ic(net(*input_list))
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
      for inst, inps, outs, op, node_id in self.instructions.data:
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
      outputs = inst(*input_tensors)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 224, in call
      outputs = tf.tensordot(inputs, self.kernel, [[rank - 1], [0]])
Node: 'dense/Tensordot/MatMul'
Failed initializing math mode
	 [[{{node dense/Tensordot/MatMul}}]] [Op:__inference_call_146]

Call arguments received by layer "symbol_net" (type SymbolNet):
  • args=('tf.Tensor(shape=(2, 2, 2, 2), dtype=float32)', 'tf.Tensor(shape=(1, 1, 1, 1), dtype=float32)')
  • kwargs={'training': 'None'}

Standalone code to reproduce the issue

Currently my code is large. Sorry.

Relevant log output

2022-08-21 23:09:55.580410: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601460: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.602081: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-21 23:09:55.603250: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915925: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.068318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068654: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068904: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.183640: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184001: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184083: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6

2022-08-21 23:09:57.669085: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:57.669107: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:57.669119: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
	outputs= (shape=(1, 1) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
  File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
    ic(net(*input_list))
  File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).

Graph execution error:

Detected at node 'dense/MatMul' defined at (most recent call last):
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
      ic(net(*input_list))
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
      for inst, inps, outs, op, node_id in self.instructions.data:
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
      outputs = inst(*input_tensors)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 221, in call
      outputs = tf.matmul(a=inputs, b=self.kernel)
Node: 'dense/MatMul'
Failed initializing math mode
	 [[{{node dense/MatMul}}]] [Op:__inference_call_156]

Call arguments received by layer "symbol_net" (type SymbolNet):
  • args=('tf.Tensor(shape=(2, 2, 2, 1), dtype=float32)', 'tf.Tensor(shape=(1,), dtype=float32)')
  • kwargs={'training': 'None'}

About this issue

Original URL
State: open
Created 2 years ago
Reactions: 3
Comments: 17 (3 by maintainers)

Most upvoted comments

Actually, just importing tensorflow before I import torchaudio fixed the problem! It makes me a little worried about other possible compatibility issues between torchaudio and tensorflow though.

On Sun, Aug 28, 2022 at 7:58 AM Colin @.***> wrote:

@jhuus https://github.com/jhuus Could you try tf.config.experimental.enable_tensor_float_32_execution(False)? I think it only sacrifices a little performance but enables you to use torch and tensorflow at the same time.


jhuus on Aug 28, 2022
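The import-order observation above can be turned into a small guard. This is only a sketch under assumptions: the names `CUDA_FRAMEWORKS` and `loaded_before_tensorflow` are hypothetical, and the idea that importing torch or torchaudio first triggers the cuBLAS error comes from the report in this thread, not from a confirmed root cause.

```python
import sys

# Hypothetical list of frameworks that were reported in this thread to
# conflict when imported before tensorflow; extend it as needed.
CUDA_FRAMEWORKS = ("torch", "torchaudio")


def loaded_before_tensorflow(modules=None, candidates=CUDA_FRAMEWORKS):
    """Return the candidates already imported while tensorflow is not yet imported."""
    modules = sys.modules if modules is None else modules
    if "tensorflow" in modules:
        # tensorflow is already loaded, so there is nothing left to reorder.
        return []
    return [name for name in candidates if name in modules]


# Run this check right before the first `import tensorflow`.
early = loaded_before_tensorflow()
if early:
    print(f"Warning: {early} imported before tensorflow; consider importing tensorflow first.")
```

The check only reads `sys.modules`, so it does not import anything itself and is cheap to leave at the top of an entry script.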

@jhuus Could you try tf.config.experimental.enable_tensor_float_32_execution(False)? I think it only sacrifices a little performance but lets you use torch and tensorflow at the same time. That way you don’t have to wait for this issue to be fixed.

Co1lin on Aug 28, 2022
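For reference, the suggested workaround could look roughly like the sketch below. The wrapper name `disable_tf32` and its `tf_module` parameter are hypothetical; only `enable_tensor_float_32_execution` and `tensor_float_32_execution_enabled` are TensorFlow APIs. Whether this avoids the crash in a given setup is an assumption based on the comment above.

```python
def disable_tf32(tf_module=None):
    """Turn off TensorFloat-32 execution and return the resulting setting."""
    if tf_module is None:
        # Imported lazily so the helper can be defined without tensorflow installed.
        import tensorflow as tf_module
    experimental = tf_module.config.experimental
    # With TF32 off, float32 matmuls run in full precision, and the log above
    # suggests the math-mode switch that failed is tied to the TF32 path.
    experimental.enable_tensor_float_32_execution(False)
    return experimental.tensor_float_32_execution_enabled()


# Usage (assumes tensorflow is installed): call once, before the first matmul.
# disable_tf32()
```

Calling it at program start, before any model is built, keeps the setting consistent for every layer.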

@sushreebarsa Hi! Would it be better to output a friendlier error message for this assertion error? Logging only

Node: 'dense/MatMul'
Failed initializing math mode
	 [[{{node dense/MatMul}}]] [Op:__inference_call_156]

is quite confusing. If that’s OK, I would like to add some extra information here, like:

Please check if there's some conflicts, like another deep learning framework (e.g. torch) is imported.
Or consider to disable TF32 optimization by `tf.config.experimental.enable_tensor_float_32_execution(False)`.

Co1lin on Aug 28, 2022

@Co1lin Thank you for the update! Please move this issue to closed status if it is resolved for you. Thank you!

sushreebarsa on Aug 28, 2022