tensorflow: failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE

Issue Type

Bug

Source

binary

Tensorflow Version

v2.9.0-18-gd8ce9f9c301 2.9.1

Custom Code

No

OS Platform and Distribution

Linux Ubuntu 20.04.4 LTS

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

I have a dynamic keras.Model named symbol_net. During the forward pass (invoking its call method), it sometimes crashes as shown below when the model contains a Dense layer.

I searched online and tried many of the suggested fixes, separately and in combination, for example:

import tensorflow as tf  # type: ignore
from tensorflow import keras
from keras import layers  # type: ignore
from keras import backend as K

# TF2-style: let the first GPU allocate memory on demand.
physical_devices = tf.config.list_physical_devices("GPU")
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

# TF1-style: session with on-demand growth, capped at about a third of GPU memory.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.333
session = tf.compat.v1.Session(config=config)
K.set_session(session)

None of them worked. The GPU has 12 GiB of memory. The machine is shared, but about 12000 MiB was free while my code was running, so memory should be sufficient. My model is also quite small and should not need much memory.

2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:42.546307: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:42.546320: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
	outputs= (shape=(2, 2, 2, 2) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
  File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
    ic(net(*input_list))
  File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).

Graph execution error:

Detected at node 'dense/Tensordot/MatMul' defined at (most recent call last):
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
      ic(net(*input_list))
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
      for inst, inps, outs, op, node_id in self.instructions.data:
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
      outputs = inst(*input_tensors)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 224, in call
      outputs = tf.tensordot(inputs, self.kernel, [[rank - 1], [0]])
Node: 'dense/Tensordot/MatMul'
Failed initializing math mode
	 [[{{node dense/Tensordot/MatMul}}]] [Op:__inference_call_146]

Call arguments received by layer "symbol_net" (type SymbolNet):
  • args=('tf.Tensor(shape=(2, 2, 2, 2), dtype=float32)', 'tf.Tensor(shape=(1, 1, 1, 1), dtype=float32)')
  • kwargs={'training': 'None'}

Standalone code to reproduce the issue

My code is currently too large to share as a standalone reproduction. Sorry.

Relevant log output

2022-08-21 23:09:55.580410: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601460: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.602081: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-21 23:09:55.603250: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915925: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.068318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068654: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068904: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.183640: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184001: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184083: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6

2022-08-21 23:09:57.669085: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:57.669107: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:57.669119: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
	outputs= (shape=(1, 1) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
  File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
    ic(net(*input_list))
  File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).

Graph execution error:

Detected at node 'dense/MatMul' defined at (most recent call last):
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
      ic(net(*input_list))
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
      for inst, inps, outs, op, node_id in self.instructions.data:
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
      outputs = inst(*input_tensors)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 221, in call
      outputs = tf.matmul(a=inputs, b=self.kernel)
Node: 'dense/MatMul'
Failed initializing math mode
	 [[{{node dense/MatMul}}]] [Op:__inference_call_156]

Call arguments received by layer "symbol_net" (type SymbolNet):
  • args=('tf.Tensor(shape=(2, 2, 2, 1), dtype=float32)', 'tf.Tensor(shape=(1,), dtype=float32)')
  • kwargs={'training': 'None'}

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 3
  • Comments: 17 (3 by maintainers)

Most upvoted comments

Actually, just importing tensorflow before I import torchaudio fixed the problem! It makes me a little worried about other possible compatibility issues between torchaudio and tensorflow though.
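
The import-order observation above can be checked from inside a running process. The following is a minimal sketch and not part of the original report: `first_loaded` is a hypothetical helper, and it assumes that `sys.modules` (a dict, so insertion-ordered on Python 3.7+) reflects the order in which top-level packages were first imported.

```python
import sys


def first_loaded(candidates, loaded=None):
    """Return the candidate module name that was imported first, or None.

    `loaded` is a sequence of module names in import order; it defaults to
    the keys of sys.modules, which keep insertion order.
    """
    names = list(sys.modules) if loaded is None else list(loaded)
    wanted = set(candidates)
    for name in names:
        if name in wanted:
            return name
    return None


# Hypothetical check: warn when torch was imported before tensorflow.
if first_loaded(["tensorflow", "torch"]) == "torch":
    print("torch was imported before tensorflow; try importing tensorflow first")
```

This only reports the order; whether reordering the imports avoids the crash on a given setup still has to be confirmed by running the model.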

@jhuus Could you try tf.config.experimental.enable_tensor_float_32_execution(False)? I think it only sacrifices a little performance but lets you use torch and tensorflow at the same time. That way you don’t need to wait for this issue to be fixed.
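
A minimal sketch of the workaround suggested above. `tf.config.experimental.enable_tensor_float_32_execution` and `tf.config.experimental.tensor_float_32_execution_enabled` are existing TensorFlow APIs; the wrapper `disable_tf32` and its import guard are illustrative assumptions, added so the snippet also runs on machines where TensorFlow is not installed.

```python
import importlib.util


def disable_tf32():
    """Disable TensorFloat-32 execution and return the resulting flag.

    Returns None when TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # Intended to be called before the model's first forward pass.
    tf.config.experimental.enable_tensor_float_32_execution(False)
    return tf.config.experimental.tensor_float_32_execution_enabled()


print("TF32 enabled:", disable_tf32())
```

With TF32 disabled, float32 matrix multiplications use full float32 precision, which is where the small performance cost mentioned above comes from.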

@sushreebarsa Hi! Would it be better to output a friendlier error message for this failure? Logging only

Node: 'dense/MatMul'
Failed initializing math mode
	 [[{{node dense/MatMul}}]] [Op:__inference_call_156]

is quite confusing. If that’s OK, I would like to add some extra information here, such as:

Please check whether there is a conflict, e.g. another deep learning framework (such as torch) has been imported.
Or consider disabling TF32 optimization with `tf.config.experimental.enable_tensor_float_32_execution(False)`.
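
The proposal above would change TensorFlow's own error text. Until then, a similar hint can be attached on the user side. This is only a sketch under stated assumptions: `HINT` and `with_math_mode_hint` are hypothetical names, the hint paraphrases the wording proposed above, and the match relies on the "Failed initializing math mode" string seen in the logs of this issue.

```python
# Hypothetical hint text, paraphrasing the message proposed in this thread.
HINT = (
    "Please check whether another deep learning framework (e.g. torch) "
    "is imported in the same process, or consider disabling TF32 with "
    "tf.config.experimental.enable_tensor_float_32_execution(False)."
)


def with_math_mode_hint(message):
    """Append the hint when the message reports the math-mode failure."""
    if "Failed initializing math mode" in message and HINT not in message:
        return message + "\n" + HINT
    return message


raw = "Node: 'dense/MatMul'\nFailed initializing math mode"
print(with_math_mode_hint(raw))
```

In user code this could be applied to the text of the exception caught around the model call before it is logged or re-raised.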

@Co1lin Thank you for the update! Please close this issue if it is resolved for you. Thank you!