tensorflow: Debugger V2 not working. Invalid argument: DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26

System information

I have used the test example from here
OS: Windows 10
Tensorflow 2.3.1 (installed with pip):
Python 3.6
CUDA 10.1
nVidia GeForce GTX 1050

I cannot make the example work with Debugger V2.

By executing the example from the link above I get the following output:

D:\src\ai\visualthing\venv\Scripts\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 50790 --file D:/src/ai/visualthing/debug_mnist_v2.py --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

pydev debugger: process 8484 is connecting

Connected to pydev debugger (build 192.5728.105)
2020-09-27 20:31:08.451881: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
INFO:tensorflow:Enabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir, tensor debug mode: FULL_HEALTH)
I0927 20:31:11.284601  1260 dumping_callback.py:871] Enabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir, tensor debug mode: FULL_HEALTH)
2020-09-27 20:31:11.557685: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-09-27 20:31:11.584474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-09-27 20:31:11.584652: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-27 20:31:11.588047: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-27 20:31:11.591169: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-27 20:31:11.592204: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-27 20:31:11.595773: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-27 20:31:11.597733: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-27 20:31:11.605092: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-27 20:31:11.605244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-27 20:31:11.605644: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-27 20:31:11.614513: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f4c545b410 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-27 20:31:11.614778: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-27 20:31:11.615119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-09-27 20:31:11.615425: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-27 20:31:11.615585: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-27 20:31:11.615691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-27 20:31:11.615830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-27 20:31:11.615921: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-27 20:31:11.616011: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-27 20:31:11.616099: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-27 20:31:11.616214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-27 20:31:12.188255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-27 20:31:12.188425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-27 20:31:12.188484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-27 20:31:12.188686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2987 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-09-27 20:31:12.191306: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f4e366a9f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-27 20:31:12.191431: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1
2020-09-27 20:31:13.537229: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 2060, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 2054, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 1405, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 1412, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/src/ai/visualthing/debug_mnist_v2.py", line 238, in <module>
    absl.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "D:\src\ai\visualthing\venv\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "D:\src\ai\visualthing\venv\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "D:/src/ai/visualthing/debug_mnist_v2.py", line 223, in main
    y = model(x_train)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 846, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1933, in _call_flat
    cancellation_manager=cancellation_manager)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 550, in call
    ctx=ctx)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 138, in execute_with_callbacks
    tensors = quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26
	 [[{{node StatefulPartitionedCall/MatMul/ReadVariableOp/DebugNumericSummaryV2}}]]
	 [[x/_1]]
  (1) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26
	 [[{{node StatefulPartitionedCall/MatMul/ReadVariableOp/DebugNumericSummaryV2}}]]
0 successful operations.
0 derived errors ignored. [Op:__forward_model_324]

Function call stack:
model -> model

INFO:tensorflow:Disabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir)
I0927 20:31:55.200698  1260 dumping_callback.py:895] Disabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir)

Process finished with exit code 1

I also tried building my own example, without success. It fails with the same error: DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53)

About this issue

State: closed
Created 4 years ago
Reactions: 1
Comments: 25 (4 by maintainers)

Commits related to this issue

[DebuggerV2] Enable debug_v2_ops_test & debug_events_writer_test on Windows - A test in debug_v2_ops_test previously called `np.power(2, 53)` without specifying dtype. As a result, the output had t... — committed to tensorflow/tensorflow by caisq 4 years ago
Change to NO_TENSOR to avoid this issue https://github.com/tensorflow/tensorflow/issues/43608 on Windows 10. — committed to Gavin-Development/GavinBackend by invalid-email-address 3 years ago
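As background for the 2^53 bound in the error message, here is a minimal standard-library sketch. The commit message above is truncated, so this does not reproduce its conclusion. It only shows two general facts: float64 represents every integer exactly up to 2^53, and the value 2^53 does not fit in a 32-bit integer. The names `as_int32` and `id_is_valid` are hypothetical, and the last check is an assumed illustration of how a limit computed in a too-narrow integer type could reject a small id such as 26, not a confirmed diagnosis of this bug.

```python
import ctypes

LIMIT = 2 ** 53  # the bound quoted in the error message

# float64 has a 53-bit significand, so 2**53 round-trips exactly...
assert int(float(LIMIT)) == LIMIT
# ...while 2**53 + 1 is the first integer that does not.
assert int(float(LIMIT + 1)) != LIMIT + 1


def as_int32(value):
    """Truncate value to a signed 32-bit integer (hypothetical helper)."""
    return ctypes.c_int32(value).value


def id_is_valid(tensor_id, limit):
    """Same shape of check as the error message: tensor_id <= limit."""
    return tensor_id <= limit


print(as_int32(LIMIT))                   # 2**53 truncated to 32 bits
print(id_is_valid(26, LIMIT))            # check against the intended limit
print(id_is_valid(26, as_int32(LIMIT)))  # check against the truncated limit
```

If the limit collapses to 0 in this way, even a tiny tensor_id fails the comparison, which matches the shape of the reported symptom.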

Most upvoted comments

I got the same error. Have you found a fix?

MareSeestern on Dec 11, 2020

Same problem here. Searched all over for a solution and can’t find one. Any help would be appreciated.

chrisacc on Sep 30, 2020

Hi, I have just run into this issue with TensorFlow 2.9.1 on Windows 10. A workaround was to enable eager execution with tf.config.run_functions_eagerly(True). I have no idea whether it is a good workaround, but at least it runs with tensor_debug_mode='FULL_HEALTH'.

fabien-corso on Jul 20, 2022
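The workaround from this comment could look roughly like the sketch below. The helper name `enable_full_health_debugging` and the default dump directory are assumptions made for illustration. The commenter was unsure whether this is a good workaround, and running every tf.function eagerly generally costs performance.

```python
def enable_full_health_debugging(dump_root="/tmp/tfdbg2_logdir"):
    """Hypothetical helper: enable Debugger V2 with the eager-mode workaround."""
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    import tensorflow as tf

    # Workaround from the comment above: run tf.function bodies eagerly
    # instead of as compiled graphs.
    tf.config.run_functions_eagerly(True)

    # Same debug mode that fails in graph mode in the original report.
    writer = tf.debugging.experimental.enable_dump_debug_info(
        dump_root,
        tensor_debug_mode="FULL_HEALTH",
        circular_buffer_size=-1,
    )
    return writer
```

Call it once at program start, before building or calling the model, so that the instrumentation is active when the first op executes.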

I ran into the same issue on Windows 10 with tf 2.3.0

mjohenneken on Mar 25, 2021

I played around with the parameters. It seems that the debugger runs with the defaults, i.e. tf.debugging.experimental.enable_dump_debug_info("tfdbg_logs", tensor_debug_mode="NO_TENSOR"), but the other options for the tensor_debug_mode parameter fail:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:4156
	 [[node functional_1/batch_normalization_8/FusedBatchNormV3/ReadVariableOp_1/DebugNumericSummaryV2 (defined at C:\ProgramData\Anaconda3\envs\ml\lib\site-packages\wandb\integration\keras\keras.py:119) ]]
	 [[broadcast_weights_1/assert_broadcastable/is_valid_shape/else/_486/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/then/_1492/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat/_2860]]
  (1) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:4156
	 [[node functional_1/batch_normalization_8/FusedBatchNormV3/ReadVariableOp_1/DebugNumericSummaryV2 (defined at C:\ProgramData\Anaconda3\envs\ml\lib\site-packages\wandb\integration\keras\keras.py:119) ]]

mjohenneken on Mar 25, 2021

I solved my problem by removing this line: tf.debugging.experimental.enable_dump_debug_info(path, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1)

AND restarting my Kernel after removing this line.

System information
OS: Windows 10
Tensorflow 2.3.0 (installed with pip):
Python 3.8
CUDA 10.1
nVidia GeForce GTX 1050

Yes, but now you don’t have debugging information. Am I right?

The problem is that we cannot use Debugger V2 on Windows 10. The whole purpose of this ticket is to figure out how to make it work. Of course, if you disable it the problem is gone 😄

jaimeff on Dec 16, 2020