tensorflow: InvalidArgumentError when running map_fn on strings inside a tf.function
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
- TensorFlow installed from (source or binary):
- TensorFlow version (use command below): 2.0-alpha (installed via conda install tensorflow-gpu==2.0-alpha)
- Python version: 3.7.1
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: cudatoolkit-10.0.130-0, cudnn-7.3.1-cuda10.0_0
- GPU model and memory: GeForce RTX 2080 Ti
Describe the current behavior
Running the provided code on a GPU leads to the error message tensorflow.python.framework.errors_impl.InvalidArgumentError: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string.
If the tensor is not fed to the convolution layer first, tf.summary.image succeeds.
Describe the expected behavior
The code should run without errors.
Code to reproduce the issue
import tensorflow as tf
from tensorflow.keras import layers

H, W, C = 10, 10, 3
imgs = tf.zeros([10, H, W, C])
ds = tf.data.Dataset.from_tensor_slices(imgs)
ds = ds.batch(2)
conv = layers.Conv2D(32, (4, 4), strides=(2, 2), padding='same')

@tf.function
def run(img, i):
    conv(img)
    # tf.summary.image encodes each image to a PNG string inside the traced
    # graph; on a GPU the resulting string TensorList triggers the error below.
    tf.summary.image('img', img, i)

if __name__ == "__main__":
    train_summary_writer = tf.summary.create_file_writer('/tmp/testsummary')
    with train_summary_writer.as_default():
        for i, img in enumerate(ds):
            run(img, i)
Other info / logs
TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-04-20 14:44:30.818841: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1700000000 Hz
2019-04-20 14:44:30.819976: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x55b6fa788f50 executing computations on platform Host. Devices:
2019-04-20 14:44:30.820029: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): <undefined>, <undefined>
2019-04-20 14:44:30.825689: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-04-20 14:44:31.062487: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x55b6fc634120 executing computations on platform CUDA. Devices:
2019-04-20 14:44:31.062554: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-20 14:44:31.063894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1467] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:19:00.0
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-04-20 14:44:31.063942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1546] Adding visible gpu devices: 0
2019-04-20 14:44:31.064034: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-04-20 14:44:31.067082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1015] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-20 14:44:31.067114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0
2019-04-20 14:44:31.067130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] 0: N
2019-04-20 14:44:31.068283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10284 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:19:00.0, compute capability: 7.5)
2019-04-20 14:44:33.628228: W tensorflow/core/common_runtime/base_collective_executor.cc:214] BaseCollectiveExecutor::StartAbort Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
[[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
[[img_1/encode_each_image/while/loop_body_control/_19/_33]]
2019-04-20 14:44:33.628374: W tensorflow/core/common_runtime/base_collective_executor.cc:214] BaseCollectiveExecutor::StartAbort Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
[[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
2019-04-20 14:44:33.628468: E tensorflow/core/common_runtime/process_function_library_runtime.cc:764] Component function execution failed: Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
[[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
[[img_1/encode_each_image/while/loop_body_control/_19/_33]]
2019-04-20 14:44:33.628456: E tensorflow/core/common_runtime/process_function_library_runtime.cc:764] Component function execution failed: Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
[[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
Traceback (most recent call last):
File "test.py", line 21, in <module>
run(img, i)
File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/def_function.
py", line 438, in __call__
return self._stateless_fn(*args, **kwds)
File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py",
line 1288, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py",
line 574, in _filtered_call
(t for t in nest.flatten((args, kwargs))
File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py",
line 627, in _call_flat
outputs = self._inference_function.call(ctx, args)
File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py",
line 415, in call
ctx=ctx)
File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/execute.py",
line 66, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: During Variant Host->Device Copy: non-DMA-copy attempted of
tensor type: string
[[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
[[img_1/encode_each_image/while/loop_body_control/_19/_33]] [Op:__inference_run_343]
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 10
- Comments: 50 (18 by maintainers)
Commits related to this issue
- Add a repro for https://github.com/tensorflow/tensorflow/issues/28007 PiperOrigin-RevId: 298958458 Change-Id: I4eec5f9f582ace43710a5635028fc21efa3bdc24 — committed to tensorflow/tensorflow by saxenasaurabh 4 years ago
- Force CPU placement for ops that has DT_VARIANT inputs with host-only underlying data type. Fix for #28007 PiperOrigin-RevId: 301650148 Change-Id: I47fa9c1b0b7a7d56c5a519095687f36651892644 — committed to tensorflow/tensorflow by ezhulenev 4 years ago
- fix tf issue https://github.com/tensorflow/tensorflow/issues/28007 — committed to BlueFisher/Advanced-Soft-Actor-Critic by BlueFisher 4 years ago
Sorry about my poor English. I have the same problem, but I found a solution. I'm using an Nvidia 2080 Ti, tf-nightly-gpu-2.0-preview, Python 3.7.3, Ubuntu 19.04. When I used tf.summary.image("gen", generated_images, max_outputs=25, step=0), I got the error: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string. If I wrote it like this:
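(The commenter's actual snippet was not captured above; the following is a minimal sketch, assuming the workaround is pinning the summary op to the CPU with tf.device, which later comments in this thread describe. All names here are illustrative, not the commenter's original code.)

import tensorflow as tf

@tf.function
def log_images(generated_images, step):
    # Pinning the summary op to the CPU keeps the encoded (string) image
    # tensors out of the Variant Host->Device copy path.
    with tf.device('/cpu:0'):
        tf.summary.image("gen", generated_images, max_outputs=25, step=step)

writer = tf.summary.create_file_writer('/tmp/logs')
with writer.as_default():
    log_images(tf.zeros([4, 28, 28, 1]), step=0)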
everything is fine.
The issue should be renamed to something like: "InvalidArgumentError when running map_fn on strings inside a tf.function". Here is an even smaller code snippet to reproduce the error (to run on GPU):
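(The snippet itself is not reproduced above; a minimal sketch of such a repro, assuming it simply maps a function over a tf.string tensor inside a tf.function, would be:)

import tensorflow as tf

@tf.function
def repro(strings):
    # map_fn over a tf.string tensor builds a DT_VARIANT TensorList of strings;
    # copying that list to the GPU triggers the non-DMA-copy error.
    return tf.map_fn(tf.strings.length, strings, dtype=tf.int32)

print(repro(tf.constant(["hello", "world"])))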
Well, it seems to be just a workaround to me. The main issue here is that the summary operation raises an error when running on the GPU. Forcing the operation to run on the CPU doesn't really solve the problem, it just sidesteps it. I don't know how the summary operation works internally; probably even when running on the GPU it would still copy the tensor back to CPU memory (which would then be similar to explicitly asking it to run on the CPU). Even if that is the case (if not, we lose some efficiency), from an API point of view I don't think this issue is solved, as someone might encounter the same problem and not know why it happens or how to fix it without stumbling onto this thread.
We’re looking into this now. Should have some updates soon.
I have the same error, but the error is raised when I use TensorFlow Serving (GPU version).
My model includes the function below:
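(The function itself is not reproduced above; judging from the node names in the error below (model_11/lambda_16/map/...), it presumably wraps tf.map_fn over a string input in a Lambda layer, roughly like this hypothetical sketch:)

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical reconstruction of the kind of model that produces the
# lambda/map/TensorArrayUnstack nodes seen in the Serving error below.
inp = layers.Input(shape=(), dtype=tf.string)
out = layers.Lambda(
    lambda x: tf.map_fn(tf.strings.length, x, dtype=tf.int32))(inp)
model = tf.keras.Model(inp, out)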
It's OK to deploy with the Serving CPU image, but with the GPU image I get an error like the one below:
'{ "error": "2 root error(s) found.\\n (0) Invalid argument: 2 root error(s) found.\\n (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n0 successful operations.\\n0 derived errors ignored.\\n\\t [[{{node model_11/lambda_16/map/TensorArrayUnstack/TensorListFromTensor}}]]\\n (1) Invalid argument: 2 root error(s) found.\\n (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n0 successful operations.\\n0 derived errors ignored.\\n\\t [[{{node model_11/lambda_16/map/TensorArrayUnstack/TensorListFromTensor}}]]\\n\\t [[Func/StatefulPartitionedCall/StatefulPartitionedCall/model_11/lambda_16/map/while/body/_887/input/_935/_935]]\\n0 successful operations.\\n0 derived errors ignored." }'Any solutions for this?
I know the root cause for this issue is not in the tf.summary module, but for those who get here because of using tf.summary.image() inside @tf.function, my workaround is to return the outputs and do summaries outside:
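(The commenter's code is not reproduced here; a minimal sketch of the pattern, assuming the tf.function returns the image tensors and the summary is written eagerly outside it, with illustrative names:)

import tensorflow as tf

@tf.function
def train_step(img):
    # ... model forward pass, loss, gradient update, etc. ...
    return img  # return the tensors you want to summarize

writer = tf.summary.create_file_writer('/tmp/summaries')
with writer.as_default():
    for step, img in enumerate(ds):  # ds: the image dataset from the repro above
        out = train_step(img)
        # Writing the summary eagerly (outside the tf.function) avoids the
        # Variant Host->Device copy of the encoded string tensors.
        tf.summary.image('img', out, step=step)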
@ipod825 I have the same problem (I tried the TF 2.0 alphas and betas) and agree that assigning the summary op to /cpu:0 is only a workaround. Moreover, the fix does not work for me if I build the r2.0 branch from source. It would be nice if this issue were reopened so the problem can be solved.
I took a look on GitHub: the map_fn on line 75 is causing the issue.
Hi, thanks, this trick works for my case as well.
Also having the same issue using TF 2.1. Works fine on a machine with just a CPU, but fails on a machine with a GPU, even when using with tf.device('/cpu:0'). Would appreciate an update on this ASAP.
I also have this problem; it works well on CPU but not on GPU. How can I solve it?
Is this being fixed or addressed by someone? For me, in TF 2.0, not even the tf.device hint worked. The following code (TF 2.0, CUDA 10.0, GTX 1080) did not work for me and failed with the same error message as reported above (Invalid argument: During Variant Host->Device Copy: non-DMA- …):
@jvishnuvardhan I tried the notebook you posted, and it works because the notebook's runtime isn't using the GPU. After changing the runtime to a GPU-accelerated one, it fails with the error @ageron posted.
Actually, I just ran the Colab gist that I shared in my earlier post, but this time with a GPU runtime. I'm now seeing the same error message that you reported, so it seems to be a GPU-related issue. Can you open a new bug with all of this information? Thanks!
Closing this issue now since the bug has been fixed.
Is there a plan to fix this, or a suggested workaround?
I also ran into this issue. Here’s a fairly minimal piece of code that reproduces it:
and here's the full stack trace:
I ran this on Colab with a GPU Runtime, using TF 1.15.0rc3. It will probably bomb as well on TF 2.0.0 but I haven’t tried.
@nikitamaia Yes, this is fixed on TensorFlow 2.3.0. I tested this on the colab and also on my local machine (running TF 2.3.0 on Arch Linux).
I am also facing this issue on GPU (no error on CPU) when using map_fn on a string tensor (with a float tensor everything works):
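(A sketch of the contrast, illustrative rather than the commenter's exact code: mapping over a float tensor works on GPU, while the same pattern over a string tensor fails.)

import tensorflow as tf

@tf.function
def float_case(x):
    return tf.map_fn(lambda v: v * 2.0, x)   # works on GPU

@tf.function
def string_case(s):
    return tf.map_fn(tf.strings.upper, s)    # raises the non-DMA-copy error on GPU

float_case(tf.constant([1.0, 2.0, 3.0]))
string_case(tf.constant(["a", "b", "c"]))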
I do not know why, but for me manually placing map_fn on the CPU AND adding a mock return value to the tf.function helped:
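(A minimal sketch of that workaround, assuming the map_fn is pinned to /cpu:0 and the tf.function returns a value; the names are illustrative, not the commenter's original code.)

import tensorflow as tf

@tf.function
def process(strings):
    with tf.device('/cpu:0'):
        # Pinning map_fn to the CPU keeps the string TensorList off the GPU.
        lengths = tf.map_fn(tf.strings.length, strings, dtype=tf.int32)
    # Returning something from the tf.function (the "mock return" mentioned
    # above) reportedly also mattered here.
    return lengths

process(tf.constant(["foo", "bar"]))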
I am using tf2.0.1
This should be fixed for the simple tf.map_fn example; however, the underlying problem is still there and might be triggered in more complex use cases. The fix commit has a repro with an explanation. I think the issue is incorrectly assigned, as it is not directly related to tf.summary. @tensorflow/dev-support Can this be reassigned to someone working on functional ops such as tf.map_fn?
@rharish101 Thanks! Got it. This is not resolved. Thanks!