tensorflow: InvalidArgumentError when running map_fn on strings inside a tf.function

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • TensorFlow installed from (source or binary): binary (conda)
  • TensorFlow version (use command below): 2.0-alpha (conda install tensorflow-gpu==2.0-alpha)
  • Python version: 3.7.1
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: cudatoolkit-10.0.130-0, cudnn-7.3.1-cuda10.0_0
  • GPU model and memory: GeForce RTX 2080 Ti

Describe the current behavior

Running the provided code on a GPU fails with tensorflow.python.framework.errors_impl.InvalidArgumentError: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string. Without feeding the tensor to the convolution layer, tf.summary.image succeeds.

Describe the expected behavior

The code should run without errors.

Code to reproduce the issue

import tensorflow as tf
from tensorflow.keras import layers

H, W, C = 10, 10, 3
imgs = tf.zeros([10, H, W, C])
ds = tf.data.Dataset.from_tensor_slices(imgs)
ds = ds.batch(2)
conv = layers.Conv2D(32, (4, 4), strides=(2, 2), padding='same')


@tf.function
def run(img, i):
    conv(img)
    tf.summary.image('img', img, i)


if __name__ == "__main__":
    train_summary_writer = tf.summary.create_file_writer('/tmp/testsummary')
    with train_summary_writer.as_default():
        for i, img in enumerate(ds):
            run(img, i)

Other info / logs

TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-04-20 14:44:30.818841: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1700000000 Hz
2019-04-20 14:44:30.819976: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x55b6fa788f50 executing computations on platform Host. Devices:
2019-04-20 14:44:30.820029: I tensorflow/compiler/xla/service/service.cc:169]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-20 14:44:30.825689: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-04-20 14:44:31.062487: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x55b6fc634120 executing computations on platform CUDA. Devices:
2019-04-20 14:44:31.062554: I tensorflow/compiler/xla/service/service.cc:169]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-20 14:44:31.063894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1467] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:19:00.0
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-04-20 14:44:31.063942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1546] Adding visible gpu devices: 0
2019-04-20 14:44:31.064034: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-04-20 14:44:31.067082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1015] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-20 14:44:31.067114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021]      0
2019-04-20 14:44:31.067130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] 0:   N
2019-04-20 14:44:31.068283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10284 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:19:00.0, compute capability: 7.5)
2019-04-20 14:44:33.628228: W tensorflow/core/common_runtime/base_collective_executor.cc:214] BaseCollectiveExecutor::StartAbort Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
         [[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
         [[img_1/encode_each_image/while/loop_body_control/_19/_33]]
2019-04-20 14:44:33.628374: W tensorflow/core/common_runtime/base_collective_executor.cc:214] BaseCollectiveExecutor::StartAbort Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
         [[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
2019-04-20 14:44:33.628468: E tensorflow/core/common_runtime/process_function_library_runtime.cc:764] Component function execution failed: Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
         [[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
         [[img_1/encode_each_image/while/loop_body_control/_19/_33]]
2019-04-20 14:44:33.628456: E tensorflow/core/common_runtime/process_function_library_runtime.cc:764] Component function execution failed: Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
         [[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    run(img, i)
  File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 438, in __call__
    return self._stateless_fn(*args, **kwds)
  File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1288, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 574, in _filtered_call
    (t for t in nest.flatten((args, kwargs))
  File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 627, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 415, in call
    ctx=ctx)
  File "/home/swang150/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 66, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
         [[{{node img_1/encode_each_image/while/body/_1/TensorArrayV2Write/TensorListSetItem/_54}}]]
         [[img_1/encode_each_image/while/loop_body_control/_19/_33]] [Op:__inference_run_343]

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 10
  • Comments: 50 (18 by maintainers)

Most upvoted comments

Sorry about my poor English. I have the same problem, but I found a solution. I'm using an Nvidia 2080 Ti, tf-nightly-gpu-2.0-preview, Python 3.7.3, Ubuntu 19.04. When I used tf.summary.image("gen", generated_images, max_outputs=25, step=0), I got the error: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string. If I wrote it like this:

with tf.device("cpu:0"): <<-- add this line
   with log["writer"].as_default():
     tf.summary.image("gen", generated_images, max_outputs=25, step=0)

everything is fine.
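For reference, here is a minimal sketch of how that workaround could be applied to the repro from the issue body (assuming the same conv, ds, and summary writer setup as above; untested, and later comments report it does not help on every TF 2.x version):

@tf.function
def run(img, i):
    conv(img)                      # the convolution still runs on the GPU
    with tf.device("/cpu:0"):      # pin only the summary op (and its string-handling ops) to the CPU
        tf.summary.image('img', img, step=i)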

The issue should be renamed to something like: “InvalidArgumentError when running map_fn on strings inside a tf.function”.

Here is an even smaller code snippet to reproduce the error (to run on GPU):

import tensorflow as tf

@tf.function
def f():
    return tf.map_fn(tf.strings.upper, tf.constant(["a", "b", "c"]))

print(f())

Well, this seems to be just a workaround to me. The main issue here is that the summary operation raises an error when running on GPU. Forcing the operation to run on the CPU doesn't really solve the problem; it just sidesteps it. I don't know how the summary operation works internally; possibly, even when running on GPU, it would still copy the tensor back to CPU memory (which would then be similar to explicitly asking it to run on the CPU). Even if that is the case (and if not, we lose some efficiency), from an API point of view I don't think this issue is solved, since someone else might encounter the same problem and not know why it happens or how to work around it without stumbling onto this thread.
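As an aside (not from the thread), device-placement logging can help confirm where the string-handling ops actually end up. A minimal sketch reusing the map_fn repro from the previous comment; on a GPU machine it still hits the error, but the log shows which device each op was assigned to:

import tensorflow as tf

# Print the device every op is placed on when it executes.
tf.debugging.set_log_device_placement(True)

@tf.function
def f():
    return tf.map_fn(tf.strings.upper, tf.constant(["a", "b", "c"]))

print(f())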

We’re looking into this now. Should have some updates soon.

I have the same error, but it is raised when I use TensorFlow Serving (GPU version).

My model includes the function below:

def preprocess_and_decode(img_str, new_shape=target_size):
    img = tf.io.decode_base64(img_str)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, new_shape, method=method)
    return img

input64 = tf.keras.layers.Input(shape=(1,), dtype="string", name=input_name)
output_tensor = tf.keras.layers.Lambda(
    lambda img: tf.map_fn(lambda im: preprocess_and_decode(im[0]), img, dtype="float32"))(input64)

It's OK to deploy with CPU serving, but I get an error like the one below with GPU serving:

'{ "error": "2 root error(s) found.\\n (0) Invalid argument: 2 root error(s) found.\\n (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n0 successful operations.\\n0 derived errors ignored.\\n\\t [[{{node model_11/lambda_16/map/TensorArrayUnstack/TensorListFromTensor}}]]\\n (1) Invalid argument: 2 root error(s) found.\\n (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string\\n0 successful operations.\\n0 derived errors ignored.\\n\\t [[{{node model_11/lambda_16/map/TensorArrayUnstack/TensorListFromTensor}}]]\\n\\t [[Func/StatefulPartitionedCall/StatefulPartitionedCall/model_11/lambda_16/map/while/body/_887/input/_935/_935]]\\n0 successful operations.\\n0 derived errors ignored." }'

Any solutions for this?

I know the root cause for this issue is not in the tf.summary module, but for those who get here because of using tf.summary.image() inside @tf.function, my workaround is to return the outputs and do summaries outside:

@tf.function
def train_op(inputs):
  outputs = net(inputs)
  # handle loss and gradients...
  return outputs

def train():
  for data in dataloader:
    outputs = train_op(data)
    with summary_writer.as_default():
      tf.summary.image('image', outputs)

@ipod825 I have the same problem (I did try the TF 2.0 alphas and betas) and agree that assigning the summary op to /cpu:0 is only a workaround. Moreover, the fix does not work for me if I build the r2.0 branch from source. It would be nice if this issue were reopened so the problem can be solved.

I took a look at the code on GitHub: the map_fn on line 75 is causing the issue.

I am also facing this issue on GPU (no error on CPU) when using map_fn on a string tensor (with a float tensor everything works):

def process_string(sample):
    # Here I want to write something to a file, but even with the identity
    # function I get the "non-DMA-copy attempted..." error
    return sample

@tf.function
def f(self, y_true, y_pred):
    string_tensor = y_true["path"]  # Tensor("data_batch_19:0", shape=(2,), dtype=string)
    tf.map_fn(process_string, string_tensor)

I do not know why, but for me, manually placing map_fn on the CPU AND returning something (even a mock value) from the tf.function helped:

@tf.function
def f(self, y_true, y_pred):
    string_tensor = y_true["path"]  # Tensor("data_batch_19:0", shape=(2,), dtype=string)
    with tf.device("/cpu:0"):
        tf.map_fn(process_string, string_tensor)
    return y_pred  # does not work for me without returning something

I am using TF 2.0.1.

Hi, thanks, this trick works for my case too.

Also having the same issue using TF 2.1. It works fine on a machine with just a CPU, but fails on a machine with a GPU, even when using with tf.device('/cpu:0'). Would appreciate an update on this ASAP.

I also have this problem; it works well on CPU but not on GPU. How can I solve it?

Is this being fixed or addressed by someone? For me, in TF 2.0 not even the tf.device hint worked. The following code (TF 2.0, CUDA 10.0, GTX 1080) did not work for me and failed with the same error message as reported above (Invalid argument: During Variant Host->Device Copy: non-DMA- …):

import tensorflow as tf

writer = tf.summary.create_file_writer("/tmp/mylogs/tf_function")


@tf.function
def my_func(image, step):
    with tf.device("/cpu:0"):
        tf.summary.image("my_image_metric", image, step=step)


image = tf.constant(
    [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]
)[None]

with writer.as_default():
    for step in tf.range(100, dtype=tf.int64):
        my_func(image, step)
        writer.flush()
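For comparison, the workaround reported earlier in this thread places the device scope outside the tf.function rather than inside it. A hedged sketch of that variant for the snippet above (the name my_func_no_device is mine; whether this avoids the error appears to depend on the TF version):

@tf.function
def my_func_no_device(image, step):
    tf.summary.image("my_image_metric", image, step=step)

# Device scope around the eager call site instead of inside the traced function.
with tf.device("/cpu:0"):
    with writer.as_default():
        for step in tf.range(100, dtype=tf.int64):
            my_func_no_device(image, step)
            writer.flush()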

@jvishnuvardhan I tried the notebook you posted, and it works because the notebook’s runtime isn’t using the GPU. After changing the runtime to a GPU-accelerated one, it fails with the error @ageron posted.

Actually, I just ran the Colab gist that I shared in my earlier post, but this time with a GPU runtime. I'm now seeing the same error message that you reported, so it seems to be a GPU-related issue. Can you open a new bug with all of this information? Thanks!

Closing this issue now since the bug has been fixed.

Is there a plan to fix this, or a suggested workaround?

I also ran into this issue. Here’s a fairly minimal piece of code that reproduces it:

import tensorflow.compat.v2 as tf
tf.enable_v2_behavior()

def decode_png(data):
  return tf.image.decode_png(data)

@tf.function  # <= No exception if you comment this line out
def decode_all(images):
  return tf.map_fn(decode_png, images, dtype=tf.uint8)

img = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15\xc4\x89\x00\x00\x00\rIDATx\xdac\xfc\xcf\xf0\xbf\x1e\x00\x06\x83\x02\x7f\x94\xad\xd0\xeb\x00\x00\x00\x00IEND\xaeB`\x82'
images = tf.constant([img, img])
decode_all(images)

and here's the full stack trace:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-72-a59f4c54298a> in <module>()
     11 img = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15\xc4\x89\x00\x00\x00\rIDATx\xdac\xfc\xcf\xf0\xbf\x1e\x00\x06\x83\x02\x7f\x94\xad\xd0\xeb\x00\x00\x00\x00IEND\xaeB`\x82'
     12 images = tf.constant([img, img])
---> 13 decode_all(images)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py in __call__(self, *args, **kwds)
    465               *args, **kwds)
    466       # If we did not create any variables the trace we have is good enough.
--> 467       return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
    468 
    469     def fn_with_cond(*inner_args, **inner_kwds):

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in _filtered_call(self, args, kwargs)
   1139          if isinstance(t, (ops.Tensor,
   1140                            resource_variable_ops.BaseResourceVariable))),
-> 1141         self.captured_inputs)
   1142 
   1143   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1222     if executing_eagerly:
   1223       flat_outputs = forward_function.call(
-> 1224           ctx, args, cancellation_manager=cancellation_manager)
   1225     else:
   1226       gradient_name = self._delayed_rewrite_functions.register()

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    509               inputs=args,
    510               attrs=("executor_type", executor_type, "config_proto", config),
--> 511               ctx=ctx)
    512         else:
    513           outputs = execute.execute_with_cancellation(

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
	 [[{{node map/TensorArrayUnstack/TensorListFromTensor/_12}}]]
  (1) Invalid argument:  2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
	 [[{{node map/TensorArrayUnstack/TensorListFromTensor/_12}}]]
	 [[Func/map/while/body/_1/input/_43/_24]]
0 successful operations.
0 derived errors ignored. [Op:__inference_decode_all_20554]

Function call stack:
decode_all -> decode_all

I ran this on Colab with a GPU Runtime, using TF 1.15.0rc3. It will probably bomb as well on TF 2.0.0 but I haven’t tried.
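A hedged sketch of the CPU-pinning workaround from earlier in this thread applied to that snippet (the name decode_all_cpu is mine; assumes the decode_png and images definitions above, and I have not verified it on TF 1.15 or 2.0):

@tf.function
def decode_all_cpu(images):
  # Pin the string-handling map_fn to the CPU (workaround reported earlier in this thread).
  with tf.device("/cpu:0"):
    return tf.map_fn(decode_png, images, dtype=tf.uint8)

decode_all_cpu(images)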

@nikitamaia Yes, this is fixed on TensorFlow 2.3.0. I tested this on the colab and also on my local machine (running TF 2.3.0 on Arch Linux).

This should be fixed for the simple tf.map_fn example; however, the underlying problem is still there and might be triggered in more complex use cases. The fix commit has a repro with an explanation.

I think the issue is incorrectly assigned, as it is not directly related to tf.summary. @tensorflow/dev-support, can this be reassigned to someone working on functional ops such as tf.map_fn?

@rharish101 Thanks! Got it. This is not resolved. Thanks!