tensorflow: ParseTensor (tf.io.parse_tensor) is not vectorized - Vectorizing via tf.vectorized_map uses while_loop
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution: Ubuntu 16.04
- TensorFlow installed from: Binary
- TensorFlow version: v2.3.0-rc2-23-gb36436b087 2.3.0
- Python version: 3.8
- CUDA/cuDNN version: 10.1
- GPU model and memory: TITAN V, 12G VRAM
Describe the current behavior See the code below. I'm serializing a list of tensors and then attempting to parse them using (1) naive, single-record parsing and (2) batch parsing via tf.vectorized_map, which I expected to yield a significant performance increase.
But: tf.io.parse_tensor appears not to be implemented for vectorized parsing, as I'm getting a WARNING:tensorflow:Using a while_loop for converting ParseTensor message and see little to no performance increase!
I find it very surprising that such an essential operation is not vectorized… how else would I parse non-scalar features from e.g. a TFRecord file? Meanwhile, tf.io.parse_example is vectorized.
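For contrast, a minimal sketch of the batch path that does work (not part of the original report; the feature name 'x' is an arbitrary choice): tf.io.parse_example consumes a whole vector of serialized tf.train.Example protos in a single vectorized op.

```python
import tensorflow as tf

def make_example(v):
    # Build and serialize a tf.train.Example with a single int64 feature 'x'
    return tf.train.Example(features=tf.train.Features(
        feature={'x': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[v]))})).SerializeToString()

serialized = tf.constant([make_example(i) for i in range(4)])

# One call parses the entire batch of Examples at once
parsed = tf.io.parse_example(
    serialized, {'x': tf.io.FixedLenFeature([], tf.int64)})
print(parsed['x'].numpy())  # [0 1 2 3]
```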
Describe the expected behavior
I would expect a vectorized version of tf.io.parse_tensor to yield a significant performance increase.
Standalone code to reproduce the issue
import numpy as np
import tensorflow as tf
import time
# This would normally come from some data stream, e.g. a stream of TFRecords
some_tensor_list = [np.zeros(shape=(5, 5), dtype=np.int32)] * 100000
some_tensor_list_serialized = [tf.io.serialize_tensor(x) for x in some_tensor_list]

# Feed to tf.data
dataset = tf.data.Dataset.from_tensor_slices(some_tensor_list_serialized)

# Parse a whole batch back to tensors
def parse_batch(b):
    return tf.vectorized_map(lambda x: tf.io.parse_tensor(x, out_type=tf.int32), b)

# Parse a single record back to a tensor
def parse_single(x):
    return tf.io.parse_tensor(x, out_type=tf.int32)

# Compare speed
def exhaust_iterable(it):
    t = time.time()
    for _ in it:
        pass
    print(f'{time.time() - t}s')

# naive
dataset_naive = dataset.map(parse_single)
exhaust_iterable(dataset_naive)

# vectorized over batch
dataset_vec = dataset.batch(32)
dataset_vec = dataset_vec.map(parse_batch)
exhaust_iterable(dataset_vec)
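One workaround (a sketch under an extra assumption, not part of the original report): if every tensor shares the same known shape and dtype, you can skip TensorProto serialization entirely, store raw bytes instead, and decode a whole batch in one op with tf.io.decode_raw, which does accept a batch of equal-length byte strings.

```python
import numpy as np
import tensorflow as tf

# Assumption: all tensors have identical shape and dtype, so raw bytes
# (no TensorProto wrapper) carry enough information to reconstruct them.
shape = (5, 5)
tensors = [np.zeros(shape, dtype=np.int32) for _ in range(1000)]
raw = [t.tobytes() for t in tensors]

ds = tf.data.Dataset.from_tensor_slices(raw).batch(32)

def parse_batch_raw(b):
    # decode_raw decodes every string in the batch in a single op
    flat = tf.io.decode_raw(b, tf.int32)  # shape: (batch, 25)
    return tf.reshape(flat, (-1,) + shape)

ds = ds.map(parse_batch_raw)
first = next(iter(ds))
print(first.shape)  # (32, 5, 5)
```

The trade-off is that shape and dtype are no longer self-describing and must be agreed on out of band.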
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 3
- Comments: 16 (3 by maintainers)
I am aware that I could loop over the batch and parse each tensor individually. I was hoping that I wouldn't need to do that and could simply call tf.io.parse_tensor on a batch of tensors. It seems weird to me that you can parse a batch of examples with tf.io.parse_example, but when it comes to parsing these tensors you need to loop over each tensor because it can't handle a batch. Is there a good reason why tf.io.parse_example is vectorized but not tf.io.parse_tensor?

I agree with @sjang92 - this is still very much needed, especially for shallow models where IO is more of a bottleneck.
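The per-element loop mentioned above can be sketched with tf.map_fn (fn_output_signature is available from TF 2.3); note that this still iterates over the batch rather than vectorizing it, which is exactly the complaint here.

```python
import tensorflow as tf

# A batch of 8 serialized (5, 5) int32 tensors
serialized = tf.stack(
    [tf.io.serialize_tensor(tf.zeros((5, 5), tf.int32)) for _ in range(8)])

# map_fn parses one element at a time; fn_output_signature tells it the
# shape/dtype produced by parse_tensor
parsed = tf.map_fn(
    lambda x: tf.io.parse_tensor(x, out_type=tf.int32),
    serialized,
    fn_output_signature=tf.TensorSpec(shape=(5, 5), dtype=tf.int32))
print(parsed.shape)  # (8, 5, 5)
```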
To summarize the request, please include a vectorized version of tf.io.parse_tensor
Just came across the same issue. I'm trying to do the following: batch tf.train.Examples. I'm unable to do step 4, since tf.io.parse_tensor only accepts a single tensor and not a batch of tensors. Here's my code.

Maybe I'm missing something, but what's the point of being able to parse a batch of Examples if you can't deserialize that batch of tensors? My options are either to loop through the batch and parse each tensor individually, but that's less efficient, or to batch my data prior to building my tfrecords, but that's less flexible if I want to quickly change the batch size. Would be great to have a version of tf.io.parse_tensor that accepts a batch.

+1 on the need for this
I am also interested in a vectorized version of tf.io.parse_tensor
@mhorlacher, using tf.function has resulted in a significant performance increase. Please find the Gist. Thanks!