tensorflow: Dataset from generator is far slower than from tensor slices, anything I can improve?

Issue Type

Performance

Source

binary

Tensorflow Version

tf 1.15.0

Custom Code

Yes

OS Platform and Distribution

Linux Ubuntu 18.04

Mobile device

No response

Python version

3.7

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

When I use `tf.data.Dataset.from_tensor_slices` and `tf.data.Dataset.from_generator` to create the same dataset, I observe that fetching batches from the `from_generator` pipeline is much slower than from the `from_tensor_slices` one (more than 10x slower). The performance of `from_generator` also drops sharply as the number of yielded elements increases (see the details below).

I am using TF 1.15, but I also tested TF 2.8, and the generator's performance there seems even worse...

Is it expected that a Dataset built from a generator is far slower than one built from tensor slices? I would also like some help improving performance when using the generator. If I want a dataset with 10 elements per record, how can I get better performance?

Thanks so much in advance!

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np
import time

tf.compat.v1.enable_eager_execution(
    config=None, device_policy=None, execution_mode=None
)

size = 100000
data = np.random.rand(size)

# Yield each record as a 10-tuple of identical scalar features.
def get_one():
    i = 0
    while i < size:
        yield tuple([data[i]] * 10)
        i += 1

# Swap between the two lines below to compare the two pipelines.
# dataset = tf.data.Dataset.from_generator(get_one, output_types=tuple([tf.float32]*10))
dataset = tf.data.Dataset.from_tensor_slices(tuple([data]*10))
dataset = dataset.batch(512)

# Measure how long each batch takes to arrive.
i = 0
total_time = 0
start = time.time()
for sample in dataset:
    # Performing a training step
    end = time.time()
    used = end - start
    total_time += used
    print("Get batch time: ", used)
    i += 1
    start = time.time()
print("Average get batch time: ", total_time / i)

Relevant log output

When using from_tensor_slices, average get batch time: 0.001715s.
When using from_generator, average get batch time: 0.10747s (0.14395s for TF 2.8).

If I change the generator function as follows and re-run the above program:

def get_one():
    i = 0
    while i < size:
        # Read data[i] 10 times, but yield only a single element.
        for j in range(10):
            res = data[i]
        yield res
        i += 1

Namely, if I read the value 10 times in the generator but yield only one element, performance is much better than when yielding 10 features: average get batch time: 0.02014s, although that is still much slower than `from_tensor_slices`. Why is that? The computation is the same; the only difference is how many elements I output. (In TF 2.8 this case takes 0.3935s, which seems unreasonable; I don't know why.)
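
One possible mitigation, sketched below under the assumption that the overhead grows with the number of Python objects handed to tf.data per yield: yield a single length-10 vector per record and split it back into 10 features with an in-graph `map`. The `get_vec` name and the shapes are illustrative, not from the original report.

import tensorflow as tf
import numpy as np

size = 100000
data = np.random.rand(size).astype(np.float32)

# Hand tf.data one tensor per yield instead of a 10-tuple of scalars.
def get_vec():
    for i in range(size):
        yield np.repeat(data[i], 10)

dataset = tf.data.Dataset.from_generator(
    get_vec, output_types=tf.float32, output_shapes=(10,)
)
dataset = dataset.batch(512)
# Split the length-10 axis back into 10 separate features in-graph.
dataset = dataset.map(lambda x: tuple(tf.unstack(x, num=10, axis=1)))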

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (10 by maintainers)

Most upvoted comments

Dataset.from_generator is generally not as performant as other ways of using tf.data, since it executes the input-generation logic in Python instead of C++. In particular, the generator always runs single-threaded, while computation expressed with other tf.data transformations can run with many threads in parallel. If performance is critical, prefer other APIs such as Dataset.from_tensor_slices.
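
As an illustration of that advice for this repro (a sketch, not an official recommendation; the parallelism and batch/prefetch values are arbitrary choices): keep the data in a tensor and fan it out to 10 features with a parallel map, so the per-element work stays in C++.

import tensorflow as tf
import numpy as np

size = 100000
data = np.random.rand(size).astype(np.float32)

# Slice once, then build the 10-feature tuple inside the graph, where
# tf.data can run the map across multiple threads.
dataset = tf.data.Dataset.from_tensor_slices(data)
dataset = dataset.map(
    lambda x: tuple([x] * 10),
    num_parallel_calls=tf.data.experimental.AUTOTUNE,  # tf.data.AUTOTUNE in TF >= 2.4
)
dataset = dataset.batch(512).prefetch(1)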

@aaudiber @tensorflowbutler @sachinprasadhs The problem with Dataset.from_tensor_slices is that it is unsuitable for big datasets, whereas Dataset.from_generator still works there. There are many technical problems with from_generator, and the main one, in my opinion, is that loading is extremely slow compared to Dataset.from_tensor_slices; in my case, using generators slows training down by about 3x. This is a big problem!

Therefore, I think this issue should stay open: for big datasets, Dataset.from_generator should provide an easy, performant solution.
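
For large datasets that do not fit in memory, one pattern that keeps from_generator while amortizing its per-yield Python overhead is to yield pre-batched chunks and prefetch them. This is a minimal sketch; `batch_gen` and the sizes are hypothetical, and a real pipeline would read the data from disk inside the generator instead of holding it in a numpy array.

import tensorflow as tf
import numpy as np

size = 100000
batch_size = 512
data = np.random.rand(size).astype(np.float32)  # stand-in for on-disk data

# Yield one pre-batched chunk per step instead of one record at a
# time, so Python is entered once per batch rather than 512 times.
def batch_gen():
    for start in range(0, size, batch_size):
        chunk = data[start:start + batch_size]
        yield tuple([chunk] * 10)

dataset = tf.data.Dataset.from_generator(
    batch_gen, output_types=tuple([tf.float32] * 10)
)
# Let the generator run ahead of the training loop.
dataset = dataset.prefetch(1)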