tensorflow: Dataset from generator is far slower than from tensor slices, anything I can improve?
Issue Type
Performance
Source
binary
Tensorflow Version
tf 1.15.0
Custom Code
Yes
OS Platform and Distribution
Linux Ubuntu 18.04
Mobile device
No response
Python version
3.7
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current Behaviour?
When I use `tf.data.Dataset.from_tensor_slices` and `tf.data.Dataset.from_generator` to build the same dataset, fetching batches from the `from_generator` version is much slower than from the `from_tensor_slices` version (more than 10x slower).
The performance of `from_generator` also drops sharply as the number of yielded components grows (see the details below).
I am using tf 1.15, but I also tested tf 2.8, and the generator performance seems even worse there.
Is it expected that a dataset built from a generator is far slower than one built from tensor slices? I would also appreciate help improving the generator's performance: if I want a dataset with 10 components per record, how can I make it faster?
Thanks so much in advance!
Standalone code to reproduce the issue
```python
import tensorflow as tf
import numpy as np
import time

tf.compat.v1.enable_eager_execution(
    config=None, device_policy=None, execution_mode=None
)

size = 100000
data = np.random.rand(size)

def get_one():
    i = 0
    while i < size:
        yield tuple([data[i]] * 10)
        i += 1

# Swap the comment below to test from_generator instead:
# dataset = tf.data.Dataset.from_generator(get_one, output_types=tuple([tf.float32] * 10))
dataset = tf.data.Dataset.from_tensor_slices(tuple([data] * 10))
dataset = dataset.batch(512)

i = 0
total_time = 0
start = time.time()
for sample in dataset:
    # A training step would go here.
    end = time.time()
    used = end - start
    total_time += used
    print("Get batch time: ", used)
    i += 1
    start = time.time()
print("Average get batch time: ", total_time / i)
```
Relevant log output
When using `from_tensor_slices`, the average get-batch time is 0.001715 s. When using `from_generator`, it is 0.10747 s (0.14395 s on tf 2.8).
If I change the generator function as follows and re-run the above program:
```python
def get_one():
    i = 0
    while i < size:
        for j in range(10):
            res = data[i]
        yield res
        i += 1
```
That is, if I read the value 10 times inside the generator but yield only a single element, performance is much better than yielding 10 components: the average get-batch time is 0.02014 s, though still much slower than `from_tensor_slices`. Why is that? I assume the computation is the same; the only difference is how many components I output. (On tf 2.8 this case takes 0.3935 s, which seems unreasonable; I don't know why.)
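A plausible explanation (my reading, not confirmed by the maintainers) is that `from_generator` crosses the Python boundary once per yield and converts each yielded component to a tensor individually, so the per-record overhead scales with the number of components. The usual mitigation is to yield whole batches from the generator, paying the conversion cost once per batch instead of once per element. A minimal TensorFlow-free sketch of the idea (the generator names `per_element` and `per_batch` are made up for illustration):

```python
import numpy as np

size = 100_000
data = np.random.rand(size)

# Per-element generator, mirroring the issue's repro: one Python-level
# yield (and one 10-component tuple to convert) per record.
def per_element():
    for i in range(size):
        yield tuple([data[i]] * 10)

# Batched generator: yield whole NumPy slices, so a consumer such as
# tf.data.Dataset.from_generator (with batch-shaped output_shapes and
# no .batch() afterwards) makes far fewer Python round trips.
def per_batch(batch_size=512):
    for start in range(0, size, batch_size):
        chunk = data[start:start + batch_size]
        yield tuple([chunk] * 10)

n_element_yields = sum(1 for _ in per_element())
n_batch_yields = sum(1 for _ in per_batch())
print(n_element_yields, n_batch_yields)  # 100000 vs 196
```

With 512-element batches the generator is invoked 196 times instead of 100000 times, which is why pre-batched yielding tends to close much of the gap to `from_tensor_slices`.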
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 22 (10 by maintainers)
@aaudiber @tensorflowbutler @sachinprasadhs The problem with `Dataset.from_tensor_slices` is that it is unsuitable for big datasets, while `Dataset.from_generator` will work. There are many technical problems with `from_generator`, and the main one, in my opinion, is that loading time is extremely slow compared to `Dataset.from_tensor_slices`. In my case, using generators slows training by 3x. This is a big problem! Therefore, I think this issue should be reopened: for big datasets, `Dataset.from_generator` should provide an easy solution.
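One workaround for the big-dataset case is to keep the data on disk and memory-map it, then have the generator yield contiguous batch-sized slices; `from_generator` then receives one array per yield instead of one scalar per element. A sketch under the assumption that the data fits in a single `.npy` file (the file name, sizes, and `batched_records` helper are hypothetical):

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: a large float32 array saved to disk.
path = os.path.join(tempfile.mkdtemp(), "big.npy")
np.save(path, np.random.rand(100_000).astype(np.float32))

# Memory-map the file so it never has to be loaded into RAM at once,
# which is what makes from_tensor_slices unsuitable here.
big = np.load(path, mmap_mode="r")

def batched_records(batch_size=512):
    # Each yield hands the consumer a whole batch; with tf.data this
    # would feed from_generator with batch-shaped output_shapes and
    # no extra .batch() call afterwards.
    for start in range(0, len(big), batch_size):
        yield np.asarray(big[start:start + batch_size])

batches = list(batched_records())
print(len(batches), batches[0].shape, batches[-1].shape)
```

The same pattern extends to multi-component records by memory-mapping one array per component and slicing them in lockstep.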