tensorflow: Iterator on cached tf.data Dataset cannot be reinitialized
I found a likely bug when using a reinitializable iterator to read from two cached datasets, one for validation and one for training: the iterator can only be initialized once per cached dataset. It seems to me the iterator should remove the cache lockfile when it is reinitialized; in my case it does not, which is why I get this error. Here’s a minimal example with only one cached dataset.
(basic system information below)
Example
```python
import os

import numpy as np
import tensorflow as tf

data = np.random.rand(10, 3).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices(data)
batches = dataset.shuffle(10).repeat().batch(5)

config = tf.ConfigProto(device_count={'GPU': 0})
sess = tf.Session(config=config)

cache_dir = os.path.join(os.getcwd(), 'cache_dir')
try:
    os.makedirs(cache_dir)
except OSError:
    print('Cache directory already exists')

cached = batches.cache(os.path.join(cache_dir, 'cache'))

iterator = tf.data.Iterator.from_structure(output_types=tf.float32,
                                           output_shapes=(5, 3))
batch = iterator.get_next()
init1 = iterator.make_initializer(cached)
init2 = iterator.make_initializer(batches)

sess.run(init1)
sess.run(batch)
# array([[ 0.11960778,  0.3081578 ,  0.96522039],
#        [ 0.90339011,  0.12458269,  0.30650312],
#        [ 0.58160347,  0.55877644,  0.50363588],
#        [ 0.2350398 ,  0.33509603,  0.4165386 ],
#        [ 0.76757395,  0.50134581,  0.93601096]], dtype=float32)

sess.run(init2)
sess.run(batch)
# array([[ 0.76757395,  0.50134581,  0.93601096],
#        [ 0.2350398 ,  0.33509603,  0.4165386 ],
#        [ 0.90339011,  0.12458269,  0.30650312],
#        [ 0.13266359,  0.82675195,  0.26691398],
#        [ 0.58160347,  0.55877644,  0.50363588]], dtype=float32)

sess.run(init1)
sess.run(batch)
```
```
AlreadyExistsError (see above for traceback): There appears to be a concurrent caching iterator running - cache lockfile already exists ('/home/ubuntu/ai_notebooks/notebooks/projects/deep-purple/cache_dir/cache.lockfile'). If you are sure no other running TF computations are using this cache prefix, delete the lockfile and re-initialize the iterator. Lockfile contents: Created at: 1513187725
    [[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[5,3]], output_types=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
```
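As a stopgap, the error message itself suggests deleting the lockfile when no other computation is using the cache prefix. A minimal sketch of that, reusing `cache_dir`, `sess` and `init1` from the example above (whether this fully recovers mid-session is untested here, since the interrupted epoch leaves a partially written cache):

```python
# Per the error message: if no other running TF computation is using this
# cache prefix, delete the lockfile before re-initializing the iterator.
lockfile = os.path.join(cache_dir, 'cache.lockfile')
if os.path.exists(lockfile):
    os.remove(lockfile)
sess.run(init1)
```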
System information
- TensorFlow version: v1.4.0-rc1-11-g130a514 1.4.0 (installed from pip)
- Python version: 3.5.2
- OS: Linux Ubuntu 16.04.3
- CUDA: 8.0.61
- cuDNN: 6
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 15 (12 by maintainers)
Commits related to this issue
- Avoid reinitializing the same iterator before end of epoch. https://github.com/tensorflow/tensorflow/issues/15343 — committed to mozilla/DeepSpeech by reuben 5 years ago
@saeta Thanks for helping out; that seemed to solve that particular problem.

I was applying the `repeat()` before caching (both in the example and in my real data loader), so I assume the cache releases the lock and produces the index file on `tf.errors.OutOfRangeError`. When I move the repeat to after the caching and make sure to run through all the data, this example works (sketched below).

Perhaps this behaviour could be documented. Some interactions (e.g. `repeat`-`cache`, `shuffle`-`cache`, etc.) are pretty non-obvious to me, and it would save me a huge amount of time to have at least some of those details listed.

Thanks again,
Agrin
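For concreteness, here is a minimal sketch of the fix described in the comment above, using the same TF 1.x session-style API as the original example and assuming a fresh cache prefix: the finite dataset is cached before `repeat`, so the first full pass over the 10 elements lets the cache finalize and release its lockfile.

```python
import os

import numpy as np
import tensorflow as tf

data = np.random.rand(10, 3).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices(data)

cache_dir = os.path.join(os.getcwd(), 'cache_dir')
os.makedirs(cache_dir, exist_ok=True)
cache_path = os.path.join(cache_dir, 'cache')

# Cache the *finite* dataset, then shuffle/repeat/batch. The first full
# pass over the 10 cached elements writes the index file and releases
# the lockfile, after which the iterator can be re-initialized.
cached = dataset.cache(cache_path).shuffle(10).repeat().batch(5)

iterator = tf.data.Iterator.from_structure(output_types=tf.float32,
                                           output_shapes=(5, 3))
batch = iterator.get_next()
init = iterator.make_initializer(cached)

with tf.Session() as sess:
    sess.run(init)
    for _ in range(2):   # 2 batches of 5 == one full epoch of 10 elements
        sess.run(batch)
    sess.run(init)       # succeeds once the cache has been finalized
    sess.run(batch)
```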
@agrinh Funny you should ask. @jsimsa and I have been working on a datasets performance guide we hope to publish soon, and we cover recommended order-of-operations. I’ll make sure `repeat` vs `cache` and `shuffle` vs `cache` are included. 😃

Nice sleuthing and getting it all working!!
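In the meantime, a sketch of the `shuffle` vs `cache` interaction, under my reading of the shuffle semantics (the names `frozen` and `fresh` are just illustrative): caching after `shuffle` records a single shuffled order and replays it every epoch, while caching before `shuffle` re-shuffles the cached elements on each pass.

```python
import numpy as np
import tensorflow as tf

data = np.random.rand(10, 3).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices(data)

# Cache *after* shuffle: the cache stores the shuffled sequence, so every
# epoch replays the same permutation.
frozen = dataset.shuffle(10).cache().repeat()

# Cache *before* shuffle: the cache stores the raw elements, and shuffle
# (which reshuffles each iteration by default) draws a new permutation
# per epoch.
fresh = dataset.cache().shuffle(10).repeat()
```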