tensorflow: "ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.1 LTS (Bionic Beaver)
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: no
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): tf.VERSION = 1.12.0
- Python version: python3.6
- Bazel version (if compiling from source): no
- GCC/Compiler version (if compiling from source): no
- CUDA/cuDNN version: cuda9.0 with cuDNN 7.4.1
- GPU model and memory: GTX 1080 with 8 GB
You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"`.
Describe the current behavior
I am trying to pass TFRecords read through the `tf.data.Dataset` API into `model.fit`. Since the images can be of different sizes, I store each image's shape in the TFRecord itself; the shape is later read back and applied to the image data using `tf.reshape`. But tf.keras is unable to determine the shape of the image data at this stage and throws the error below.
```python
def _parse_function(proto):
    keys_to_features = {"im_path": tf.FixedLenSequenceFeature([], tf.string, allow_missing=True),
                        "im_shape": tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
                        "im_arr": tf.FixedLenSequenceFeature([], tf.string, allow_missing=True),
                        "label": tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
                        }
    parsed_features = tf.parse_single_example(serialized=proto, features=keys_to_features)
    parsed_features['im_arr'] = parsed_features['im_arr'][0]
    parsed_features['label'] = parsed_features['label'][0]
    parsed_features['im_arr'] = tf.decode_raw(parsed_features['im_arr'], tf.uint8)
    # Reshape using the per-example shape stored in the record; because the
    # shape is itself a tensor, the result has unknown static rank.
    parsed_features['im_arr'] = tf.reshape(parsed_features['im_arr'], parsed_features['im_shape'])
    return parsed_features['im_arr'], parsed_features['label']
```
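For context, the parse function is wired into `model.fit` roughly as follows (a simplified sketch; the file name, batch size, and fit arguments are illustrative rather than the exact code from the linked repo):

```python
dataset = tf.data.TFRecordDataset(["train.tfrecords"])
dataset = dataset.map(_parse_function).batch(1).repeat()

# `model` stands in for an ordinary compiled tf.keras model (see the linked
# repo); fit() then fails inside _standardize_user_data as shown below.
model.fit(dataset, epochs=5, steps_per_epoch=100, verbose=1)
```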
The error thrown is as follows:
```
Traceback (most recent call last):
  File "issue/IssueScript.py", line 75, in <module>
    verbose=1)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1536, in fit
    validation_split=validation_split)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 992, in _standardize_user_data
    class_weight, batch_size)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1117, in _standardize_weights
    exception_prefix='input')
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 284, in standardize_input_data
    data = [standardize_single_array(x) for x in data]
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 284, in <listcomp>
    data = [standardize_single_array(x) for x in data]
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 218, in standardize_single_array
    if x.shape is not None and len(x.shape) == 1:
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 579, in __len__
    raise ValueError("Cannot take the length of Shape with unknown rank.")
ValueError: Cannot take the length of Shape with unknown rank.
```
So, as a debugging step, I removed the length check in the `standardize_single_array` function by changing it as follows (note the `False and` part, which bypasses the length check):
```python
if x is None:
    return None
if False and (x.shape is not None and len(x.shape) == 1):
    if tensor_util.is_tensor(x):
        return array_ops.expand_dims(x, axis=1)
    else:
        return np.expand_dims(x, 1)
return x
```
Then I get the following error:
```
Traceback (most recent call last):
  File "issue/IssueScript.py", line 75, in <module>
    verbose=1)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1536, in fit
    validation_split=validation_split)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 992, in _standardize_user_data
    class_weight, batch_size)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1154, in _standardize_weights
    exception_prefix='target')
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 323, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected activation_4 to have 2 dimensions, but got array with shape (None,)
```
I did the same with the above error: I removed the check at line 323 by commenting out the length check as follows.
"""
if len(data_shape) != len(shape):
raise ValueError('Error when checking ' + exception_prefix +
': expected ' + names[i] + ' to have ' +
str(len(shape)) + ' dimensions, but got array '
'with shape ' + str(data_shape))
"""
Now the training proceeds smoothly without error. I believe there is an issue with `tf.reshape` when tensors are supplied as the shape argument.
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Code: https://github.com/dineshdharme/tensorflow-issue1
Just run: `python3 issue/IssueScript.py`
I have also added a tfrecords-generating script, `tfrecords_utils.py`. To generate a tfrecords file from the image data in the data folder, run:
`python3 issue/tfrecords_utils.py`
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 45 (11 by maintainers)
Links to this issue
Commits related to this issue
- Fix ValueError with tf.data.Dataset and model.fit This fix, like its predecessor PR #24522, tries to address issue no. #24520, where passing tf.data.Dataset into model.fit may result in `ValueError: ... — committed to jordanra-miso/tensorflow by jordanra-miso 5 years ago
Also having this problem with TF 1.14.
In my case, I encountered the problem when using a `tf.data.Dataset` to which I had mapped a Python function using a combination of `tf.data.Dataset.map()` and `tf.py_func`. I was able to sidestep the issue by specifying the shape of the tensor returned from `tf.py_func`, as below. Obviously this will only work if you know the shapes of the tensor(s) ahead of time.
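A sketch of that workaround (the loading function and shapes here are illustrative, not the original snippet):

```python
import numpy as np
import tensorflow as tf

def _load(path):
    # Arbitrary Python-side work; TF cannot infer the output shape.
    return np.zeros((224, 224, 3), dtype=np.float32)

def _map_fn(path):
    image = tf.py_func(_load, [path], tf.float32)
    # tf.py_func output has unknown rank; declaring the shape here
    # restores the rank information Keras needs.
    image.set_shape([224, 224, 3])
    return image

dataset = tf.data.Dataset.from_tensor_slices(["a.jpg", "b.jpg"]).map(_map_fn)
```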
@milinddeore for what it's worth, I ran into the same error when using `tf.Dataset.from_generator`, and was able to fix the issue by passing in an argument for `output_shapes`. I didn't even have to specify the exact dimensions: using something like `tf.TensorShape([None, None, None])` worked in the case of 3-dimensional images. While the `from_tensor_slices` method doesn't have the same parameter, I wonder if there is an analogous way to set the shape (even if filled with `None`, but having the right dimensionality) that might resolve your issue until this PR is merged.
So, I have an ImageNet generator outputting (image, label) of shape ((224, 224, 3), (1,)).
Original code that leads to that same error being thrown:
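A sketch of the failing pattern (the generator body is illustrative):

```python
import numpy as np
import tensorflow as tf

def gen():
    while True:
        yield np.zeros((224, 224, 3), np.float32), np.zeros((1,), np.float32)

# Without output_shapes, every component has unknown rank, which
# model.fit later rejects.
dataset = tf.data.Dataset.from_generator(
    gen, output_types=(tf.float32, tf.float32))
```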
Fixed code that prevents that error:
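Continuing the sketch above (same `gen`); only `output_shapes` is new:

```python
# Declaring output_shapes gives each component a known rank; the
# individual dimensions may remain None.
dataset = tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, None, None]),
                   tf.TensorShape([None])))
```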
I encountered the same problem when using `from_tensor_slices` with the tf.keras `model.fit` method. I use `dataset.as_numpy_iterator()` as a workaround: use the first call in the sketch below instead of the second (the `model` and fit arguments are assumed).
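```python
# Workaround: feed Keras a NumPy iterator rather than the dataset object.
model.fit(dataset.as_numpy_iterator(), epochs=1)

# Instead of passing the dataset directly:
# model.fit(dataset, epochs=1)
```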
I have also encountered this issue when passing tf.data to the tf.keras fit method. Unlike @sallamander, though, I used `tf.data.Dataset.from_tensor_slices` instead of `tf.Dataset.from_generator`. The shape can then be defined inside the mapping function, roughly as in the sketch below (the preprocessing is illustrative).
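```python
import numpy as np
import tensorflow as tf

images = np.zeros((8, 32, 32, 3), np.float32)
labels = np.zeros((8,), np.int64)

def preprocess(img):
    return img  # stand-in for real Python-side preprocessing

def _map_fn(image, label):
    image = tf.py_function(preprocess, [image], tf.float32)
    # Declare the shape inside the mapping function so Keras sees a
    # known rank downstream.
    image.set_shape([32, 32, 3])
    return image, label

dataset = tf.data.Dataset.from_tensor_slices((images, labels)).map(_map_fn)
```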
Same situation here, after installing today's nightly build (July 24th, 2019) in Colaboratory.
If I use a strategy, in this case to access a TPU, I get the error. However, if I just avoid the use of a strategy, things work smoothly.
I am using a `dataset.map` which includes a `numpy_function` that calls a Python object that is in charge of the training.
My first guess would be something related to serialization.
OK, I managed to get a solution working using only the protobuf API available in TensorFlow. This approach avoids serializing numpy data to bytes and passing the raw bytes to the TFRecord, which would force the use of a special py_function that cannot be serialized and moved around the graph when sending data to a TPU or in a distributed environment.
The code is available at https://gist.github.com/vicpara/5c23c78d0f3105af53798272e628d2ad .
As the map function that manipulates the tensors of the feature dictionary produced by `TFRecordDataset` is no longer required, the above approach also avoids the additional `set_shape` operation outside the `@py_function` scope in an extra map operation. Lastly, I feel I managed to get a decent solution that at least solves my problem to a satisfactory level. It's easy to add more features of different kinds once you understand how TF uses protobuf to serialize these features, which is not the easiest thing ever. Having run into all these issues, I would like to mention the following:
Just by looking at this issue alone, one can see that the support given here is quite inconsiderate. This API goes 90% of the way to doing something and then just drops the ball. The errors are obscure, the documentation is arcane and difficult to decipher, and proper examples are almost entirely lacking. In a year and a half there has been little guidance from the TF devs to help address all the issues raised here.
[Tensorflow Datasets](https://github.com/tensorflow/datasets) goes a long way toward fixing many issues in the current API when it comes to making the serialization and deserialization experience smooth. They almost completely built their own mechanism, barely using anything provided in this one. It does introduce additional functionality around publishing datasets to the cloud, but it fills in a lot of the gaps around the existing protobuf protocol. Tensors as they currently stand contain all the information needed to be fully serialized/deserialized in the backend: you have the shape, the types, and the values. For typical standard use it should be straightforward.
Why do you put your API users through the pain of digging through multiple source repos to fix your half-baked solutions?
The dev support for TensorFlow-related issues is incredibly slow and mostly not helpful. It pretty much says that we have an error because we have an error, and: don't do that to not get the error.
The documentation lacks supporting examples, the examples are trivial, and the errors hide so much of the magic happening inside the graph.
What kind of community do you want around TensorFlow? Is TensorFlow just for academics? You seem to cater mostly to academics and presume all datasets are, and should be, floating freely in the cloud. Are the academics going to buy up your TPU time?
Your declared intentions in the promotional TF videos regarding industry adoption correspond neither to the level of support you show here nor to the maturity, documentation, and completeness of the APIs.
If you feel industry people should move to PyTorch and forget about using TF with ease, I'd appreciate finding out about this sooner rather than later.
@dineshdharme Added PR #24522 for the fix.
I was getting this error with the multi-input generator code below:
`ValueError: Cannot take the length of shape with unknown rank.`
Upgrading to v2.2 resolved the issue, thanks.
@rachellim I encountered exactly the same thing as @adriancaruana. His proposed solution works for me with `tf.__version__` 2.1.0. Be sure to apply `set_shape()` to the tensor returned by `tf.py_function()`, and not within `tf.py_function()`.
Same here; upgrading to TF 2.2 solved the problem.
Hello all,
I am currently working with TensorFlow 2.1.
I am trying to apply MirroredStrategy to an image-captions generator, and I get the above error when I try to start training.
Code snippet:
Error:
```
Traceback (most recent call last):
  File "MirrorTrainer_Image2Smilex.py", line 88, in <module>
    train_dist_dataset = strategy.experimental_distribute_dataset(dataset)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 677, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 580, in _experimental_distribute_dataset
    split_batch_by=self._num_replicas_in_sync)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 89, in get_distributed_dataset
    input_context=input_context)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 509, in __init__
    dataset = distribute._RebatchDataset(dataset, split_batch_by)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 110, in __init__
    rebatch, dataset_ops.get_structure(input_dataset))
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/util/nest.py", line 245, in map_structure
    structure[0], [func(*x) for x in entries])
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/util/nest.py", line 245, in <listcomp>
    structure[0], [func(*x) for x in entries])
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 105, in rebatch
    batch_size = recalculate_batch_size(type_spec._to_legacy_output_shapes())
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 90, in recalculate_batch_size
    if len(output_shapes) < 1:
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 822, in __len__
    raise ValueError("Cannot take the length of shape with unknown rank.")
ValueError: Cannot take the length of shape with unknown rank.
```
Any idea how to get around this problem? Even a different distributed-training implementation for image captioning would be much appreciated.
@rachellim Oh, I thought that `set_shape` only needed to be applied to tensors (non-scalars). After I applied `set_shape` to the scalar, the error was gone, roughly as in the sketch below. Thank you 👍
I'm currently using the `tf.__version__` 2.1.0 docker image from the TensorFlow Docker Hub.
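Roughly what worked (a sketch; the Python function is illustrative):

```python
import tensorflow as tf

def scale(x):
    return x * 2.0  # eager-side computation on a scalar

def map_fn(x):
    y = tf.py_function(scale, [x], tf.float32)
    # Apply set_shape to the tensor returned by tf.py_function (not inside
    # it); even a scalar needs its rank-0 shape declared.
    y.set_shape([])
    return y

dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0]).map(map_fn)
```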
I encountered this issue when I used `tf.data.TFRecordDataset()` as an argument of `model.fit()`. As @birkanatici mentioned, using `dataset.as_numpy_iterator()` helped me resolve the error, but it's just a workaround.
Update: Based on talking to @robieta, keras expects its inputs to have at least known rank (even if dimensions are unknown). So, if the dataset has components with unknown rank, keras will not work.
In some cases, tf.data is not able to statically infer the rank of its outputs (e.g. if you use a py_func), so you have to manually use `set_shape` to tell the dataset what shapes its outputs are, as @adriancaruana suggested in https://github.com/tensorflow/tensorflow/issues/24520#issuecomment-532958834. Note that you don't have to know the shape fully; you just need to know the number of dimensions. So, you could do something like the sketch below. Does this resolve the issue for all who've encountered it?
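(A rough illustration; the loader below is mine, not the original comment's code:)

```python
import numpy as np
import tensorflow as tf

def _load(_):
    return np.zeros((32, 32, 3), np.float32), np.int64(0)

# The py_function outputs have unknown rank...
ds = tf.data.Dataset.range(4).map(
    lambda i: tf.py_function(_load, [i], (tf.float32, tf.int64)))

def restore_shapes(image, label):
    # ...so declare it: only the number of dimensions matters; every
    # individual dimension can stay None.
    image.set_shape([None, None, None])  # rank 3
    label.set_shape([])                  # rank 0 (scalar)
    return image, label

ds = ds.map(restore_shapes)
```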
On the keras side, we should surface a more informative error. @karmel, can you reassign this to someone on the keras team to surface a more informative error message when the input shapes are of unknown rank?
Same problem with a dataset created with `tf.data.TFRecordDataset`.
Had the exact same error in TF 2.3.0; `set_shape` solved the error with `model.fit`.
@dineshdharme Thanks for the issue!
Apologies for the delay; tf.keras's built-in training loops just went through a major rewrite in order to support custom training steps out of the box, and many fixes were blocked on this rewrite.
This is now fixed at head. Here's an example of passing tensors of unknown rank to `Model.fit` (see the sketch below). For more info on the rewrite, please check out the 2.2 release notes.
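Roughly (a toy sketch of the pattern, not the original snippet; the loader and model are illustrative):

```python
import numpy as np
import tensorflow as tf

def _load(i):
    # Python-side loading; shapes are invisible to TF's static analysis.
    return np.ones((8,), np.float32), np.zeros((1,), np.float32)

# Both components of this dataset have unknown rank.
ds = tf.data.Dataset.range(16).map(
    lambda i: tf.py_function(_load, [i], (tf.float32, tf.float32))).batch(4)

class ToyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        x = tf.reshape(x, [-1, 8])  # recover a usable shape at runtime
        return self.dense(x)

model = ToyModel()
model.compile(optimizer="sgd", loss="mse")
model.fit(ds, epochs=1)  # no longer raises on unknown-rank inputs
```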
I have to agree with @dominthomas here: compared to using a Sequence with multiprocessing on all 24 cores, this is way, way slower (20 hours versus 4 with multiprocessing for one epoch, for me). I get no parallelism using tf.data, I assume because I am using a py_func, because I couldn't figure out how to put all my work into TF. My Sequence functions involve using a pandas index file and reading from a binary numpy file based on the pandas index; maybe there is a way to do this in pure dataset notation, but right now the documentation is so opaque I can't figure it out.
@dominthomas, two things:
(1) I would recommend using `tf.data.experimental.AUTOTUNE` as the parallelism setting on your `map` function, e.g. `dataset = dataset.map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)`.
This dynamically picks the optimal parallelism setting at runtime for best performance, reducing the need to set the parallelism manually. Let me know how this works for you.
If you want to understand how to debug input pipeline performance, the TF team recently released the TF profiler with TF 2.2. It can help you understand which parts of the input pipeline are slow. We can chat about that in a separate thread - the team is also working on releasing a guide to debugging input pipeline performance using the TF profiler.
Tutorial: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras
Guide: https://www.tensorflow.org/guide/profiler
(2) The second code snippet you shared "works" (doesn't raise an error or require `set_shape`) because you're passing the outputs of the iterator directly to Keras. The `set_shape` workaround is only necessary when you're passing the dataset directly to Keras; it has to do with how Keras handles dataset shapes.