tensorflow: "ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.1 LTS (Bionic Beaver)
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: no
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): tf.VERSION = 1.12.0
- Python version: python3.6
- Bazel version (if compiling from source): no
- GCC/Compiler version (if compiling from source): no
- CUDA/cuDNN version: cuda9.0 with cuDNN 7.4.1
- GPU model and memory: GTX 1080 with 8 GB
You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"`.
Describe the current behavior
I am trying to pass TFRecords read through the `tf.data.Dataset` API into `model.fit`. Since the images can be of different sizes, I store each image's shape in the TFRecord itself; the shape is later read back and applied to the image data using `tf.reshape`. But tf.keras is unable to determine the shape of the image data at this stage and throws the error below.
```python
def _parse_function(proto):
    keys_to_features = {"im_path": tf.FixedLenSequenceFeature([], tf.string, allow_missing=True),
                        "im_shape": tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
                        "im_arr": tf.FixedLenSequenceFeature([], tf.string, allow_missing=True),
                        "label": tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
                        }
    parsed_features = tf.parse_single_example(serialized=proto, features=keys_to_features)
    parsed_features['im_arr'] = parsed_features['im_arr'][0]
    parsed_features['label'] = parsed_features['label'][0]
    parsed_features['im_arr'] = tf.decode_raw(parsed_features['im_arr'], tf.uint8)
    # Reshape using the per-example shape stored in the record; because the
    # shape is itself a tensor, the result has unknown static rank.
    parsed_features['im_arr'] = tf.reshape(parsed_features['im_arr'], parsed_features['im_shape'])
    return parsed_features['im_arr'], parsed_features['label']
```
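For context, the parse function is wired into `model.fit` roughly as follows (a simplified sketch; the file name, batch size, and fit arguments are illustrative rather than the exact code from the linked repo):

```python
dataset = tf.data.TFRecordDataset(["train.tfrecords"])
dataset = dataset.map(_parse_function).batch(1).repeat()

# `model` stands in for an ordinary compiled tf.keras model (see the linked
# repo); fit() then fails inside _standardize_user_data as shown below.
model.fit(dataset, epochs=5, steps_per_epoch=100, verbose=1)
```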
The error thrown is as follows:
```
Traceback (most recent call last):
  File "issue/IssueScript.py", line 75, in <module>
    verbose=1)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1536, in fit
    validation_split=validation_split)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 992, in _standardize_user_data
    class_weight, batch_size)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1117, in _standardize_weights
    exception_prefix='input')
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 284, in standardize_input_data
    data = [standardize_single_array(x) for x in data]
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 284, in <listcomp>
    data = [standardize_single_array(x) for x in data]
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 218, in standardize_single_array
    if x.shape is not None and len(x.shape) == 1:
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 579, in __len__
    raise ValueError("Cannot take the length of Shape with unknown rank.")
ValueError: Cannot take the length of Shape with unknown rank.
```
So, as a debugging step, I removed the length check in the `standardize_single_array` function by changing it as follows (note the `False and` part, which bypasses the length check):
```python
if x is None:
    return None
if False and (x.shape is not None and len(x.shape) == 1):
    if tensor_util.is_tensor(x):
        return array_ops.expand_dims(x, axis=1)
    else:
        return np.expand_dims(x, 1)
return x
```
Then I get the following error:
```
Traceback (most recent call last):
  File "issue/IssueScript.py", line 75, in <module>
    verbose=1)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1536, in fit
    validation_split=validation_split)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 992, in _standardize_user_data
    class_weight, batch_size)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1154, in _standardize_weights
    exception_prefix='target')
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 323, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected activation_4 to have 2 dimensions, but got array with shape (None,)
```
I did the same with the above error: I removed the check at line 323 by commenting out the length check as follows.
"""
if len(data_shape) != len(shape):
raise ValueError('Error when checking ' + exception_prefix +
': expected ' + names[i] + ' to have ' +
str(len(shape)) + ' dimensions, but got array '
'with shape ' + str(data_shape))
"""
Now the training proceeds smoothly without error. I believe there is an issue with `tf.reshape` when tensors are supplied as the shape argument.
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Code: https://github.com/dineshdharme/tensorflow-issue1
Just run: `python3 issue/IssueScript.py`
I have also added a tfrecords-generating script, `tfrecords_utils.py`. To generate a tfrecords file from the image data in the data folder, run:
`python3 issue/tfrecords_utils.py`
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 45 (11 by maintainers)
Links to this issue
Commits related to this issue
- Fix ValueError with tf.data.Dataset and model.fit This fix, like its predecessor PR #24522, tries to address issue no. #24520, where passing tf.data.Dataset into model.fit may result in `ValueError: ... — committed to jordanra-miso/tensorflow by jordanra-miso 5 years ago
Also having this problem with TF 1.14.
In my case, I encountered the problem when using a `tf.data.Dataset` to which I had mapped a Python function using a combination of `tf.data.Dataset.map()` and `tf.py_func`. I was able to sidestep the issue by specifying the shape of the tensor returned from `tf.py_func`, as below. Obviously this will only work if you know the shapes of the tensor(s) ahead of time.
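A sketch of that workaround (the loading function and shapes here are illustrative, not the original snippet):

```python
import numpy as np
import tensorflow as tf

def _load(path):
    # Arbitrary Python-side work; TF cannot infer the output shape.
    return np.zeros((224, 224, 3), dtype=np.float32)

def _map_fn(path):
    image = tf.py_func(_load, [path], tf.float32)
    # tf.py_func output has unknown rank; declaring the shape here
    # restores the rank information Keras needs.
    image.set_shape([224, 224, 3])
    return image

dataset = tf.data.Dataset.from_tensor_slices(["a.jpg", "b.jpg"]).map(_map_fn)
```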
@milinddeore for what it's worth, I ran into the same error when using `tf.Dataset.from_generator`, and was able to fix the issue by passing in an argument for `output_shapes`. I didn't even have to specify the exact dimensions: using something like `tf.TensorShape([None, None, None])` worked in the case of 3-dimensional images. While the `from_tensor_slices` method doesn't have the same parameter, I wonder if there is an analogous way to set the shape (even if filled with `None`, but having the right dimensionality) that might resolve your issue until this PR is merged.
So, I have an ImageNet generator outputting (image, label) of shape ((224, 224, 3), (1,)).
Original code that leads to that same error being thrown:
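A sketch of the failing pattern (the generator body is illustrative):

```python
import numpy as np
import tensorflow as tf

def gen():
    while True:
        yield np.zeros((224, 224, 3), np.float32), np.zeros((1,), np.float32)

# Without output_shapes, every component has unknown rank, which
# model.fit later rejects.
dataset = tf.data.Dataset.from_generator(
    gen, output_types=(tf.float32, tf.float32))
```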
Fixed code that prevents that error:
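Continuing the sketch above (same `gen`); only `output_shapes` is new:

```python
# Declaring output_shapes gives each component a known rank; the
# individual dimensions may remain None.
dataset = tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, None, None]),
                   tf.TensorShape([None])))
```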
I encountered the same problem when using `from_tensor_slices` with the tf.keras `model.fit` method. I use `dataset.as_numpy_iterator()` as a workaround: use the first call in the sketch below instead of the second (the `model` and fit arguments are assumed).
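```python
# Workaround: feed Keras a NumPy iterator rather than the dataset object.
model.fit(dataset.as_numpy_iterator(), epochs=1)

# Instead of passing the dataset directly:
# model.fit(dataset, epochs=1)
```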
I have also encountered this issue when passing tf.data to the tf.keras fit method. Unlike @sallamander, though, I used `tf.data.Dataset.from_tensor_slices` instead of `tf.Dataset.from_generator`. The shape can then be defined inside the mapping function, roughly as in the sketch below (the preprocessing is illustrative).
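```python
import numpy as np
import tensorflow as tf

images = np.zeros((8, 32, 32, 3), np.float32)
labels = np.zeros((8,), np.int64)

def preprocess(img):
    return img  # stand-in for real Python-side preprocessing

def _map_fn(image, label):
    image = tf.py_function(preprocess, [image], tf.float32)
    # Declare the shape inside the mapping function so Keras sees a
    # known rank downstream.
    image.set_shape([32, 32, 3])
    return image, label

dataset = tf.data.Dataset.from_tensor_slices((images, labels)).map(_map_fn)
```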
Same situation here, after installing today's nightly build (July 24th, 2019) in Colaboratory.
If I use a strategy, in this case to access a TPU, I get the error. However, if I just avoid the use of a strategy, things work smoothly.
I am using a `dataset.map` which includes a `numpy_function` that calls a Python object that is in charge of the training.
My first guess would be something related to serialization.
OK, I managed to get a solution working using only the protobuf API available in TensorFlow. This approach avoids serializing numpy data to bytes and passing the raw bytes to the TFRecord, which would force the use of a special py_function that cannot be serialized and moved around the graph when sending data to a TPU or in a distributed environment.
The code is available at https://gist.github.com/vicpara/5c23c78d0f3105af53798272e628d2ad .
As the map function that manipulates the tensors of the feature dictionary produced by `TFRecordDataset` is no longer required, the above approach also avoids the additional `set_shape` operation outside the `@py_function` scope in an extra map operation. Lastly, I feel I managed to get a decent solution that at least solves my problem to a satisfactory level. It's easy to add more features of different kinds once you understand how TF uses protobuf to serialize these features, which is not the easiest thing ever. Having run into all these issues, I would like to mention the following:
Just by looking at this issue alone, one can see that the support given here is quite inconsiderate. This API goes 90% of the way to doing something and then just drops the ball. The errors are obscure, the documentation is arcane and difficult to decipher, and proper examples are almost entirely lacking. In a year and a half there has been little guidance from the TF devs to help address all the issues raised here.
[Tensorflow Datasets](https://github.com/tensorflow/datasets) goes a long way toward fixing many issues in the current API when it comes to making the serialization and deserialization experience smooth. They almost completely built their own mechanism, barely using anything provided in this one. It does introduce additional functionality around publishing datasets to the cloud, but it fills in a lot of the gaps around the existing protobuf protocol. Tensors as they currently stand contain all the information needed to be fully serialized/deserialized in the backend: you have the shape, the types, and the values. For typical standard use it should be straightforward.
Why do you put your API users through the pain of digging through multiple source repos to fix your half-baked solutions?
The dev support for TensorFlow-related issues is incredibly slow and mostly not helpful. It pretty much says that we have an error because we have an error, and: don't do that to not get the error.
The documentation lacks supporting examples, the examples are trivial, and the errors hide so much of the magic happening inside the graph.
What kind of community do you want around TensorFlow? Is TensorFlow just for academics? You seem to cater mostly to academics and presume all datasets are, and should be, floating freely in the cloud. Are the academics going to buy up your TPU time?
Your declared intentions in the promotional TF videos regarding industry adoption correspond neither to the level of support you show here nor to the maturity, documentation, and completeness of the APIs.
If you feel industry people should move to PyTorch and forget about using TF with ease, I'd appreciate finding out about this sooner rather than later.
@dineshdharme Added PR #24522 for the fix.
I was getting this error with the multi-input generator code below:
`ValueError: Cannot take the length of shape with unknown rank.`
Upgrading to v2.2 resolved the issue, thanks.
@rachellim I encountered exactly the same thing as @adriancaruana. His proposed solution works for me with `tf.__version__` 2.1.0. Be sure to apply `set_shape()` to the tensor returned by `tf.py_function()`, and not within `tf.py_function()`.
Same here; upgrading to TF 2.2 solved the problem.
Hello all,
I am currently working with TensorFlow 2.1.
I am trying to apply MirroredStrategy to an image-captions generator, and I get the above error when I try to start training.
Code snippet:
Error:
```
Traceback (most recent call last):
  File "MirrorTrainer_Image2Smilex.py", line 88, in <module>
    train_dist_dataset = strategy.experimental_distribute_dataset(dataset)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 677, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 580, in _experimental_distribute_dataset
    split_batch_by=self._num_replicas_in_sync)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 89, in get_distributed_dataset
    input_context=input_context)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 509, in __init__
    dataset = distribute._RebatchDataset(dataset, split_batch_by)
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 110, in __init__
    rebatch, dataset_ops.get_structure(input_dataset))
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/util/nest.py", line 245, in map_structure
    structure[0], [func(*x) for x in entries])
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/util/nest.py", line 245, in <listcomp>
    structure[0], [func(*x) for x in entries])
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 105, in rebatch
    batch_size = recalculate_batch_size(type_spec._to_legacy_output_shapes())
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 90, in recalculate_batch_size
    if len(output_shapes) < 1:
  File "/home/kohulan/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 822, in __len__
    raise ValueError("Cannot take the length of shape with unknown rank.")
ValueError: Cannot take the length of shape with unknown rank.
```
Any idea how to get around this problem? Even a different distributed-training implementation for image captioning would be much appreciated.
@rachellim Oh, I thought that `set_shape` only needed to be applied to tensors (non-scalars). After I applied `set_shape` to the scalar, the error was gone, roughly as in the sketch below. Thank you 👍
I'm currently using the `tf.__version__` 2.1.0 docker image from the TensorFlow Docker Hub.
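Roughly what worked (a sketch; the Python function is illustrative):

```python
import tensorflow as tf

def scale(x):
    return x * 2.0  # eager-side computation on a scalar

def map_fn(x):
    y = tf.py_function(scale, [x], tf.float32)
    # Apply set_shape to the tensor returned by tf.py_function (not inside
    # it); even a scalar needs its rank-0 shape declared.
    y.set_shape([])
    return y

dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0]).map(map_fn)
```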
I encountered this issue when I used `tf.data.TFRecordDataset()` as an argument of `model.fit()`. As @birkanatici mentioned, using `dataset.as_numpy_iterator()` helped me resolve the error, but it's just a workaround.
Update: Based on talking to @robieta, keras expects its inputs to have at least known rank (even if dimensions are unknown). So, if the dataset has components with unknown rank, keras will not work.
In some cases, tf.data is not able to statically infer the rank of its outputs (e.g. if you use a py_func), so you have to manually use `set_shape` to tell the dataset what shapes its outputs are, as @adriancaruana suggested in https://github.com/tensorflow/tensorflow/issues/24520#issuecomment-532958834. Note that you don't have to know the shape fully; you just need to know the number of dimensions. So, you could do something like the sketch below. Does this resolve the issue for all who've encountered it?
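(A rough illustration; the loader below is mine, not the original comment's code:)

```python
import numpy as np
import tensorflow as tf

def _load(_):
    return np.zeros((32, 32, 3), np.float32), np.int64(0)

# The py_function outputs have unknown rank...
ds = tf.data.Dataset.range(4).map(
    lambda i: tf.py_function(_load, [i], (tf.float32, tf.int64)))

def restore_shapes(image, label):
    # ...so declare it: only the number of dimensions matters; every
    # individual dimension can stay None.
    image.set_shape([None, None, None])  # rank 3
    label.set_shape([])                  # rank 0 (scalar)
    return image, label

ds = ds.map(restore_shapes)
```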
On the keras side, we should surface a more informative error. @karmel, can you reassign this to someone on the keras team to surface a more informative error message when the input shapes are of unknown rank?
Same problem with a dataset created with `tf.data.TFRecordDataset`.
Had the exact same error in TF 2.3.0; `set_shape` solved the error with `model.fit`.
@dineshdharme Thanks for the issue!
Apologies for the delay; tf.keras's built-in training loops just went through a major rewrite in order to support custom training steps out of the box, and many fixes were blocked on this rewrite.
This is now fixed at head. Here's an example of passing tensors of unknown rank to `Model.fit` (see the sketch below). For more info on the rewrite, please check out the 2.2 release notes.
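Roughly (a toy sketch of the pattern, not the original snippet; the loader and model are illustrative):

```python
import numpy as np
import tensorflow as tf

def _load(i):
    # Python-side loading; shapes are invisible to TF's static analysis.
    return np.ones((8,), np.float32), np.zeros((1,), np.float32)

# Both components of this dataset have unknown rank.
ds = tf.data.Dataset.range(16).map(
    lambda i: tf.py_function(_load, [i], (tf.float32, tf.float32))).batch(4)

class ToyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        x = tf.reshape(x, [-1, 8])  # recover a usable shape at runtime
        return self.dense(x)

model = ToyModel()
model.compile(optimizer="sgd", loss="mse")
model.fit(ds, epochs=1)  # no longer raises on unknown-rank inputs
```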
I have to agree with @dominthomas here: compared to using a Sequence with multiprocessing on all 24 cores, this is way, way slower (20 hours versus 4 with multiprocessing for one epoch, for me). I get no parallelism using tf.data, I assume because I am using a py_func, because I couldn't figure out how to put all my work into TF. My Sequence functions involve using a pandas index file and reading from a binary numpy file based on the pandas index; maybe there is a way to do this in pure dataset notation, but right now the documentation is so opaque I can't figure it out.
@dominthomas, two things:
(1) I would recommend using `tf.data.experimental.AUTOTUNE` as the parallelism setting on your `map` function, e.g. `dataset = dataset.map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)`.
This dynamically picks the optimal parallelism setting at runtime for best performance, reducing the need to set the parallelism manually. Let me know how this works for you.
If you want to understand how to debug input pipeline performance, the TF team recently released the TF profiler with TF 2.2. It can help you understand which parts of the input pipeline are slow. We can chat about that in a separate thread - the team is also working on releasing a guide to debugging input pipeline performance using the TF profiler.
Tutorial: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras
Guide: https://www.tensorflow.org/guide/profiler
(2) The second code snippet you shared "works" (doesn't raise an error or require `set_shape`) because you're passing the outputs of the iterator directly to Keras. The `set_shape` workaround is only necessary when you're passing the dataset directly to Keras; it has to do with how Keras handles dataset shapes.