tensorflow: Cannot add tensor to the batch: number of elements does not match (while iterating through dataset elements).

System info: OS: macOS Catalina (10.15.5); TensorFlow 2.0.0, installed via Anaconda Navigator 1.9.12 in a Python 3.7 environment.

Code:

# Load dataset from TFRecord file:
dataset = tf.data.TFRecordDataset(filenames=data_dir)
parsed_dataset = dataset.map(parsing_fn).batch(32)
print(parsed_dataset)
for image,label in parsed_dataset.take(2):
    print(image, label)

Output:

<BatchDataset shapes: ((None, None), (None,)), types: (tf.float32, tf.int64)>

InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [810000], [batch]: [243712] [Op:IteratorGetNextSync]
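The element spec printed above is the clue: after batching, the image component has shape (None, None), i.e. a batch dimension plus a flattened length that is unknown because it varies from record to record (one record decodes to 810000 values while another decodes to 243712). batch(32) can only stack tensors of identical shape. A quick way to confirm the mismatch is to inspect the per-element shapes before batching — a minimal diagnostic sketch, reusing the parsing_fn and data_dir from this issue:

dataset = tf.data.TFRecordDataset(filenames=data_dir)
unbatched = dataset.map(parsing_fn)

# If these lengths differ between records, .batch() will fail with
# exactly the error reported above.
for image, label in unbatched.take(5):
    print(image.shape, label.numpy())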

Helper functions below.

(TFRecord written using TensorFlow 1.14.0 on Google Colaboratory.)

Parsing function:

def parsing_fn(serialized):

    features = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64)
    }

    # Parse the serialized data so we get a dict with our data.
    parsed_example = tf.io.parse_single_example(serialized=serialized,
                                                features=features)

    # Get the image as raw bytes.
    image_raw = parsed_example['image']

    # Decode the raw bytes into a flat 1-D uint8 tensor. Note that
    # this discards the image's original shape.
    image = tf.io.decode_raw(image_raw, tf.uint8)

    # The type is now uint8 but we need it to be float.
    image = tf.cast(image, tf.float32)

    label = parsed_example['label']

    return image, label
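The root cause is visible here: tf.io.decode_raw returns a flat 1-D tensor and nothing restores the image's shape, so images of different sizes produce tensors of different lengths that cannot be stacked into a batch. One fix is to store each image's dimensions in the record, reshape in the parser, and resize everything to a common size before batching. A minimal sketch, assuming the shape is serialized as extra int64 features (the writer below does not do this yet; see the companion sketch after it) and an illustrative target size of 224x224:

def parsing_fn_fixed(serialized):
    features = {
        'image':    tf.io.FixedLenFeature([], tf.string),
        'label':    tf.io.FixedLenFeature([], tf.int64),
        # Assumed extra features holding the original image shape.
        'height':   tf.io.FixedLenFeature([], tf.int64),
        'width':    tf.io.FixedLenFeature([], tf.int64),
        'channels': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized=serialized,
                                        features=features)

    # Restore the original (height, width, channels) layout...
    image = tf.io.decode_raw(parsed['image'], tf.uint8)
    shape = tf.stack([parsed['height'], parsed['width'], parsed['channels']])
    image = tf.reshape(image, shape)

    # ...then force every element to one common size so .batch() can
    # stack them. 224x224 is an arbitrary illustrative choice.
    image = tf.image.resize(tf.cast(image, tf.float32), [224, 224])

    return image, parsed['label']

With every element now shaped (224, 224, channels), .batch(32) can stack them without complaint.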

Code used to create the TFRecord files:

def wrap_int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def wrap_bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

with tf.python_io.TFRecordWriter(out_path) as writer:

    data = {
        'image': wrap_bytes(img_bytes),
        'label': wrap_int64(label)
    }

    # Wrap the data as TensorFlow Features.
    feature = tf.train.Features(feature=data)

    # Wrap again as a TensorFlow Example.
    example = tf.train.Example(features=feature)

    # Serialize the data.
    serialized = example.SerializeToString()

    # Write the serialized data to the TFRecords file.
    writer.write(serialized)
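For the parser sketch above to work, the writer would have to record each image's dimensions next to the raw bytes. A hedged sketch of that change, assuming img is the uint8 NumPy array of shape (height, width, channels) from which img_bytes was produced via img.tobytes() — neither variable appears in the original snippet, so this is an assumption about the elided loading code:

data = {
    'image':    wrap_bytes(img_bytes),
    'label':    wrap_int64(label),
    # Assumed: `img` is the array behind `img_bytes`.
    'height':   wrap_int64(img.shape[0]),
    'width':    wrap_int64(img.shape[1]),
    'channels': wrap_int64(img.shape[2]),
}

An alternative that avoids storing shapes entirely is to write the encoded file contents (PNG/JPEG bytes) instead of raw pixels and decode with tf.io.decode_image in the parser, which recovers the dimensions from the encoded data itself.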

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (2 by maintainers)

Most upvoted comments

I had this same issue running the Keras OCR example on my own data. Turns out I wasn’t accounting for the variable length labels in my data set. When I pad my labels to max_length it all works out.

images = sorted(list(map(str, list(data_dir.glob("*.png")))))
raw_labels = [img.split(os.path.sep)[-1].split(".png")[0] for img in images]
max_length = max([len(label) for label in raw_labels])
labels = [label.ljust(max_length) for label in raw_labels]
characters = set(char for label in labels for char in label)
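Hand-padding the labels works; tf.data can also do the padding per batch with Dataset.padded_batch, which pads every element in a batch to the length of that batch's longest element. A minimal self-contained sketch with toy variable-length sequences standing in for the labels (the data is illustrative, not from this issue):

import tensorflow as tf

def gen():
    yield [1, 2, 3]
    yield [4, 5]
    yield [6]

ds = tf.data.Dataset.from_generator(gen, output_types=tf.int32,
                                    output_shapes=[None])

# Each batch is padded with zeros up to its own longest element,
# so batching variable-length elements no longer fails.
for batch in ds.padded_batch(2, padded_shapes=[None]):
    print(batch.numpy())
# [[1 2 3]
#  [4 5 0]]
# [[6]]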

@FalsoMoralista I put my code in this Colab notebook.

The error comes from the .batch(batch_size) part:

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = (train_dataset
                 .map(encode_single_sample,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
                 .batch(batch_size)
                 .prefetch(buffer_size=tf.data.experimental.AUTOTUNE))

I iterate through the dataset in the plotting cell right after that:

for batch in train_dataset.take(1):

Interestingly enough, when I change batch_size to 1 it works (though a batch size of 1 brings its own problems). In your case, it's the '32' in parsed_dataset = dataset.map(parsing_fn).batch(32).

Worth mentioning that I'm following Keras' OCR example here, and their code ran just fine for me.
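A tiny demonstration of why a batch size of 1 sidesteps the error: a batch of one never has to stack two differently shaped tensors, while any larger batch does. The toy data below is illustrative, not from the issue:

import tensorflow as tf

def gen():
    # Two variable-length elements, like the flattened images above.
    yield [1, 2, 3]
    yield [4, 5]

ds = tf.data.Dataset.from_generator(gen, output_types=tf.int32,
                                    output_shapes=[None])

# batch(1): each batch holds a single element, so nothing can mismatch.
for b in ds.batch(1):
    print(b.numpy())

# batch(2): two differently sized elements must be stacked, which fails.
try:
    for b in ds.batch(2):
        print(b.numpy())
except tf.errors.InvalidArgumentError as e:
    print("batching failed:", e)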

I had this same issue running the Keras OCR example on my own data. Turns out I wasn’t accounting for the variable length labels in my data set. When I pad my labels to max_length it all works out.

images = sorted(list(map(str, list(data_dir.glob("*.png")))))
raw_labels = [img.split(os.path.sep)[-1].split(".png")[0] for img in images]
max_length = max([len(label) for label in raw_labels])
labels = [label.ljust(max_length) for label in raw_labels]
characters = set(char for label in labels for char in label)

Thank you! This was the issue. I’m also following the OCR tutorial and ran into this issue.

It worked for me. Thanks

Worked for me, Thanks a lot 👍