tensorflow: Cannot add tensor to the batch: number of elements does not match (while Iterating through dataset elements).
System info: OS: macOS Catalina (10.15.5); TensorFlow 2.0.0 installed via Anaconda Navigator (1.9.12, Python 3.7) environment
Code:
# Load the dataset from a TFRecord file:
dataset = tf.data.TFRecordDataset(filenames=data_dir)
parsed_dataset = dataset.map(parsing_fn).batch(32)
print(parsed_dataset)

for image, label in parsed_dataset.take(2):
    print(image, label)
Output:
<BatchDataset shapes: ((None, None), (None,)), types: (tf.float32, tf.int64)>
InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [810000], [batch]: [243712] [Op:IteratorGetNextSync]
Helper functions below (the TFRecord was written using TensorFlow 1.14.0 on Google Colaboratory).
Parsing function:
def parsing_fn(serialized):
    features = {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64)
    }
    # Parse the serialized data so we get a dict with our data.
    parsed_example = tf.parse_single_example(serialized=serialized,
                                             features=features)
    # Get the image as raw bytes.
    image_raw = parsed_example['image']
    # Decode the raw bytes so it becomes a tensor with dtype uint8.
    image = tf.decode_raw(image_raw, tf.uint8)
    # The type is now uint8 but we need it to be float.
    image = tf.cast(image, tf.float32)
    label = parsed_example['label']
    return image, label
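Note that `tf.decode_raw` returns a flat 1-D tensor whose length depends on the source image's byte count, and the function never reshapes it, so images of different sizes produce elements with different numbers of elements, which `.batch(32)` cannot stack. The mismatch is easy to reproduce outside `tf.data` with plain NumPy (the two lengths below are the ones from the error message):

```python
import numpy as np

# Two flat "raw image" buffers with the element counts from the error
# message, mimicking what tf.decode_raw yields for two images of
# different sizes.
a = np.zeros(810000, dtype=np.uint8)
b = np.zeros(243712, dtype=np.uint8)

try:
    np.stack([a, b])  # batching stacks elements along a new axis
except ValueError as err:
    print("cannot batch:", err)
```

Stacking (batching) only works when every element has exactly the same shape, which is why the error names both the incoming tensor's size and the batch's size.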
Code used to create the TFRecord files
with tf.python_io.TFRecordWriter(out_path) as writer:
    data = {
        'image': wrap_bytes(img_bytes),
        'label': wrap_int64(label)
    }
    # Wrap the data as TensorFlow Features.
    feature = tf.train.Features(feature=data)
    # Wrap again as a TensorFlow Example.
    example = tf.train.Example(features=feature)
    # Serialize the data.
    serialized = example.SerializeToString()
    # Write the serialized data to the TFRecords file.
    writer.write(serialized)
def wrap_int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def wrap_bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
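Since the records store only the raw bytes, the image's original dimensions are lost and the parser cannot reshape or resize. One way to make the elements batchable, sketched below for TF 2.x, is to also write the image's height and width into each record, then reshape and resize at parse time. The `height`/`width` feature names, the 3-channel assumption, and the 224×224 target size are illustrative choices, not from the original code:

```python
import numpy as np
import tensorflow as tf

def wrap_int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def wrap_bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def serialize_example(img, label):
    # img: uint8 ndarray of shape (H, W, 3). Storing the dims lets the
    # parser undo the flattening done by tobytes()/decode_raw.
    data = {
        'image': wrap_bytes(img.tobytes()),
        'label': wrap_int64(label),
        'height': wrap_int64(img.shape[0]),
        'width': wrap_int64(img.shape[1]),
    }
    example = tf.train.Example(features=tf.train.Features(feature=data))
    return example.SerializeToString()

def parsing_fn(serialized):
    features = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
        'height': tf.io.FixedLenFeature([], tf.int64),
        'width': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    image = tf.io.decode_raw(parsed['image'], tf.uint8)
    image = tf.reshape(image, [parsed['height'], parsed['width'], 3])
    # Resize to a common size so every element matches and .batch() works.
    image = tf.image.resize(tf.cast(image, tf.float32), [224, 224])
    return image, parsed['label']

# Round trip: two differently sized images now batch cleanly.
records = [serialize_example(np.zeros((450, 600, 3), np.uint8), 0),
           serialize_example(np.zeros((300, 200, 3), np.uint8), 1)]
ds = tf.data.Dataset.from_tensor_slices(records).map(parsing_fn).batch(2)
for images, labels in ds.take(1):
    print(images.shape)  # (2, 224, 224, 3)
```

If the images were saved as encoded JPEG/PNG bytes instead of raw buffers, `tf.io.decode_image` would recover the shape without storing it explicitly.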
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 16 (2 by maintainers)
I had this same issue running the Keras OCR example on my own data. It turns out I wasn't accounting for the variable-length labels in my data set. When I pad my labels to max_length, it all works out. @FalsoMoralista I put my code in this Colab notebook.
The error comes from the .batch(batch_size) part:
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = (
    train_dataset.map(encode_single_sample,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(batch_size)
    .prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
)
I iterate through the dataset in the plotting cell right after that:
for batch in train_dataset.take(1):
Interestingly enough, when I changed my batch_size to 1, it works (though that batch size comes with problems of its own). In your case, it's the number 32 in
parsed_dataset = dataset.map(parsing_fn).batch(32)
Worth mentioning that I'm following Keras' OCR example here, and I ran their code just fine.
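As an alternative to padding the labels manually to max_length, tf.data has `padded_batch`, which pads each variable-length element up to the longest one in its batch. A minimal sketch, assuming a recent TF 2.x (`output_signature` requires TF 2.4+); the label values are made up for illustration:

```python
import tensorflow as tf

# Variable-length label sequences, as in the Keras OCR example.
labels = [[1, 2, 3, 4], [5, 6], [7]]
ds = tf.data.Dataset.from_generator(
    lambda: iter(labels),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.int64))

# .batch(3) would raise "Cannot add tensor to the batch" here;
# padded_batch pads every element to the longest in the batch (with 0s
# by default).
ds = ds.padded_batch(3)
for batch in ds:
    print(batch.numpy())
```

Zero is also `padded_batch`'s default padding value, so pick a padding token that doesn't collide with a real label, or pass `padding_values` explicitly.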
Thank you! This was the issue. I’m also following the OCR tutorial and ran into this issue.
It worked for me. Thanks
Worked for me, Thanks a lot 👍