tensorflow: Problem with read and get batch from 2d array tfrecords dataset

URL(s) with the issue:

https://www.tensorflow.org/tutorials/load_data/tfrecord#tfrecord_files_using_tfdata

Description of issue (what needs changing):

Problem with read and get batch from 2d array tfrecords dataset

Clear description

Hello. I use Tensorflow 2.0 version. I have some problems with reading Tfrecords file when get batch.

First, this is my read_tfrecords.py file.

import tensorflow as tf
import os
from glob import glob
import numpy as np


def serialize_example(batch, list1, list2):
    filename = "./train_set.tfrecords"
    writer = tf.io.TFRecordWriter(filename)

    for i in range(batch):
        feature = {}
        feature1 = np.load(list1[i])
        feature2 = np.load(list2[i])
        print('feature1 shape {} feature2 shape {}'.format(feature1.shape, feature2.shape)) 
        feature['input'] = tf.train.Feature(float_list=tf.train.FloatList(value=feature1.flatten()))
        feature['target'] = tf.train.Feature(float_list=tf.train.FloatList(value=feature2.flatten()))

        features = tf.train.Features(feature=feature)
        example = tf.train.Example(features=features)
        serialized = example.SerializeToString()
        writer.write(serialized)
        print("{}th input {} target {} finished".format(i, list1[i], list2[i]))



list_inp = sorted(glob('./input/2d_magnitude/*'))
list_tar = sorted(glob('./target/2d_magnitude/*'))


print(len(list_inp))
serialize_example(len(list_inp), list_inp, list_tar)

My input and target shapes are 2d array (Material of dataset is spectrogram). Therefore, my Tfrecords file includes two features likes [number_of_dataset, x, y]. About 100,000 dataset was successfully saved as Tfrecords file.

And I have problem when I read Tfrecords file to get batch. This is my code read_tfrecords.py:

import tensorflow as tf
import os
import numpy as np

shuffle_buffer_size = 50000
batch_size = 10
record_file = '/data2/dataset/tfrecords/train_set.tfrecords'

raw_dataset = tf.data.TFRecordDataset(record_file)
print('raw_dataset', raw_dataset) # ==> raw_dataset <TFRecordDatasetV2 shapes: (), types: tf.string>

raw_dataset = raw_dataset.repeat()
print('repeat', raw_dataset) # ==> repeat <RepeatDataset shapes: (), types: tf.string>

raw_dataset = raw_dataset.shuffle(shuffle_buffer_size)
print('shuffle', raw_dataset) # ==> shuffle <ShuffleDataset shapes: (), types: tf.string>

raw_dataset = raw_dataset.batch(batch_size, drop_remainder=True)
print('batch', raw_dataset) # ==> batch <BatchDataset shapes: (10,), types: tf.string>

raw_example = next(iter(raw_dataset)) 

parsed = tf.train.Example.FromString(raw_example.numpy()) # ==> read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted

print('parsed', parsed) # ==> ''

input = parsed.features.feature['input'].float_list.value
print('input', input) # ==> []
target = parsed.features.feature['target'].float_list.value
print('target', target) # ==> []

Here are results from code:

raw_dataset <TFRecordDatasetV2 shapes: (), types: tf.string>
repeat <RepeatDataset shapes: (), types: tf.string>
shuffle <ShuffleDataset shapes: (), types: tf.string>
batch <BatchDataset shapes: (10,), types: tf.string>
read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted
  parsed = tf.train.Example.FromString(raw_example.numpy())
parsed
input []
target []

As a result, I wonder how I get the batch from Tfrecords file to train. read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted Could you give advice? Thank you very much.

Usage example

Maybe…

raw_dataset = tf.data.TFRecordDataset(record_file)

raw_dataset = raw_dataset.repeat()

raw_dataset = raw_dataset.shuffle(shuffle_buffer_size)

raw_dataset = raw_dataset.batch(batch_size, drop_remainder=True)

raw_example = next(iter(raw_dataset)) 

parsed = tf.train.Example.FromString(raw_example.numpy())

input = parsed.features.feature['input'].float_list.value
target = parsed.features.feature['target'].float_list.value

About this issue

Original URL
State: closed
Created 4 years ago
Comments: 20 (10 by maintainers)

Most upvoted comments

It looks like you are using FixedLenFeature to parse your features, but the features are sequences not scalars, so you need to use FixedLenSequenceFeature instead.

aaudiber on May 4, 2020