keras: keras 2 - fit_generator broken?

I updated to Keras v2 yesterday.

I adapted all my code from version 1 to the new API, following all the warnings I encountered.

However, I'm having some very strange problems with the fit_generator method of Model.

I'm using this toy example, which worked totally fine in version 1:

from keras.models import Model
from keras.layers import Input, Dense, Flatten
from keras.optimizers import SGD
from keras.losses import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()
train_batches = gen.flow_from_directory("D:/GitHub/Kaggle/redux/train/")

inp = Input(shape=(256,256,3))
l1 = Flatten()(inp)
out = Dense(2, activation="softmax")(l1)

model = Model(inp, out)

model.compile(loss=categorical_crossentropy, optimizer=SGD(lr=0.01))

# In Keras 2 the second argument is steps_per_epoch (number of batches, not samples),
# so use integer division to get a whole number of steps.
model.fit_generator(train_batches, steps_per_epoch=train_batches.samples // train_batches.batch_size)

The output in a Jupyter notebook is quite strange, printing an unknown symbol over and over until the notebook crashes:

Epoch 1/1
   23/718 [..............................] - ETA: 522s - loss: 8.4146 

Running the code from the terminal doesn't print those strange symbols.

The code works perfectly when I manually get the batches from the generator and use them with model.fit:

import numpy as np

n = 0
for imgs, labels in train_batches:
    if n > 3:
        break
    X_train = np.array(imgs)
    y_train = np.array(labels)
    model.fit(X_train, y_train)
    n += 1
Epoch 1/1
32/32 [==============================] - 0s - loss: 7.5555
Epoch 1/1
32/32 [==============================] - 0s - loss: 8.5627
Epoch 1/1
32/32 [==============================] - 0s - loss: 6.5480
Epoch 1/1
32/32 [==============================] - 0s - loss: 10.0738

Is anyone facing similar problems with fit_generator, and/or does anyone know anything about it?

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 25 (7 by maintainers)

Most upvoted comments

For those who are looking at this issue now:

My solution: set both steps_per_epoch and validation_steps to the number of samples divided by the batch size.

I believe this is what @fchollet mentioned in his response. For me verbose=1 worked fine.

In my case the batch size is 32, the number of training samples was 8000, and the validation set size was 2000.

As per the Keras documentation, steps_per_epoch = number of train samples / batch_size and validation_steps = number of validation samples / batch_size:

  1. steps_per_epoch = 8000/32 = 250

  2. validation_steps = 2000/32 = 62.5

The following code was tested on the day of this comment, 08 Sep 2017.

Old code:

classifier.fit_generator(training_set, samples_per_epoch=8000, epochs=25, verbose=1, validation_data=test_set, validation_steps=2000)

Changed code:

classifier.fit_generator(training_set, steps_per_epoch=250, epochs=25, verbose=1, validation_data=test_set, validation_steps=62.5)
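In other words, the conversion is just the old sample counts divided by the batch size. A minimal sketch of the arithmetic (the variable names here are placeholders, not Keras API):

import math

batch_size = 32
n_train, n_val = 8000, 2000

steps_per_epoch = n_train // batch_size           # 8000 / 32 = 250
validation_steps = math.ceil(n_val / batch_size)  # 2000 / 32 = 62.5; rounding up covers the last partial batch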

You should all note that generator methods have switched from being sample-based (one epoch = defined number of samples) to being step-based (one epoch = defined number of batches). The conversion is handled automatically when possible, but if you are using custom generators then that may not be possible.

For more info, see the release notes:

https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes
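Note also that fit_generator expects the generator itself to loop over its data indefinitely. A generator whose internal counter only ever increases will start yielding empty or stale slices after one pass through the data. A minimal sketch of a wrap-around pattern (data, labels, and batch_size are placeholder names, not part of the Keras API):

def batch_generator(data, labels, batch_size):
    n_batches = len(data) // batch_size
    count = 0
    while True:  # loop forever, as fit_generator expects
        start = count * batch_size
        yield data[start:start + batch_size], labels[start:start + batch_size]
        count = (count + 1) % n_batches  # wrap around instead of running past the end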

On 22 March 2017 at 12:36, jerpint notifications@github.com wrote:

Hello, I'm not sure if this is the right place to ask, but hopefully someone can help. I'm having some issues with fit_generator() in Keras v1.

It seems to work well on the first epoch, but not on the epochs afterwards. I say this because on the first epoch the model takes a significant amount of training time and returns accuracy metrics that seem plausible. However, from epoch 2 onwards the training time decreases significantly and accuracy shoots up to 1 (obviously suspicious). It seems as though the generator doesn't reset appropriately. Does anyone know what could be causing this?

My code:

import numpy as np

def batch_generator_train():

    from keras.utils import np_utils

    global f_train
    dset_train = f_train['urbansound']
    global batch_size
    global count_train
    global meta_info_train
    global nb_classes
    idx = list(range(0, count_train))
    np.random.shuffle(idx)
    count = 0
    while 1:
        # next batch of shuffled indices; the slice comes back empty once
        # count*batch_size runs past the end of idx, since count is never reset
        idx_tmp = idx[count*batch_size:(count+1)*batch_size]
        X_train = np.zeros((batch_size, 128, 128, 1))
        y_train = np.zeros(batch_size)
        #y_meta_train_all = []
        for ii, jj in enumerate(idx_tmp):
            X_train[ii, :, :, 0] = dset_train[jj]
            y_train[ii] = meta_info_train[jj][6]
            #y_meta_train_all.append( meta_info_train[jj])
        Y_train = np_utils.to_categorical(y_train, nb_classes)
        yield X_train, Y_train
        count = count + 1

def batch_generator_val():

    from keras.utils import np_utils

    global f_val
    dset_val = f_val['urbansound']
    global batch_size
    global count_val
    global meta_info_valid
    global nb_classes
    idx = list(range(0, count_val))
    np.random.shuffle(idx)
    count = 0
    while 1:
        # same pattern as above: count is never reset here either
        idx_tmp = idx[count*batch_size:(count+1)*batch_size]
        X_val = np.zeros((batch_size, 128, 128, 1))
        y_val = np.zeros(batch_size)
        #y_meta_train_all = []
        for ii, jj in enumerate(idx_tmp):
            X_val[ii, :, :, 0] = dset_val[jj]
            y_val[ii] = meta_info_valid[jj][6]
            #y_meta_train_all.append( meta_info_train[jj])
        Y_val = np_utils.to_categorical(y_val, nb_classes)
        yield X_val, Y_val
        count = count + 1

and my network definitions:

f_train = h5py.File("/home/jerpint/Desktop/Audiostuff/aug/Xtrain.h5", "r")
f_val = h5py.File("/home/jerpint/Desktop/Audiostuff/aug/Xvalid.h5", "r")

generator_train = batch_generator_train()
generator_val = batch_generator_val()

# callbacks
filepath = 'test2_callback_audio.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=0, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

# count_val = number of validation samples, count_train = number of train samples
history = model.fit_generator(generator=generator_train,
                              samples_per_epoch=int(np.floor(count_train / batch_size) * batch_size),
                              nb_epoch=5, verbose=2,
                              validation_data=generator_val,
                              nb_val_samples=int(np.floor(count_val / batch_size) * batch_size))  # ,callbacks=callbacks_list

score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
f_train.close()
f_val.close()

This, in turn, returns:

75584 test samples
Epoch 1/5
7412s - loss: 0.8559 - acc: 0.7073 - val_loss: 1.3755 - val_acc: 0.6275
Epoch 2/5
435s - loss: 4.5010e-04 - acc: 0.9999 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 3/5
437s - loss: 3.4126e-06 - acc: 1.0000 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 4/5
437s - loss: 1.8840e-06 - acc: 1.0000 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 5/5
437s - loss: 1.6184e-06 - acc: 1.0000 - val_loss: 1.1921e-07 - val_acc: 1.0000
Test score: 7.35714074846
Test accuracy: 0.40946496613


I'm also having a problem with fit_generator after upgrading to Keras 2. The model training time has gone up about 1000 times! I have not figured out why yet. I read that in Keras 2 the fit_generator number of samples has been replaced by the number of batches. I suspect this is the cause of the issue, but I don't know for sure.
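If that is the cause, then passing the old Keras 1 sample count straight through as steps_per_epoch would make every epoch process batch_size times as many samples. A sketch with placeholder numbers:

batch_size = 32
samples_per_epoch = 8000                           # Keras 1 meaning: 8000 samples per epoch

# Reused unchanged as a step count in Keras 2, it means 8000 batches per epoch:
print(samples_per_epoch * batch_size)              # 256000 samples per epoch

steps_per_epoch = samples_per_epoch // batch_size  # correct Keras 2 value: 250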

Ok so:

  • About loading images one by one: the problem was that I was used to Keras v1, where the number printed was the number of images; now it is the number of steps. The slowness is just caused by the overhead of printing that strange symbol.

  • Setting verbose=0 on fit_generator avoids printing this strange thing, at the cost of not printing anything at all (see the sketch after this list).
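A minimal sketch of the verbosity options, reusing train_batches from the toy example above (steps is a placeholder name):

steps = train_batches.samples // train_batches.batch_size

# verbose=0: silent; verbose=1: progress bar; verbose=2: one line per epoch.
# verbose=2 keeps per-epoch logging while avoiding the progress-bar
# control characters that confuse notebooks.
model.fit_generator(train_batches, steps_per_epoch=steps, verbose=2)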

Lastly, this is a little embarrassing, but I closed the issue by mistake when closing issues from my repos. xD