Keras: validation very slow when using fit_generator
I want to fine-tune ResNet-50 on my dataset, but I face the problem that when one epoch ends and the validation pass starts, it becomes really slow; validation even takes longer than training, and I'm not sure what is happening. Here is part of my code:
```python
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=20,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images
    vertical_flip=False,
    zoom_range=0.1,
    channel_shift_range=0.,
    fill_mode='nearest',
    cval=0.,
)
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    '/home/amanda/anaconda2/envs/tensorflow/lib/python2.7/site-packages/keras/datasets/nuclear/CRCHistoPhenotypes_2016_04_28/cropdetect/train',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    '/home/amanda/anaconda2/envs/tensorflow/lib/python2.7/site-packages/keras/datasets/nuclear/CRCHistoPhenotypes_2016_04_28/cropdetect/val',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')

model.fit_generator(train_generator,
                    # steps_per_epoch=X_train.shape[0] // batch_size,
                    samples_per_epoch=35946,
                    epochs=epochs,
                    validation_data=validation_generator,
                    verbose=1,
                    nb_val_samples=8986,
                    callbacks=[earlyStopping, saveBestModel, tensorboard])
```
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 26
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Please reopen this issue; I am still facing it on Keras 2.2.2.
I'm having the same issue: using fit_generator, my validation pass takes significantly longer than my training pass, even though it has fewer steps.
I am having a similar problem where using `flow` is considerably faster than using `flow_from_directory`, for both training and validation, and I can't find a good reason to explain why. Would be grateful to get an insight from an expert in Keras 😃
I'm still having this problem. It seems that the fit_generator method does not pay attention to the `validation_steps` parameter: I have set `validation_steps` to 15, but it is pulling `len(data_generator)` batches and ignoring that value. As per the comment by @Neutrino3316 above, it is `len(data_generator)` that matters for keeping validation time down, and if that value is not very low, validation takes forever. Can we reopen this issue, as it is not fixed!
I'm facing the same issue. Running fit_generator, I found that the validation pass after each epoch is incredibly slow. I ran the same evaluation by using my model to predict the whole validation set directly, and it was a lot faster (around 20x). Any idea about what is happening would be nice @fchollet
I have the same problem now, and I think the cause is that ImageDataGenerator loads data from disk too slowly. I tested on my server: loading 50 batches of 32 images each took nearly 50 seconds. Maybe you can run the same test on your server to confirm the root of the problem.
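The measurement above (50 batches of 32 images in roughly 50 seconds) is easy to reproduce with a small timing harness. A minimal sketch: `pull_batch` here is a hypothetical zero-argument callable standing in for pulling one batch, e.g. `lambda: next(train_generator)` for a Keras generator.

```python
import time

def seconds_per_batch(pull_batch, n_batches=50):
    """Call pull_batch n_batches times and return the average number of
    seconds each call took. pull_batch is any zero-argument callable,
    e.g. lambda: next(train_generator) for a Keras data generator."""
    start = time.perf_counter()
    for _ in range(n_batches):
        pull_batch()
    return (time.perf_counter() - start) / n_batches

# Usage with a trivial in-memory stand-in batch source; pointing
# pull_batch at a real disk-backed generator would expose the I/O cost.
avg = seconds_per_batch(lambda: list(range(32)), n_batches=50)
print(avg >= 0.0)
```

If the average is close to a second per batch, validation time is dominated by disk loading rather than by the model's forward pass.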
I found the same thing: the `len` function on the validation data generator needs to return a small number (much smaller than for the training data generator), and then it becomes manageable. If both generators return the same length, then validation is impossibly long, more than several hours in my case (I don't know how long because I never waited long enough to see!)
[Also, this seems to only be a problem if workers > 0 in the fit_generator method. If I set workers=0, then validation completes fine in a short time.]
I had the same issue. I fixed it by following the instructions in this issue: https://github.com/fchollet/keras/issues/6406
You have to set the `steps_per_epoch` and `validation_steps` parameters correctly.
In the example of @SIAAAAAA, I think that uncommenting the line `steps_per_epoch=X_train.shape[0] // batch_size` and setting `validation_steps` to `X_val.shape[0] // batch_size` should be enough.
It considerably improved the training time for me.
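For the sample counts given in the original post (35946 training and 8986 validation images), the two parameters would work out as follows; `batch_size = 32` is an assumption for illustration, not a value given in the issue:

```python
# Sample counts taken from the original post; batch_size is assumed.
n_train, n_val = 35946, 8986
batch_size = 32

steps_per_epoch = n_train // batch_size    # one pass over the training set
validation_steps = n_val // batch_size     # one pass over the validation set

print(steps_per_epoch, validation_steps)  # → 1123 280
```

With both values set, each validation pass covers the validation set exactly once instead of looping for an unbounded number of batches.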
I have the same problem: in every epoch, the validation pass is slower than the training pass.
The two phases take the same time on an 18-core Xeon, but validation takes 4 times the training time on the Intel Phi architecture (TensorFlow MKL binary).
I think/suspect that model evaluation during validation does not take advantage of parallelization. This could be the core of the problem. Please check.
Regards
I met the same problem today. When `len(valid_generator) == 500`, it took almost five minutes to evaluate; when I changed `len(valid_generator)` to 20, it took less than 20 seconds. `validation_steps` and `batch_size` don't matter; it's `len(valid_generator)` that matters. Kind of weird, I think, because validation time should be proportional to `validation_steps` and `batch_size`.
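The behavior described above can be sketched without Keras. The class and loop below are illustrative stand-ins, not Keras internals: a generator whose `__len__` reports how many batches one pass yields, and a validation loop that, like the affected `fit_generator` versions reportedly did, walks all of `len(generator)` regardless of `validation_steps`.

```python
class CountingBatches:
    """Stand-in for a validation generator: __len__ says how many batches
    one pass over it yields, and we count every batch actually pulled."""
    def __init__(self, n_batches):
        self.n_batches = n_batches
        self.pulled = 0

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        self.pulled += 1
        return idx  # stand-in for an (x_batch, y_batch) pair

def validate(gen, validation_steps=None):
    # Mimics the reported behavior: the loop runs len(gen) times, so the
    # validation_steps argument has no effect on how long validation takes.
    for i in range(len(gen)):
        _ = gen[i]
    return gen.pulled

print(validate(CountingBatches(500), validation_steps=15))  # → 500, not 15
```

This is why shrinking the generator's `__len__` (or fixing the loop to honor `validation_steps`) is what actually bounds validation time.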