Keras: validation very slow when using fit_generator
I want to fine-tune ResNet-50 on my dataset, but I face the problem that when one epoch ends and the validation pass starts, it becomes really slow; validation even takes longer than training, and I'm not sure what is happening. Here is part of my code:
```python
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=20,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images
    vertical_flip=False,
    zoom_range=0.1,
    channel_shift_range=0.,
    fill_mode='nearest',
    cval=0.,
)
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    '/home/amanda/anaconda2/envs/tensorflow/lib/python2.7/site-packages/keras/datasets/nuclear/CRCHistoPhenotypes_2016_04_28/cropdetect/train',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    '/home/amanda/anaconda2/envs/tensorflow/lib/python2.7/site-packages/keras/datasets/nuclear/CRCHistoPhenotypes_2016_04_28/cropdetect/val',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')

model.fit_generator(train_generator,
                    # steps_per_epoch=X_train.shape[0] // batch_size,
                    samples_per_epoch=35946,
                    epochs=epochs,
                    validation_data=validation_generator,
                    verbose=1,
                    nb_val_samples=8986,
                    callbacks=[earlyStopping, saveBestModel, tensorboard])
```
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 26
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Please reopen this issue; I am still facing it on Keras 2.2.2.
I'm having the same issue: using fit_generator, my validation pass takes significantly longer than my training pass, even though it has fewer steps.
I am having a similar problem where using `flow` is considerably faster than using `flow_from_directory`, for both training and validation, and I can't find a good reason to explain why. Would be grateful to get an insight from an expert in Keras 😃
I'm still having this problem. It seems that the fit_generator method does not pay attention to the `validation_steps` parameter: I have set `validation_steps` to 15, but it is pulling `len(data_generator)` batches and ignoring that value. As per the comment by @Neutrino3316 above, it is `len(data_generator)` that matters for keeping validation time down, and if that value is not very low, validation takes forever. Can we reopen this issue, as it is not fixed!
I'm facing the same issue. Running fit_generator, I found that the validation pass after each epoch is incredibly slow. I ran the same evaluation by using my model to predict the whole validation set directly, and it was a lot faster (around 20x). Any idea about what is happening would be nice @fchollet
I have the same problem now, and I think the cause is that ImageDataGenerator loads data from disk too slowly. I tested on my server: loading 50 batches of 32 images each took nearly 50 seconds. Maybe you can run the same test on your server to confirm the root of the problem.
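The measurement above (50 batches of 32 images in roughly 50 seconds) is easy to reproduce with a small timing harness. A minimal sketch: `pull_batch` here is a hypothetical zero-argument callable standing in for pulling one batch, e.g. `lambda: next(train_generator)` for a Keras generator.

```python
import time

def seconds_per_batch(pull_batch, n_batches=50):
    """Call pull_batch n_batches times and return the average number of
    seconds each call took. pull_batch is any zero-argument callable,
    e.g. lambda: next(train_generator) for a Keras data generator."""
    start = time.perf_counter()
    for _ in range(n_batches):
        pull_batch()
    return (time.perf_counter() - start) / n_batches

# Usage with a trivial in-memory stand-in batch source; pointing
# pull_batch at a real disk-backed generator would expose the I/O cost.
avg = seconds_per_batch(lambda: list(range(32)), n_batches=50)
print(avg >= 0.0)
```

If the average is close to a second per batch, validation time is dominated by disk loading rather than by the model's forward pass.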
I found the same thing: the `len` function on the validation data generator needs to return a small number (much smaller than for the training data generator), and then it becomes manageable. If both generators return the same length, then validation is impossibly long, more than several hours in my case (I don't know how long because I never waited long enough to see!)
[Also, this seems to only be a problem if workers > 0 in the fit_generator method. If I set workers=0, then validation completes fine in a short time.]
I had the same issue. I fixed it by following the instructions in this issue: https://github.com/fchollet/keras/issues/6406
You have to set the `steps_per_epoch` and `validation_steps` parameters correctly.
In the example of @SIAAAAAA, I think that uncommenting the line `steps_per_epoch=X_train.shape[0] // batch_size` and setting `validation_steps` to `X_val.shape[0] // batch_size` should be enough.
It considerably improved the training time for me.
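For the sample counts given in the original post (35946 training and 8986 validation images), the two parameters would work out as follows; `batch_size = 32` is an assumption for illustration, not a value given in the issue:

```python
# Sample counts taken from the original post; batch_size is assumed.
n_train, n_val = 35946, 8986
batch_size = 32

steps_per_epoch = n_train // batch_size    # one pass over the training set
validation_steps = n_val // batch_size     # one pass over the validation set

print(steps_per_epoch, validation_steps)  # → 1123 280
```

With both values set, each validation pass covers the validation set exactly once instead of looping for an unbounded number of batches.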
I have the same problem: in every epoch, the validation pass is slower than the training pass.
The two phases take the same time on an 18-core Xeon, but validation takes 4 times the training time on the Intel Phi architecture (TensorFlow MKL binary).
I think/suspect that model evaluation during validation does not take advantage of parallelization. This could be the core of the problem. Please check.
Regards
I met the same problem today. When `len(valid_generator) == 500`, it took almost five minutes to evaluate; when I changed `len(valid_generator)` to 20, it took less than 20 seconds. `validation_steps` and `batch_size` don't matter; it's `len(valid_generator)` that matters. Kind of weird, I think, because validation time should be proportional to `validation_steps` and `batch_size`.
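The behavior described above can be sketched without Keras. The class and loop below are illustrative stand-ins, not Keras internals: a generator whose `__len__` reports how many batches one pass yields, and a validation loop that, like the affected `fit_generator` versions reportedly did, walks all of `len(generator)` regardless of `validation_steps`.

```python
class CountingBatches:
    """Stand-in for a validation generator: __len__ says how many batches
    one pass over it yields, and we count every batch actually pulled."""
    def __init__(self, n_batches):
        self.n_batches = n_batches
        self.pulled = 0

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        self.pulled += 1
        return idx  # stand-in for an (x_batch, y_batch) pair

def validate(gen, validation_steps=None):
    # Mimics the reported behavior: the loop runs len(gen) times, so the
    # validation_steps argument has no effect on how long validation takes.
    for i in range(len(gen)):
        _ = gen[i]
    return gen.pulled

print(validate(CountingBatches(500), validation_steps=15))  # → 500, not 15
```

This is why shrinking the generator's `__len__` (or fixing the loop to honor `validation_steps`) is what actually bounds validation time.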