eyenet: Not getting enough accuracy

When I tried to train on the whole dataset, which is almost 20 GB, I ran out of memory. So I split the dataset into 4 batches, three of 26,600 images and one of 26,586 (26,600 + 26,600 + 26,600 + 26,586 = 106,386 images in total), and made a slight adjustment to the code.

To load the saved model with the trained weights from the previous batch, I use the Keras load_weights() method. Here, I'm working on cnn.py for all 5 classes:

# Final classification layer for the 5 retinopathy classes
model.add(Dense(nb_classes, activation='softmax'))
model.summary()

# Load the weights saved after training on the previous batch
model.load_weights(model_name + '.h5')

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
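
The per-batch loop I have in mind looks roughly like this (file names, image size, and hyperparameters are placeholders, not the exact code from cnn.py):

import os
import numpy as np

# Train sequentially on the four .npy batches, carrying the weights forward
for i in range(4):
    X = np.load('X_train_batch_{}.npy'.format(i)).astype('float32') / 255.0
    y = np.load('y_train_batch_{}.npy'.format(i))   # one-hot labels, shape (n, nb_classes)

    # Re-use the weights saved after the previous batch, if any
    if os.path.exists(model_name + '.h5'):
        model.load_weights(model_name + '.h5')

    model.fit(X, y, batch_size=64, epochs=10, validation_split=0.2)
    model.save_weights(model_name + '.h5')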

When I trained on the first batch, which contains 26,600 images, I got:

loss: 1.0042 - acc: 0.6248 - val_loss: 1.0625 - val_acc: 0.6029

For the second batch of 26,600 images I got:

loss: 0.9026 - acc: 0.6563 - val_loss: 1.1008 - val_acc: 0.6114

For the third batch of 26,600 images I got:

loss: 0.8860 - acc: 0.6666 - val_loss: 0.9988 - val_acc: 0.6330

For the fourth batch of 26,586 images I got:

loss: 0.8227 - acc: 0.6888 - val_loss: 1.0289 - val_acc: 0.6356

Question 1: As you can see, there is no significant change in the scores from batch to batch. Can you identify where the problem is occurring? If you want, I can provide the code, which I have slightly altered from the original.

Question 2: As I have split the dataset into individual .npy arrays, could this be a reason for not seeing much improvement in the score?

Question 3: You mentioned in previous issues that you trained on a p2.8xlarge AWS instance. If I train on the same instance, how long does it take to train the whole network?

Question 4: You have also mentioned that you use the VGG architecture, but VGG contains more layers than you have used in cnn.py or cnn_multi.py. Could this be the reason the model is not extracting enough features to learn?

Question 5: When I train cnn.py for binary classification on the first batch, which contains 26,600 images, I get 99% accuracy after one epoch, which shows the model is obviously overfitting. Again, as I have split the dataset into individual arrays, could this be the reason for getting 99% accuracy?

Output after the first epoch using binary classification:

loss: 0.0088 - acc: 0.9934 - val_loss: 8.1185e-05 - val_acc: 1.0000

Thanks! Please do answer Sir! 😃

Most upvoted comments

I’m wondering if something’s changed in the TensorFlow architecture since posting the results on the README. When this happens repeatedly, it’s for one of two reasons: either a step was undocumented, or the TensorFlow architecture has changed. I’ll look into both, and see what can be done.

@Ranjan-mn I was trying to load the 20 GB .npy file into RAM, but when cnn.py converts the array to float32 I ran out of memory: it takes more than 61 GB of RAM to hold the 20 GB array once it is float32 (if the images are stored as uint8, casting to float32 quadruples the size, to roughly 80 GB). So now I have to opt for either AWS or GCP with a higher-RAM configuration to train the whole network at once. I suggest you use transfer learning on either VGG16 or Inception-v3, as it will help improve accuracy. Link For Transfer Learning Example
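
For anyone else stuck here, a minimal transfer-learning sketch in Keras could look like the following (the input size, the added head, and the 5-class output are assumptions, not the exact setup from this project):

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

# Load the VGG16 convolutional base pre-trained on ImageNet, without the top classifier
base = VGG16(weights='imagenet', include_top=False, input_shape=(256, 256, 3))

# Freeze the convolutional layers so only the new head is trained
for layer in base.layers:
    layer.trainable = False

x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(5, activation='softmax')(x)  # 5 retinopathy classes

model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])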

I used a Compute Engine instance with 16 CPUs and trained without GPUs. I followed the same preprocessing methods and used the same model as in cnn.py.

Hey, I am facing a similar problem. During training the model starts at 0.52 accuracy and then doesn't improve; after 3 epochs EarlyStopping() fires and accuracy stops at 0.52 with a recall of 1. The model was trained on GCP, following every step in the project.

@Tirth27 Answers to questions are below.

1 & 2. If you’re training four models on four separate batches, this is why. The .npy arrays have to be combined and run together.

  1. It should take roughly 30-40 minutes to train.

  2. I used something similar to VGG, but not exact. I followed the idea of multiple layers, then pool, followed by multiple layers, etc.

  3. Per the answers in 1 & 2: you need to combine all of the arrays together, and train a single model.
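
For example, combining the four per-batch arrays into a single training set could look something like this (file names are placeholders):

import numpy as np

# Stack the four per-batch arrays back into one training set
X_parts = [np.load('X_train_batch_{}.npy'.format(i)) for i in range(4)]
y_parts = [np.load('y_train_batch_{}.npy'.format(i)) for i in range(4)]

X_train = np.concatenate(X_parts, axis=0)   # 26600 + 26600 + 26600 + 26586 = 106386 images
y_train = np.concatenate(y_parts, axis=0)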