eyenet: Not getting enough accuracy
When I tried to train on the whole dataset, which is almost 20 GB, I ran out of memory. So I split the dataset into 4 batches of 26,600 + 26,600 + 26,600 + 26,586 = 106,386 images and made a slight adjustment to the code.
To load the saved model with the trained weights from the previous batch, I use the Keras load_weights() method. Here I'm working on cnn.py for all 5 classes:
```python
model.add(Dense(nb_classes, activation='softmax'))
model.summary()
model.load_weights(model_name + '.h5')
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```
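For reference, here is a rough sketch of the batch-by-batch loop I describe (the `.npy` file names and the `build_model()` helper are placeholders for illustration, not the actual code in cnn.py):

```python
# Hypothetical sketch of training batch by batch while resuming from saved weights.
# File names and build_model() are placeholders, not part of the original cnn.py.
import os
import numpy as np

model_name = 'cnn_5_classes'        # assumed checkpoint name
model = build_model(nb_classes=5)   # assumed helper that builds the same architecture as cnn.py

for i in range(1, 5):
    X = np.load('X_train_batch_{}.npy'.format(i))
    y = np.load('y_train_batch_{}.npy'.format(i))

    # Resume from the weights saved after the previous batch, if they exist.
    if os.path.exists(model_name + '.h5'):
        model.load_weights(model_name + '.h5')

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X, y, batch_size=32, epochs=5, validation_split=0.1)
    model.save_weights(model_name + '.h5')
```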
When I train on the first batch, which contains 26,600 images, I get:
loss: 1.0042 - acc: 0.6248 - val_loss: 1.0625 - val_acc: 0.6029
For the second batch of 26,600 images:
loss: 0.9026 - acc: 0.6563 - val_loss: 1.1008 - val_acc: 0.6114
For the third batch of 26,600 images:
loss: 0.8860 - acc: 0.6666 - val_loss: 0.9988 - val_acc: 0.6330
For the fourth batch of 26,586 images:
loss: 0.8227 - acc: 0.6888 - val_loss: 1.0289 - val_acc: 0.6356
Question 1: As you can see, there is not a significant change in the score. Can you identify where the problem is occurring? If you want, I can provide the code, which I have slightly altered from the original.
Question 2: As I have split the dataset into individual .npy arrays, could this be a reason for not seeing much improvement in the score?
Question 3: You mentioned in previous issues that you trained on a p2.8xlarge AWS instance. If I train on the same instance, how long does it take to train the whole network?
Question 4: You also mentioned that you use the VGG architecture, but VGG contains more layers than you have used in cnn.py or cnn_multi.py. Could that be the reason the model is not extracting enough features to learn?
Question 5: When I train cnn.py for binary classification on the first batch, which contains 26,600 images, I get 99% accuracy after one epoch, which suggests the model is obviously overfitting. Again, as I have split the dataset into individual arrays, could this be the reason for getting 99% accuracy?
Output after the first epoch using binary classification:
loss: 0.0088 - acc: 0.9934 - val_loss: 8.1185e-05 - val_acc: 1.0000
Thanks! Please do answer Sir! 😃
I’m wondering if something’s changed in the TensorFlow architecture since posting the results on the README. When this happens repeatedly, it’s for one of two reasons: either a step was undocumented, or the TensorFlow architecture has changed. I’ll look into both, and see what can be done.
@Ranjan-mn I was trying to load the 20 GB `.npy` file into RAM, but when `cnn.py` converts the array into `float32` I run out of memory, since it takes more than 61 GB of RAM to hold the 20 GB array as `float32`. So now I have to opt for either AWS or GCP with a higher-RAM configuration to train the whole network at once. I suggest you use transfer learning on either VGG16 or Inception-v3, as it will help improve accuracy. Link For Transfer Learning Example

I used a Compute Engine instance with 16 CPUs and trained without GPUs, following the same preprocessing methods and the same model as in cnn.py.
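For illustration, a minimal transfer-learning sketch with Keras VGG16 along the lines suggested above (the input shape, dense-layer width, and class count here are assumptions, not the project's exact setup):

```python
# Minimal transfer-learning sketch with a frozen VGG16 base.
# Input shape, dense width, and class count are assumptions for illustration.
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False            # freeze the pretrained convolutional base

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(5, activation='softmax')(x)   # 5 retinopathy classes

model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(...) on the training data, or use a generator that streams images from disk
```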
Hey, I am facing a similar problem. While training, the model starts at 0.52 accuracy and doesn't increase; after 3 epochs EarlyStopping() is triggered and accuracy stops at 0.52 with recall 1. The model was trained on GCP, following every step in the project.
@Tirth27 Answers to questions are below.
1 & 2. If you’re training four models on four separate batches, this is why. The
.npyarrays have to be combined and run together.It should take roughly 30-40 minutes to train.
I used something similar to VGG, but not exact. I followed the idea of multiple layers, then pool, followed by multiple layers, etc.
Per the answers in 1 & 2: you need to combine all of the arrays together, and train a single model.