caption_generator: Network does not converge, bad captions
Hello,
I’ve followed your instructions and started training the network. The loss reaches its minimum after about 5 epochs and then starts to rise again.
After 50 epochs, the captions generated with the best epoch’s weights (the 5th or 6th) look like this:
Predicting for image: 992
2351479551_e8820a1ff3.jpg : exercise lamb Fourth headphones facing pasta soft her soft her soft her soft her soft her dads college soft her dads college soft her her her her her soft her her her her her soft her her her her
Predicting for image: 993
3514179514_cbc3371b92.jpg : fist graffitti soft her soft her Hollywood Fourth Crowd soft her her soft her her her her her soft her her her her her her soft her her her her soft her her her her soft her her her
Predicting for image: 994
1119015538_e8e796281e.jpg : closeout security soft her soft her security fall soft her her her her her fall soft her her her her her her soft her her her her her soft her her her her soft her her her her her
Predicting for image: 995
3727752439_907795603b.jpg : roots college Fourth tree-filled o swing-set places soft her soft her her soft her her soft her her college soft her her her her her her her soft her her her her soft her her her her her her
Any idea what’s wrong?
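Since the validation loss bottoms out around epoch 5 and then climbs, one thing worth doing while this is investigated is checkpointing the best epoch and stopping training once the loss stops improving, so the best weights are saved automatically. Below is a minimal sketch using standard Keras callbacks, assuming the Keras-style training loop this repo appears to use; `model`, `train_gen`, `val_gen`, `steps`, and `val_steps` are hypothetical stand-ins for whatever the training script actually builds:

```python
from keras.callbacks import ModelCheckpoint, EarlyStopping

callbacks = [
    # Keep only the weights from the epoch with the lowest validation loss.
    ModelCheckpoint('best_weights.h5', monitor='val_loss',
                    save_best_only=True, mode='min'),
    # Stop once validation loss has not improved for 3 consecutive epochs.
    EarlyStopping(monitor='val_loss', patience=3, mode='min'),
]

# `model`, `train_gen`, `val_gen`, `steps`, and `val_steps` are
# placeholders for the objects built by this repo's training script.
model.fit_generator(train_gen,
                    steps_per_epoch=steps,
                    epochs=50,
                    validation_data=val_gen,
                    validation_steps=val_steps,
                    callbacks=callbacks)
```

This doesn’t explain the divergence itself, but it guarantees predictions come from the epoch-5/6 weights rather than the over-fit epoch-50 ones.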
It’s been a while since I worked on this repo. I’ll try to retrain it and reproduce this error sometime next week and see if something needs to change.
Meanwhile, @PavlosMelissinos and @MikhailovSergei, if you were able to debug this, feel free to update and send a pull request.
I am facing the same issue with Flickr8k: the captions don’t make any sense, and particular words repeat in every sentence. Oddly, it works better on a subset of 100 images than on the entire dataset. I have tried changing the batch size, but it didn’t help. Could you give any suggestions?
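The repeating loops in the outputs above (“soft her soft her …”, “her her her …”) are a typical signature of greedy argmax decoding on an under-trained model: once the decoder state starts to cycle, argmax keeps emitting the same one or two words. A quick sanity check is to sample from the softmax with a temperature instead of taking the argmax. This is a hedged sketch independent of this repo’s code; `probs` stands for the per-step softmax output of whatever decoding loop is in use:

```python
import numpy as np

def sample_word(probs, temperature=0.7):
    """Sample a word id from a softmax distribution instead of argmax.

    `probs` is the model's softmax output over the vocabulary for one
    decoding step (a hypothetical stand-in for this repo's decoder
    output). Temperatures below 1.0 stay close to greedy decoding
    while still breaking repetition loops.
    """
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-10) / temperature  # re-sharpen the distribution
    exp = np.exp(logits - logits.max())           # subtract max for numerical stability
    p = exp / exp.sum()
    return int(np.random.choice(len(p), p=p))
```

If sampled captions are varied but still nonsensical, the decoding step is probably not the culprit; that would point instead at the training pipeline (for example, image–caption pairs drifting out of alignment when iterating over the full dataset), which would also be consistent with the 100-image subset working better than the whole set.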