BiDAF: Very small gradients cause no weight updates in your model
Thanks for your code. It helped me understand BiDAF in detail.
However, I found that the model's performance does not improve.
The metric stays exactly the same after every epoch.
I then found that the gradients are too small:
they are on the order of 10^-3 to 10^-8.
I can't find what's wrong, and your code is easy to follow. So what might the problem be?
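One way to check whether gradients of this size really stall training is to compare the size of each update with the size of the weights it is applied to. The snippet below is only a minimal sketch in plain Python, assuming a vanilla SGD step (update = learning rate times gradient); the parameter names, values, learning rate, and the `update_ratios` helper are all hypothetical and are not taken from the BiDAF code.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def update_ratios(params, grads, lr):
    """Return {name: lr * ||grad|| / ||weight||} for each parameter.

    Assumes a plain SGD step; other optimizers rescale the gradient,
    so this ratio is only a rough indicator for them.
    """
    ratios = {}
    for name, weights in params.items():
        w_norm = l2_norm(weights)
        g_norm = l2_norm(grads[name])
        ratios[name] = lr * g_norm / w_norm if w_norm > 0 else float("inf")
    return ratios


# Hypothetical flattened weights and gradients for two layers.
params = {"embed": [0.5, -0.5], "output": [1.0, 0.0]}
grads = {"embed": [3e-8, 4e-8], "output": [3e-3, 4e-3]}

ratios = update_ratios(params, grads, lr=0.5)
for name, ratio in ratios.items():
    print(f"{name}: update/weight ratio = {ratio:.2e}")
```

If the ratio for a layer is many orders of magnitude below 1, that layer's weights barely move per step, which would be consistent with a metric that never changes. Printing the ratio per layer also shows whether the problem is confined to the early layers or affects the whole model.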
About this issue
- State: open
- Created 7 years ago
- Comments: 31 (19 by maintainers)
Hmm, I am trying to hook the code up to TensorBoard so that I can compare it with the Keras training log and get a clearer picture. By the way, I have a deadline coming up, so I can't spend all my time on this. But if I make any progress, I will let you know.
On Thu, Dec 7, 2017 at 3:02 AM, Junki Ohmura notifications@github.com wrote: