BiDAF: Very small gradients causing no weight update from your model

Thanks for your code. It helped me understand BiDAF in detail.

However, I found that the model’s performance does not improve: the metric stays the same every epoch. Looking further, I found that the gradients are too small, on the order of 10^-3 to 10^-8.

I can’t find what’s wrong, and your code seems easy to follow. So what might the problem be?
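To illustrate why gradients at the small end of that range could leave a weight unchanged, here is a minimal sketch. It assumes a plain SGD update stored in float32; the weight value 0.5 and learning rate 0.5 are hypothetical and not taken from the repository. Adaptive optimizers rescale gradients, so this arithmetic does not apply to them directly.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sgd_step_float32(weight: float, grad: float, lr: float) -> float:
    """One plain SGD step, with the updated weight stored as float32."""
    return to_float32(to_float32(weight) - lr * grad)


weight = 0.5  # hypothetical weight value
lr = 0.5      # hypothetical learning rate

# Gradient magnitudes spanning the range reported above.
for grad in (1e-3, 1e-5, 1e-8):
    new_weight = sgd_step_float32(weight, grad, lr)
    changed = new_weight != to_float32(weight)
    print(f"grad={grad:.0e}  weight changed: {changed}")
```

Under these assumptions, only the 10^-8 gradient leaves the weight exactly where it was: the step of 5e-9 is smaller than half the float32 spacing just below 0.5 (about 3e-8), so the result rounds back to 0.5. Gradients of 10^-3 or 10^-5 still move the weight, which suggests that checking per-layer gradient magnitudes is a reasonable first step.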

About this issue

State: open
Created 7 years ago
Comments: 31 (19 by maintainers)

Most upvoted comments

Hmm, I am trying to hook the code up to TensorBoard so that I can compare it with the Keras training log and get a clearer picture. By the way, I have a deadline coming up, so I can’t spend all my time on this. But if I make any progress, I will let you know.

On Thu, Dec 7, 2017 at 3:02 AM, Junki Ohmura notifications@github.com wrote:

So far, there is no improvement, even after I modified the following items…

RNN -> LSTM

original loss function

use BiDAF’s script to build dataset


oneTaken on Dec 8, 2017