OpenNMT-py: possible bug in the attn_debug function in version 3
I trained a Transformer model on the same data with both OpenNMT-py v3 and OpenNMT-py v2.
With OpenNMT-py v2, I can easily get the attention matrix and visualize it via -attn_debug. The attention weights lie almost on a diagonal, as expected, because my output is highly similar to the input.
With OpenNMT-py v3, however, I visualized it the same way and found a small bug (about the returned attn, which I raised in the forum). Almost all of the attention weight is concentrated on the first column. Is there still a bug somewhere? Maybe it is related to the start token, since most of the attention focuses on the first column, or to the context_attention or relative_position handling in MultiHeadedAttention; I am not sure (a sketch of the two conventions I have in mind follows below). The generated output, however, is identical between v2 and v3.
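For reference, here is a minimal, hypothetical sketch (not OpenNMT-py's actual code) of two common conventions a multi-head attention module can use when exposing its weights for debugging. If v3 returns a different slice than v2 did, the heatmap would look different even though decoding is unaffected:

```python
import torch

# Hypothetical shapes: (batch, heads, tgt_len, src_len).
batch, heads, tgt_len, src_len = 1, 8, 5, 5
attn = torch.softmax(torch.randn(batch, heads, tgt_len, src_len), dim=-1)

# Convention A: report only the first head's weights.
top_attn = attn[:, 0, :, :]

# Convention B: average the weights over all heads.
mean_attn = attn.mean(dim=1)
```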
Looking forward to your help! By the way, is there a script to convert a v3 model to the v2 format?
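In case it helps others reproduce the comparison, here is a minimal plotting sketch. The attn matrix, src_tokens, and tgt_tokens below are placeholders standing in for values obtained from translation with -attn_debug; they are not OpenNMT-py API calls:

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values; in practice these come from the attention
# printed by -attn_debug during translation.
attn = np.array([[0.90, 0.05, 0.05],
                 [0.05, 0.90, 0.05],
                 [0.05, 0.05, 0.90]])  # shape: (tgt_len, src_len)
src_tokens = ["a", "b", "c"]
tgt_tokens = ["x", "y", "z"]

fig, ax = plt.subplots()
ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(src_tokens)))
ax.set_xticklabels(src_tokens)
ax.set_yticks(range(len(tgt_tokens)))
ax.set_yticklabels(tgt_tokens)
ax.set_xlabel("source tokens")
ax.set_ylabel("target tokens")
plt.show()
```

A healthy attention map for this task should show most of its mass on the diagonal, as in the v2 run; the v3 run instead puts nearly everything in the first column.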
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Comments: 18 (8 by maintainers)
You can git pull and see how it goes.