Disable incremental_state function in MultiheadAttention module. (#20177)
Summary:
To fully support the incremental_state function, several additional utilities from fairseq would be required. However, we lack a proper way to unit test it. Therefore, the incremental_state function is disabled for now. If it is needed in the future, a feature request can be created. Fixed #20132
Add unit tests to cover the arguments of the MultiheadAttention module, including bias, add_bias_kv, add_zero_attn, key_padding_mask, need_weights, and attn_mask.
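For reference, a minimal sketch of how these arguments can be exercised together (the dimensions and tensors below are illustrative, not taken from the actual tests):

```python
import torch
from torch.nn import MultiheadAttention

# Illustrative sizes only.
embed_dim, num_heads = 8, 2
seq_len, batch_size = 5, 3

# bias / add_bias_kv / add_zero_attn are constructor arguments.
mha = MultiheadAttention(embed_dim, num_heads,
                         bias=True, add_bias_kv=True, add_zero_attn=True)

# Inputs are (seq_len, batch, embed_dim) in the default layout.
query = torch.randn(seq_len, batch_size, embed_dim)
key = torch.randn(seq_len, batch_size, embed_dim)
value = torch.randn(seq_len, batch_size, embed_dim)

# key_padding_mask marks padded key positions per batch element.
key_padding_mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)
key_padding_mask[:, -1] = True  # ignore the last key position

# attn_mask is an additive (L, S) mask over attention scores;
# all zeros means nothing is masked. It is padded internally when
# add_bias_kv / add_zero_attn extend the key sequence.
attn_mask = torch.zeros(seq_len, seq_len)

# need_weights=True also returns the averaged attention weights.
attn_output, attn_weights = mha(query, key, value,
                                key_padding_mask=key_padding_mask,
                                need_weights=True,
                                attn_mask=attn_mask)
```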
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20177
Differential Revision: D15304575
Pulled By: cpuhrsch
fbshipit-source-id: ebd8cc0f11a4da0c0998bf0c7e4e341585e5685a