Create decoder class (#79438)
Summary: Implement a decoder-only layer, which does not currently exist in torch. Implement incremental and forced decoding with new multi-head attention and decoder forward passes. The design is similar to transformer_encoder_layer_fwd. This is not a public-facing API, although it may become one eventually.
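For readers unfamiliar with the two decoding modes, the sketch below shows how they relate in a single self-attention step. It assumes that "forced decoding" means processing the whole target sequence in one pass under a causal mask, and that "incremental decoding" means generating one token per call while caching earlier keys and values. It is written in plain Python and is not the code added by this PR. The names `forced_decode` and `IncrementalDecoder` are hypothetical. The sketch also leaves out multiple heads, biases, cross-attention, and the feed-forward sublayer.

```python
import math
import random

# Hypothetical illustration only: single-head self-attention, no biases,
# no cross-attention, no feed-forward. Not the PR's implementation.

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def forced_decode(xs, Wq, Wk, Wv):
    """Whole sequence in one pass. Slicing keys/values to [: t + 1]
    is equivalent to applying a causal mask at position t."""
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(xs))]

class IncrementalDecoder:
    """One token per call; keys/values of earlier tokens are cached
    so they are not recomputed at each step."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.k_cache = []
        self.v_cache = []

    def step(self, x):
        self.k_cache.append(matvec(self.Wk, x))
        self.v_cache.append(matvec(self.Wv, x))
        q = matvec(self.Wq, x)
        return attend(q, self.k_cache, self.v_cache)

def rand_matrix(n, m):
    return [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

random.seed(0)
d = 4
Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
xs = rand_matrix(5, d)  # five input token embeddings

forced = forced_decode(xs, Wq, Wk, Wv)
decoder = IncrementalDecoder(Wq, Wk, Wv)
incremental = [decoder.step(x) for x in xs]

# Both modes produce the same per-position outputs
for row_f, row_i in zip(forced, incremental):
    for a, b in zip(row_f, row_i):
        assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
```

Forced decoding suits training and scoring, when the full target is already known. Incremental decoding suits generation, where each new token depends on the previous outputs. The key/value cache keeps each step from recomputing projections for the whole prefix.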
Stacked on top of https://github.com/pytorch/pytorch/pull/79437.
Test Plan: See D36140513 numerical tests.
Reviewed By: mikekgfb
Differential Revision: D36987004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79438
Approved by: https://github.com/zrphercule