configurable pre/post LayerNorm in nn.Transformer (#60593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60593
Per #55270, this PR makes it configurable whether to run LayerNorm before or after other operations in Transformer layers.
However, it leaves for a separate PR the removal of the LayerNorm performed after the final encoder/decoder layer has run, which is redundant when LayerNorm has already been applied after the other in-layer operations (the problem described in #24930, #50086, and #51447).
Note: this means that transformers built with `nn.Transformer()` are now configurable, but will still contain a redundant LayerNorm when configured with the previous (post-LayerNorm) behavior. However, callers of the `TransformerEncoder` and `TransformerDecoder` classes have always been able to avoid this redundancy.
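For illustration, a minimal sketch of how the option could be exercised, assuming the flag is exposed as `norm_first` on the layer constructors (the name and signature here are illustrative, not a confirmed part of this PR):

```python
import torch
import torch.nn as nn

# Pre-LayerNorm ("norm-first") encoder layer; norm_first=False would keep the
# original post-LayerNorm behavior.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=True)

# With norm-first layers, the output of the last layer is not normalized, so a
# final LayerNorm on the stack is useful. With the default post-LayerNorm
# layers that final norm is the redundancy discussed above, and callers of
# TransformerEncoder can simply omit the `norm` argument to avoid it.
encoder = nn.TransformerEncoder(layer, num_layers=6, norm=nn.LayerNorm(512))

src = torch.rand(10, 32, 512)  # (seq_len, batch, d_model)
out = encoder(src)             # same shape as src
```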
Reviewer notes:
1. I ran across this during other work and don't know whether anybody is already working on it (the most recent conversation in the issues seems to be from early April). Happy to abandon this if so.
2. I was looking for a quick way to add tests, but it looks like the existing ones in test_nn just compare against snapshots. I could add something similar, but I'm curious whether there's any prepackaged way to test that LayerNorm-first (the new option) yields a model that trains properly, etc. (a rough sketch of such a smoke test follows this list).
3. The new code in the `forward` methods was written to minimize diff churn rather than to maximize beauty :P Happy to pretty it up if desired.
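Regarding note 2, a rough smoke test might look like the following (hypothetical, not part of the PR; again assumes the flag is named `norm_first`): it checks that a norm-first layer runs forward/backward and that a short training loop reduces the loss on a toy regression task.

```python
import torch
import torch.nn as nn

def test_norm_first_layer_trains():
    torch.manual_seed(0)
    layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, norm_first=True)
    optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

    src = torch.rand(4, 8, 16)     # (seq_len, batch, d_model)
    target = torch.rand(4, 8, 16)  # toy regression target

    first_loss = None
    for _ in range(20):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(layer(src), target)
        loss.backward()
        optimizer.step()
        if first_loss is None:
            first_loss = loss.item()

    # Loss should decrease on this toy task if the layer trains at all.
    assert loss.item() < first_loss
```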
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29356590
Pulled By: bhosmer
fbshipit-source-id: 308669326990b8923aab5fcd96e03b582fb21f24