Refactor out QKV in projection (#79437)
Summary: Refactor to reduce the amount of code that would be copied for the decoder by factoring out chunks common to the encoder and decoder. The QKV in-projection is a reasonable unit to factor out.
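As a rough illustration of the idea only (not the code in this PR), the sketch below pulls a packed QKV in-projection into one helper that both an encoder path and a decoder path call. It is written in plain Python with nested lists. All names here (`qkv_in_projection`, `encoder_layer_inputs`, `decoder_layer_inputs`) are hypothetical, and the packed `(3*d, d)` weight layout is an assumption made for the example.

```python
def qkv_in_projection(x, weight, bias):
    """Compute x @ weight^T + bias with a packed (3*d x d) weight, then split into Q, K, V.

    x is a list of token rows of width d. Hypothetical helper for illustration.
    """
    d = len(x[0])
    assert len(weight) == 3 * d and len(bias) == 3 * d
    # One packed output row of width 3*d per input token.
    packed = [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]
    # Slice the packed result into the three projections.
    q = [r[:d] for r in packed]
    k = [r[d:2 * d] for r in packed]
    v = [r[2 * d:] for r in packed]
    return q, k, v


def encoder_layer_inputs(x, weight, bias):
    # Hypothetical encoder path: reuses the shared helper instead of its own copy.
    return qkv_in_projection(x, weight, bias)


def decoder_layer_inputs(x, weight, bias):
    # Hypothetical decoder path: same helper, so the projection logic lives in one place.
    return qkv_in_projection(x, weight, bias)
```

With the projection in a single function, a fix or optimization to it applies to both callers without duplicating the change.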
Test Plan:
buck run mode/opt -c fbcode.platform=platform010 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=a100 //pytext/fb/tools:benchmark_transformers -- transformer --batch-size 64 --avg-sequence-length 235 --max-sequence-length 256 --iters 100 --module native
The benchmark and the numerical tests run without issues.
Reviewed By: mikekgfb
Differential Revision: D36138504
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79437
Approved by: https://github.com/jbschlosser