Re-apply "[bert/RoBERTa] Optimize LayerNorm with explicit vectorization using Vec256" (#31127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31127
Original commit changeset: d22448b90843
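For context, the change replaces the scalar inner loops of the CPU LayerNorm kernel with explicit Vec256 vectorization. Below is a minimal sketch of the idea, not the actual kernel from this PR: the function name `layer_norm_row` and its signature are hypothetical, and it assumes ATen's `at::vec256::Vec256<float>` API.
```
// Illustrative only: normalize one row of length n (hypothetical helper,
// not the kernel from this PR). Assumes ATen's Vec256<float>.
#include <ATen/cpu/vec256/vec256.h>
#include <cmath>
#include <cstdint>

void layer_norm_row(const float* x, const float* gamma, const float* beta,
                    float* y, int64_t n, float eps) {
  using Vec = at::vec256::Vec256<float>;
  constexpr int64_t K = Vec::size();  // 8 floats per vector with AVX2
  Vec vsum(0.f), vsq(0.f);
  int64_t i = 0;
  for (; i + K <= n; i += K) {  // vectorized sum and sum-of-squares
    Vec v = Vec::loadu(x + i);
    vsum = vsum + v;
    vsq = vsq + v * v;
  }
  float sum = 0.f, sq = 0.f;
  for (; i < n; ++i) {  // scalar tail
    sum += x[i];
    sq += x[i] * x[i];
  }
  float buf[K];
  vsum.store(buf);  // horizontal reduction of the vector accumulators
  for (int64_t j = 0; j < K; ++j) sum += buf[j];
  vsq.store(buf);
  for (int64_t j = 0; j < K; ++j) sq += buf[j];
  const float mean = sum / n;
  const float rstd = 1.f / std::sqrt(sq / n - mean * mean + eps);
  const Vec vmean(mean), vrstd(rstd);
  for (i = 0; i + K <= n; i += K) {  // vectorized normalize + affine
    Vec v = (Vec::loadu(x + i) - vmean) * vrstd;
    v = v * Vec::loadu(gamma + i) + Vec::loadu(beta + i);
    v.store(y + i);
  }
  for (; i < n; ++i) {  // scalar tail
    y[i] = (x[i] - mean) * rstd * gamma[i] + beta[i];
  }
}
```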
On Skylake T6:
Single Core:
(Columns in the profiler tables below: op name, self CPU %, self CPU time, total CPU %, total CPU time, average CPU time per call, CUDA %, CUDA time, average CUDA time per call, number of calls, input shapes. Note that the benchmark generates batch_size=47 for the first case and batch_size=56 for the second; even with the larger batch, the vectorized version is still faster than the original, non-vectorized reference C implementation.)
- Before the PR:
```
native_layer_norm 0.81% 5.884ms 0.81% 5.884ms 122.580us NaN 0.000us 0.000us 48 [[47, 1, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 0.68% 5.053ms 0.68% 5.053ms 105.272us NaN 0.000us 0.000us 48 [[56, 1, 1024], [1024], [1024]]
```
20 Cores:
- Before the PR:
```
native_layer_norm 1.65% 41.682ms 1.65% 41.682ms 868.365us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 1.34% 33.829ms 1.34% 33.829ms 704.771us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```
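For reference, tables like the ones above come from torch.autograd.profiler. A minimal repro sketch of a comparable single-core run follows; the shapes and call count are taken from the first table, while the no-grad wrapper and the sort key are assumptions, not part of this PR:
```
# Illustrative repro sketch, not the benchmark used in this PR.
import torch

torch.set_num_threads(1)  # single-core case; the second run used 20 cores

ln = torch.nn.LayerNorm(1024)
x = torch.randn(47, 1, 1024)  # batch shape from the first table above

with torch.no_grad(), torch.autograd.profiler.profile(record_shapes=True) as prof:
    for _ in range(48):  # 48 calls, matching the call count in the tables
        ln(x)

print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total"))
```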
ghstack-source-id: 95420889
Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"
python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval
Differential Revision: D18936428
fbshipit-source-id: 8cae33d35fb338b5ac49b1597c2709152612d6e5