pytorch
b99a1653 - [AI Accelerators] softmax kernel for Nested Tensor (CPU) (#79756)

Commit

2 years ago

[AI Accelerators] softmax kernel for Nested Tensor (CPU) (#79756)

Summary: Impl better softmax kernel for Nested Tensor CPU.

Test Plan:
Benchmark results:
On CPU (command: buck run mode/opt -c fbcode.platform=platform009 //pytext/fb/tools:benchmark_transformers -- transformer --large --use-trt-kernel False --batch-size 16 --avg-sequence-length 64 --max-sequence-length 256 --iters 10 --use-real-data-distribution --module native --use-nt True --use-cpu True
With mask (previous impl): NT: 4573.14 ms/iter, 0.14 TFLOP/s, Speedup: 2.33x;
Without mask: NT: 3530.55 ms/iter, 0.18 TFLOP/s, Speedup: 1.51x

Reviewed By: mikekgfb
Differential Revision: D35679352
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79756
Approved by: https://github.com/erichan1
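
For readers unfamiliar with the two benchmark variants, here is a minimal pure-Python sketch (not the kernel from this commit, and not PyTorch code) of the general idea. It assumes that "with mask" means padding variable-length rows to a common length and masking the padding, and that "without mask" means running softmax on each row at its own length. The function names `row_softmax`, `nested_softmax`, and `padded_masked_softmax` are hypothetical and exist only for illustration.

```python
import math


def row_softmax(row):
    """Numerically stable softmax over one non-empty row of floats."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def nested_softmax(rows):
    """Softmax per row, each row at its own length (no padding, no mask)."""
    return [row_softmax(r) for r in rows]


def padded_masked_softmax(rows, max_len):
    """Assumed 'with mask' variant: pad to max_len, mask padding with -inf."""
    out = []
    for r in rows:
        # Padding positions get -inf, so exp() maps them to exactly 0.
        padded = list(r) + [-math.inf] * (max_len - len(r))
        out.append(row_softmax(padded))
    return out


# Two sequences of different lengths, as in a ragged batch.
batch = [[1.0, 2.0], [0.5, 0.5, 0.5]]
ragged = nested_softmax(batch)
padded = padded_masked_softmax(batch, max_len=4)
```

Both variants give the same probabilities on the valid positions; the padded variant simply does extra work on positions that end up as zero, which is the kind of overhead an unpadded layout can avoid.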

Author

zrphercule2

Committer

pytorchmergebot

Parents

c9cbdb41
