0ec9f60e - Add TMA+persistent bf16 gemm (#2373)

Add TMA+persistent bf16 gemm (#2373)

Summary: Add persistent and TMA-persistent matmul variants from the persistent matmul tutorial: https://github.com/triton-lang/triton/blob/main/python/tutorials/09-persistent-matmul.py. Note that these aren't autotuned, so we might get bad results for small or odd shapes.

Also add a "TMA cached" variant, which caches the TMA descriptors during benchmarking, to avoid measuring the HtoD overhead of setting up the TMA.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/2373
Reviewed By: manman-ren, chenyang78
Differential Revision: D59648079
Pulled By: bertmaher
fbshipit-source-id: 4d1dca591bdde7709659339acfb0f6a952c5c02d