benchmark
53d98e37 - Triton Partition K gemm to TritonBench

Commit
1 year ago
Triton Partition K gemm to TritonBench

Summary: This is an early exploration. Triton Partition K ([link](https://github.com/NVIDIA/cutlass/blob/main/media/docs/efficient_gemm.md#parallelized-reductions)) uses two kernels (gemm + reduce) to achieve the goal of splitK. Compared with the `atomic_add` Triton GEMM, partitionK is friendlier to epilogue fusion, since the epilogue can run in the reduce kernel once the full sum over K is available.

Reviewed By: bertmaher, chenyang78

Differential Revision: D59948589

fbshipit-source-id: a2118947f8e20ab17d26843fd263b83e22f58541
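
The two-kernel structure described in the summary can be illustrated with a minimal pure-Python sketch. This is not the Triton code from this commit: the function name `partition_k_gemm`, its parameters, and the list-of-lists layout are all hypothetical, chosen only to show the idea. Stage 1 stands in for the gemm kernel, which writes one partial result per K partition into a workspace; stage 2 stands in for the reduce kernel, which sums the partials and applies the epilogue once to the complete sum.

```python
def partition_k_gemm(a, b, num_partitions, epilogue=lambda x: x):
    """Hypothetical reference sketch of partition-K GEMM on nested lists.

    a is M x K, b is K x N. The K dimension is split into
    `num_partitions` contiguous chunks.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    chunk = (k + num_partitions - 1) // num_partitions  # ceil(K / partitions)

    # Stage 1 (stands in for the gemm kernel): every partition computes a
    # partial M x N product over its own slice of K and stores it separately.
    partials = []
    for p in range(num_partitions):
        lo, hi = p * chunk, min((p + 1) * chunk, k)
        partials.append([
            [sum(a[i][kk] * b[kk][j] for kk in range(lo, hi)) for j in range(n)]
            for i in range(m)
        ])

    # Stage 2 (stands in for the reduce kernel): sum the partials, then apply
    # the epilogue to the finished sum, never to an individual partial.
    return [
        [epilogue(sum(part[i][j] for part in partials)) for j in range(n)]
        for i in range(m)
    ]


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
result = partition_k_gemm(a, b, num_partitions=2)
relu_result = partition_k_gemm([[1, -2, 3, -4]], [[1], [1], [1], [1]], 2,
                               epilogue=lambda x: max(x, 0))
```

Because the epilogue sees the complete sum in the second stage, a nonlinear epilogue such as ReLU fits naturally there. With an `atomic_add` accumulation there is no single point at which one program holds the final value, which is the fusion difficulty the summary alludes to.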