Triton Partition K gemm to TritonBench

Commit

1 year ago

Triton Partition K gemm to TritonBench

Summary: This is an early exploration. Triton Partition K ([link](https://github.com/NVIDIA/cutlass/blob/main/media/docs/efficient_gemm.md#parallelized-reductions)) uses two kernels (gemm + reduce) to achieve split-K. Compared with the `atomic_add` Triton GEMM, Partition K is friendlier to epilogue fusion.

Reviewed By: bertmaher, chenyang78

Differential Revision: D59948589

fbshipit-source-id: a2118947f8e20ab17d26843fd263b83e22f58541
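To make the two-kernel idea concrete, here is a minimal sketch in plain Python rather than Triton. It is not the code from this commit: the function names (`partition_k_gemm`, `reduce_partials`) and the ReLU epilogue are hypothetical, chosen only for illustration. The first stage writes one partial product per K partition into its own buffer, and the second stage sums the partials and applies the epilogue once to the complete sum. That second stage is presumably why the approach suits epilogue fusion: with an `atomic_add` accumulation, no single program instance holds the final value, so a nonlinear epilogue cannot be applied in the same pass.

```python
# Illustrative sketch only: plain Python loops stand in for the two GPU kernels.
# All names below are hypothetical and do not come from the commit.

def partition_k_gemm(a, b, num_partitions):
    """Stage 1 (gemm): each K partition writes its own partial product."""
    m, k, n = len(a), len(b), len(b[0])
    chunk = (k + num_partitions - 1) // num_partitions  # ceil(k / num_partitions)
    partials = []
    for p in range(num_partitions):
        k_lo, k_hi = p * chunk, min((p + 1) * chunk, k)
        partials.append([
            [sum(a[i][kk] * b[kk][j] for kk in range(k_lo, k_hi)) for j in range(n)]
            for i in range(m)
        ])
    return partials  # indexed as [partition][row][col]


def reduce_partials(partials, epilogue=lambda x: x):
    """Stage 2 (reduce): sum over partitions, then apply the epilogue once."""
    m, n = len(partials[0]), len(partials[0][0])
    return [
        [epilogue(sum(p[i][j] for p in partials)) for j in range(n)]
        for i in range(m)
    ]


a = [[1.0, -2.0, 3.0, 4.0], [0.5, 1.0, -1.0, 2.0]]    # 2 x 4
b = [[1.0, 0.0], [2.0, 1.0], [0.0, -3.0], [1.0, 1.0]]  # 4 x 2

partials = partition_k_gemm(a, b, num_partitions=2)
plain = reduce_partials(partials)                                   # no epilogue
fused = reduce_partials(partials, epilogue=lambda x: max(x, 0.0))  # ReLU epilogue
```

Applying the ReLU to each partial before summing would give a different (wrong) result, which is why the epilogue belongs in the reduce stage of this sketch.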

Author

sijiac

Committer

facebook-github-bot

Parents

43d8a999

benchmark 53d98e37 - Triton Partition K gemm to TritonBench