benchmark
6b4f3393 - Add non-persistent fp8 triton_rowwise kernel (#2484)

Commit
1 year ago
Add non-persistent fp8 triton_rowwise kernel (#2484)

Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2484

X-link: https://github.com/pytorch/FBGEMM/pull/3212

X-link: https://github.com/facebookresearch/FBGEMM/pull/308

triton_rowwise persistent kernel performs poorly on MI300 compared to the non-persistent kernel, when both are run with exhaustive AMD-specific tuning.

Reviewed By: htyu

Differential Revision: D63741099

fbshipit-source-id: c276415ddf8f5d24ffeba70b8ee6493011b393e1