llvm-project
2f627c18 - [NVPTX] Support for dense and sparse MMA intrinsics with block scaling. (#163561)

Commit
157 days ago
This change adds dense and sparse MMA intrinsics with block scaling. The implementation is based on [PTX ISA version 9.0](https://docs.nvidia.com/cuda/parallel-thread-execution/).

Tests for the new intrinsics are added for PTX 8.7 and SM 120a and are generated by `llvm/test/CodeGen/NVPTX/wmma-ptx87-sm120a.py`. The tests have been verified with ptxas from the CUDA 13.0 release.

Dense MMA intrinsics with block scaling were supported by @schwarzschild-radius.