llvm-project
7d086512 - [clang][NVPTX] Add support for mixed-precision FP arithmetic (#168359)

This change adds support for mixed-precision floating-point arithmetic for `f16` and `bf16`. Patterns of the form

```
%fh = fpext half %h to float
%resfh = fp-operation(%fh, ...)
...
%fb = fpext bfloat %b to float
%resfb = fp-operation(%fb, ...)
```

where the fp-operation is any of:

- `fadd`
- `fsub`
- `llvm.fma.f32`
- `llvm.nvvm.add(/fma).*`

are lowered to the corresponding mixed-precision instructions, which combine the conversion and the operation into a single instruction from `sm_100` onwards (see the sketch below).

This also adds the following intrinsics to complete support for all variants of the floating-point `add`/`fma` operations, in order to support the corresponding mixed-precision instructions:

- `llvm.nvvm.add.(rn/rz/rm/rp){.ftz}.sat.f`
- `llvm.nvvm.fma.(rn/rz/rm/rp){.ftz}.sat.f`

We lower `fneg` followed by one of the above addition intrinsics to the corresponding `sub` instruction (see the second sketch below).

Tests are added in `fp-arith-sat.ll`, `fp-fold-sub.ll`, and `builtins-nvptx.c` for the newly added intrinsics and builtins, and in `mixed-precision-fp.ll` for the mixed-precision instructions.

PTX spec reference for mixed-precision instructions:
https://docs.nvidia.com/cuda/parallel-thread-execution/#mixed-precision-floating-point-instructions
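As a concrete illustration, here is a minimal sketch (not taken from the commit) of the `fpext` + `fadd` pattern described above. The function name is hypothetical, and the PTX shown in the comments is an assumption based on the mixed-precision section of the PTX spec linked above; it presumes an NVPTX target with `sm_100` or newer.

```
; Hypothetical example of the fpext + fadd pattern this change folds.
; Assumes an NVPTX target compiled for sm_100 or newer.
define float @fadd_mixed(half %h, float %f) {
  %fh = fpext half %h to float   ; conversion half -> float
  %res = fadd float %fh, %f      ; fp-operation on the widened value
  ret float %res
}
; Before this change (assumed): a separate convert, then an add, e.g.
;   cvt.f32.f16 %t, %h;
;   add.rn.f32  %r, %t, %f;
; After this change (assumed, per the PTX mixed-precision section):
;   add.rn.f32.f16 %r, %h, %f;
```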
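And a second minimal sketch of the `fneg` folding. The intrinsic name below is one expansion of the `llvm.nvvm.add.(rn/rz/rm/rp){.ftz}.sat.f` pattern listed above; the function name and the PTX in the comment are assumptions.

```
; Hypothetical example: fneg feeding one of the new add intrinsics
; is expected to select the corresponding sub instruction.
declare float @llvm.nvvm.add.rn.sat.f(float, float)

define float @fold_to_sub(float %a, float %b) {
  %nb = fneg float %b
  %r = call float @llvm.nvvm.add.rn.sat.f(float %a, float %nb)
  ; assumed selection: sub.rn.sat.f32 %r, %a, %b; (rather than neg + add)
  ret float %r
}
```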