llvm-project
69589dd2 - AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)

Commit
151 days ago
AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)

These shuffles can always be implemented using v_perm_b32, and so this rewrites the analysis from the perspective of "how many v_perm_b32s does it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are reasonable: VI (gfx8) has native f16 math, but not packed math.
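The per-register counting idea can be illustrated with a small standalone sketch. This is not the code from the commit: the function name `estimatePermCount`, the mask encoding, and the cost rule are assumptions made for illustration only. It relies on the fact that v_perm_b32 picks each of the four result bytes from two 32-bit source registers, and it assumes that a result register reading from more than two source registers is built by chaining perms, one per additional source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper, NOT the LLVM implementation: estimates how many
// v_perm_b32 instructions a shuffle of 8- or 16-bit elements might need.
// Mask[i] is the index of the source element placed in result element i,
// counted across the concatenated sources; -1 marks an undefined lane.
unsigned estimatePermCount(const std::vector<int> &Mask, unsigned ElemBits) {
  assert((ElemBits == 8 || ElemBits == 16) && "only sub-dword elements");
  const unsigned EltsPerReg = 32 / ElemBits;
  unsigned Cost = 0;

  // Walk the result one 32-bit register at a time.
  for (std::size_t Base = 0; Base < Mask.size(); Base += EltsPerReg) {
    std::set<unsigned> SrcRegs; // distinct 32-bit source registers read
    bool InPlace = true;        // every lane keeps its slot within its register
    const std::size_t End = std::min<std::size_t>(Base + EltsPerReg, Mask.size());

    for (std::size_t I = Base; I < End; ++I) {
      if (Mask[I] < 0)
        continue; // undefined lane, any value will do
      const unsigned Src = static_cast<unsigned>(Mask[I]);
      SrcRegs.insert(Src / EltsPerReg);
      if (Src % EltsPerReg != I - Base)
        InPlace = false;
    }

    const unsigned NumSrcs = static_cast<unsigned>(SrcRegs.size());
    if (NumSrcs == 0)
      continue; // fully undefined register costs nothing
    if (NumSrcs == 1 && InPlace)
      continue; // the source register can be reused as is
    // Assumption: one perm merges two sources; each further source
    // needs one more perm chained onto the partial result.
    Cost += NumSrcs <= 2 ? 1 : NumSrcs - 1;
  }
  return Cost;
}
```

Under this model, an identity shuffle is free, reversing four 16-bit elements costs two perms (one per result register), and gathering four bytes from four different source registers costs three.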