llvm-project
69589dd2 - AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)

Commit

151 days ago

AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)

These shuffles can always be implemented using v_perm_b32, and so this rewrites the analysis from the perspective of "how many v_perm_b32s does it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are reasonable: VI (gfx8) has native f16 math, but not packed math.
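To make the "how many v_perm_b32s per result register" perspective concrete, here is a minimal sketch of that style of counting. It is an illustrative model only, not the code from this commit: the function name `estimate_perm_count`, its parameters, and the exact rules (identity registers are free, one v_perm_b32 combines bytes from at most two 32-bit source registers, each further source register adds one more) are assumptions made for the example. Mask indices follow the LLVM shufflevector convention of indexing into the concatenation of two source vectors, with -1 standing for an undefined lane.

```python
def estimate_perm_count(mask, elt_bytes, num_src_elts):
    """Rough count of v_perm_b32 ops for a shuffle of 8- or 16-bit elements.

    Hypothetical model, not the LLVM implementation:
      * a result register that is an unchanged copy of one source register is free,
      * one v_perm_b32 can combine bytes from up to two 32-bit source registers,
      * every additional source register costs one more v_perm_b32.
    """
    assert elt_bytes in (1, 2)
    elts_per_reg = 4 // elt_bytes  # elements packed into one 32-bit register
    total = 0
    for start in range(0, len(mask), elts_per_reg):
        chunk = mask[start:start + elts_per_reg]
        src_regs = set()
        identity = True
        for lane, idx in enumerate(chunk):
            if idx < 0:
                continue  # undefined lane: any value is acceptable
            vec, pos = divmod(idx, num_src_elts)  # which source vector, and where in it
            src_regs.add((vec, pos // elts_per_reg))
            if pos % elts_per_reg != lane:
                identity = False  # element moves to a different lane
        if not src_regs:
            continue  # fully undefined result register
        if len(src_regs) == 1 and identity:
            continue  # plain register copy, no permute needed
        total += max(1, len(src_regs) - 1)
    return total
```

Under this model, reversing a four-element 16-bit vector touches both result registers and needs one permute for each, while an identity mask costs nothing.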

References

#168818 - AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles

Author

nhaehnle

Parents

52f9a57b
