llvm-project
5897f276 - [AMDGPU] In promote-alloca, if index is dynamic, sandwich load with bitcasts to reduce excessive codegen (#171253)

Commit
4 days ago
Investigation revealed that a scalarized copy results in a long chain of extractelement/insertelement instructions, which can explode into a large number of generated temporaries in the AMDGPU backend, as there is no efficient representation for extracting a subvector with a dynamic index. Wrapping the load in identity bitcasts can reduce the number of extract/insert elements to 1 and produce much smaller, more efficient generated code.

Credit: ruiling
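
The difference between the two lowerings can be modeled in plain C++. This is only an illustrative sketch, not the pass's actual code: the vector shapes (a 16-byte vector read as a 4-byte subvector), the helper names `load_scalarized` and `load_bitcast`, and the precondition that the dynamic index is a multiple of the subvector width are all assumptions made for the example. `std::memcpy` stands in for an IR bitcast.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical shapes: a promoted alloca viewed as <16 x i8>,
// from which a <4 x i8> subvector is loaded at a dynamic index.
using Vec16 = std::array<std::uint8_t, 16>;
using Sub4 = std::array<std::uint8_t, 4>;

// Scalarized copy: one dynamic extract plus one insert per element,
// so the chain grows with the subvector width.
Sub4 load_scalarized(const Vec16 &v, std::size_t idx) {
    assert(idx + 4 <= v.size());
    Sub4 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = v[idx + i];  // extract v[idx + i], insert into out[i]
    return out;
}

// Bitcast sandwich: reinterpret the vector as wider elements,
// do a single dynamic extract, then reinterpret the result back.
// Assumes idx is a multiple of 4 (an assumption of this sketch).
Sub4 load_bitcast(const Vec16 &v, std::size_t idx) {
    assert(idx % 4 == 0 && idx + 4 <= v.size());
    std::array<std::uint32_t, 4> wide{};
    std::memcpy(wide.data(), v.data(), sizeof(wide));  // models bitcast <16 x i8> to <4 x i32>
    std::uint32_t elem = wide[idx / 4];                // the only dynamic extract
    Sub4 out{};
    std::memcpy(out.data(), &elem, sizeof(out));       // models bitcast i32 to <4 x i8>
    return out;
}
```

Both helpers return the same bytes for any aligned index; the second performs one dynamically indexed access instead of four, which is the effect the commit message describes.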