llvm-project
ea43a308 - [AMDGPU] Vectorize more 16 bit shuffles (#90648)

Commit

1 year ago

[AMDGPU] Vectorize more 16 bit shuffles (#90648)

In the case of larger vectors, we should still prefer the vectorized version (i.e. shufflevector vs extract/insert chains).

In arithmetic chains, vectorization results in chains of packed math instructions (as opposed to unpack/repack & scalarized arithmetic): https://godbolt.org/z/c5onaf6G5

In chains with PHIs, vectorization again removes the unnecessary pack / repack code around BBs: https://godbolt.org/z/vz7zYzvhs
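To illustrate the two forms the message contrasts, here is a minimal Python model of the semantics only. It is not LLVM code and does not reflect the cost-model change in this commit; the function names `shufflevector` and `extract_insert_chain` are hypothetical helpers chosen for this sketch. It shows that a single shuffle with a mask yields the same lanes as a chain of per-lane extracts and inserts, which is why the vectorized form can stand in for the scalarized one.

```python
from typing import List, Sequence


def shufflevector(a: Sequence[int], b: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Model of a vector shuffle: result lane i is lane mask[i] of a concatenated with b."""
    combined = list(a) + list(b)
    return [combined[idx] for idx in mask]


def extract_insert_chain(a: Sequence[int], b: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Same result, built one lane at a time (one extract plus one insert per lane)."""
    n = len(a)
    result = [0] * len(mask)  # placeholder start vector (assumption: stands in for poison)
    for lane, idx in enumerate(mask):
        # "extractelement": pick one scalar from whichever source holds the lane
        scalar = a[idx] if idx < n else b[idx - n]
        # "insertelement": produce a new vector with that single lane replaced
        result = result[:lane] + [scalar] + result[lane + 1:]
    return result


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    mask = [0, 5, 2, 7]  # interleave even lanes of a with odd lanes of b
    print(shufflevector(a, b, mask))
    print(extract_insert_chain(a, b, mask))
```

The chain form performs one extract and one insert per output lane, so its length grows with the vector width, while the shuffle form stays a single operation regardless of width.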

References

#90648 - [AMDGPU] Vectorize more 16 bit shuffles

Author

jrbyrnes

Parents

f52d29c9
