llvm-project
47e31088 - clang/AMDGPU: Fix workgroup size builtins for nonuniform work group sizes (#185098)

55 days ago
These builtins assumed uniform work group sizes. Emit the v4 and v5 sequences, which return the size of the partial (remainder) group in the nonuniform case.

Currently, the device libs use this builtin on the legacy ABI path, with the same sequence to calculate the remainder, and fully implement the v5 path. A franken-build of the library with the updated builtin produces worse code because the duplicated sequence does not fully fold out. However, the result does not appear to be wrong: the relevant conformance tests still pass.
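To make "take the remainder group" concrete, here is a minimal Python model of the two per-dimension computations. It is a sketch under stated assumptions, not the emitted LLVM IR. The function names are hypothetical. The v4 variant assumes the inputs come from the dispatch packet (workgroup size and grid size, both in work-items). The v5 variant's parameters loosely mirror the implicit kernel arguments (`hidden_block_count_x`, `hidden_group_size_x`, `hidden_remainder_x`), where the block count is assumed to be the number of *full* groups.

```python
def workgroup_size_v4(group_id: int, group_size: int, grid_size: int) -> int:
    """Hypothetical model of the legacy (pre-v5) sequence.

    Assumes group_size and grid_size come from the dispatch packet. Every
    group is group_size wide except the last one, which is clamped to the
    work-items left over in the grid.
    """
    remaining = grid_size - group_id * group_size
    return remaining if remaining < group_size else group_size


def workgroup_size_v5(group_id: int, block_count: int,
                      group_size: int, remainder: int) -> int:
    """Hypothetical model of the v5 sequence.

    Assumes block_count is the number of full groups and remainder is the
    size of the trailing partial group, so a single select suffices.
    """
    return group_size if group_id < block_count else remainder


def workgroup_size_uniform(group_id: int, group_size: int) -> int:
    """The uniform-only assumption: always report the nominal size."""
    return group_size
```

For a 1-D grid of 100 work-items with a 64-wide work group, both models report 64 for group 0 and 36 for group 1, while the uniform-only version reports 64 for both groups and so overcounts the last one.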