llvm-project
47e31088 - clang/AMDGPU: Fix workgroup size builtins for nonuniform work group sizes (#185098)

Commit

82 days ago

clang/AMDGPU: Fix workgroup size builtins for nonuniform work group sizes (#185098)

These builtins assumed uniform work group sizes. Emit the v4 and v5 sequences to take the remainder group for the nonuniform case.

Currently the device libs use this builtin on the legacy ABI path with the same sequence to calculate the remainder, and fully implement the v5 path. If you perform a franken-build of the library with the updated builtin, the result is worse: the duplicate sequence does not fully fold out. However, it does not appear to be wrong, and the relevant conformance tests still pass.
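To make the "remainder group" idea concrete, here is a minimal host-side C++ sketch of the arithmetic for a single dimension. It is an illustration only, not the code clang emits. The function names are hypothetical, and the inputs (`grid_size`, `group_size`, `block_count`, `remainder`) are assumed to correspond to values the ABI makes available to the kernel. The legacy-style variant clamps the nominal size to the work-items left in the grid; the v5-style variant is assumed to select between the nominal size and a precomputed remainder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Legacy-ABI-style model (assumption): derive the actual size of work group
// `group_id` from the total grid size and the nominal work group size.
// The last group gets whatever is left over if the grid does not divide evenly.
constexpr std::uint32_t workgroup_size_v4(std::uint32_t grid_size,
                                          std::uint32_t group_size,
                                          std::uint32_t group_id) {
  std::uint32_t remaining = grid_size - group_id * group_size;
  return std::min(remaining, group_size);
}

// v5-style model (assumption): `block_count` counts only full work groups and
// `remainder` is the size of the partial trailing group, if any.
constexpr std::uint32_t workgroup_size_v5(std::uint32_t block_count,
                                          std::uint32_t group_size,
                                          std::uint32_t remainder,
                                          std::uint32_t group_id) {
  return group_id < block_count ? group_size : remainder;
}

// Hypothetical helper: check that both models agree for every work group of a
// dispatch that has a partial trailing group (grid_size % group_size != 0).
inline void check_models_agree(std::uint32_t grid_size,
                               std::uint32_t group_size) {
  std::uint32_t block_count = grid_size / group_size;  // full groups only
  std::uint32_t remainder = grid_size % group_size;    // partial group size
  for (std::uint32_t id = 0; id <= block_count; ++id) {
    assert(workgroup_size_v4(grid_size, group_size, id) ==
           workgroup_size_v5(block_count, group_size, remainder, id));
  }
}
```

For a grid of 10 work-items with a nominal group size of 4, both models give sizes 4, 4, 2 for groups 0, 1, 2. A builtin that assumes uniform sizes would report 4 for the last group as well.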

References

#185098 - clang/AMDGPU: Fix workgroup size builtins for nonuniform work group sizes

Author

arsenm

Parents

3e3c3ab6
