llvm-project
7b94b9ae - [libclc] Refine generic __clc_get_sub_group_size with fast full sub-group path (#188895)

Commit
22 days ago
[libclc] Refine generic __clc_get_sub_group_size with fast full sub-group path (#188895)

Add a fast path for the common case where the total work-group size is a multiple of the max sub-group size. The fallback path is ported from amdgpu/workitem/clc_get_sub_group_size.cl. The compiler can generate predicated instructions for the fallback path to avoid branches.
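The two paths described in the commit message can be modeled in plain C. This is a minimal sketch and not the libclc source: the function name `model_sub_group_size` and its parameters are hypothetical, and it assumes work-items are packed into sub-groups in linear-ID order so that only the last sub-group can be partial.

```c
#include <stddef.h>

/* Hypothetical host-side model of the logic; not the libclc implementation.
 * local_linear_id: flattened work-item ID within the work-group
 * wg_size:         total number of work-items in the work-group
 * max_sg_size:     maximum sub-group size (assumed non-zero) */
static size_t model_sub_group_size(size_t local_linear_id, size_t wg_size,
                                   size_t max_sg_size) {
    /* Fast path: every sub-group is full when the work-group size
     * is a multiple of the maximum sub-group size. */
    if (wg_size % max_sg_size == 0)
        return max_sg_size;

    /* Fallback: find the first work-item of this sub-group, then
     * clamp the number of remaining work-items to the maximum. */
    size_t sg_start = (local_linear_id / max_sg_size) * max_sg_size;
    size_t remaining = wg_size - sg_start;
    return remaining < max_sg_size ? remaining : max_sg_size;
}
```

For a work-group of 40 work-items and a maximum sub-group size of 32, work-items 0 through 31 see a size of 32 and work-items 32 through 39 see a size of 8. The final clamp is a simple select, which is the kind of code a compiler can lower to predicated instructions rather than a branch.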