llvm
da04975e - [compiler] Do not mix kernels with different sub-group sizes.

Commit
241 days ago
Per OpenCL 3.0 API section 3.2.1, "Mapping Work-items Onto an NDRange", all sub-groups within a work-group will be the same size, apart from the sub-group with the maximum index, which may be smaller if the size of the work-group is not evenly divisible by the size of the sub-groups.

We were not meeting this requirement: in cases where we would not or could not generate a predicated vectorized kernel, we would execute the scalar kernel in a loop for any remaining work-items, possibly resulting in multiple sub-groups that are smaller than the maximum sub-group size.

To avoid this situation, we need to avoid mixing vector and scalar kernels if those kernels use different sub-group sizes:

- If we can handle all work-items with vector kernels, possibly with predication, continue to do so.
- If the vector and scalar kernels do not depend on the sub-group size, also continue to handle this as before.
- If the vector and scalar kernels do depend on the sub-group size, and the vector kernel cannot handle all work-items, we need to switch to the scalar kernel for all work-items.

This includes a small optimization: if we know the kernel does not use sub-group information, we avoid setting sub-group IDs.

This includes one change to createLoop, which now permits nullptr PHIs. They are skipped over, and are useful because PHIs must be referred to by index in the callback function. This allows indices to stay constant even when the caller has multiple optional PHIs.

This also includes one bugfix to ControlFlowConversionPass for a crash that now shows up, where we used the result of createMasked{Load,Store} before checking whether it succeeded.

This also includes one improvement to CompileKernelToBin.cmake: if the executed command fails, it is now printed in a format that can be copied and pasted.
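The selection rule described above can be modelled outside the compiler. The following is a minimal Python sketch, not the code from this commit: the function names, the strategy labels, and the assumption that the scalar kernel has a sub-group size of 1 while the vector kernel has a sub-group size equal to its vector width are all hypothetical, chosen only to illustrate why the old vector-plus-scalar-tail scheme could violate the sub-group size rule.

```python
# Hypothetical model of the kernel selection rule; none of these names
# come from the actual compiler sources.

def choose_strategy(local_size, vector_width, can_predicate, uses_subgroups):
    """Pick how one work-group of `local_size` work-items is executed."""
    if local_size % vector_width == 0 or can_predicate:
        # The vector kernel covers every work-item, possibly with predication.
        return "vector"
    if not uses_subgroups:
        # Sub-group size is unobservable, so mixing kernels is still fine.
        return "vector+scalar"
    # Mixing would expose two different sub-group sizes: go fully scalar.
    return "scalar"


def subgroup_sizes(strategy, local_size, vector_width):
    """Sub-group sizes, in sub-group index order, for one work-group.

    Assumes (for illustration only) that the vector kernel's sub-group size
    is `vector_width` and the scalar kernel's sub-group size is 1.
    """
    full, tail = divmod(local_size, vector_width)
    if strategy == "vector":
        # At most one partial sub-group, and it has the maximum index.
        return [vector_width] * full + ([tail] if tail else [])
    if strategy == "scalar":
        return [1] * local_size
    if strategy == "vector+scalar":
        # Scalar kernel looped over the tail: one sub-group per leftover item.
        return [vector_width] * full + [1] * tail
    raise ValueError(f"unknown strategy: {strategy}")


def conforms(sizes):
    """True if every sub-group but the last has the maximum size."""
    if not sizes:
        return True
    biggest = max(sizes)
    return all(s == biggest for s in sizes[:-1]) and 0 < sizes[-1] <= biggest


# 11 work-items with a vector width of 4 and no predication:
old = subgroup_sizes("vector+scalar", 11, 4)   # [4, 4, 1, 1, 1]
new = subgroup_sizes(choose_strategy(11, 4, False, True), 11, 4)
print(old, conforms(old))                      # mixed scheme does not conform
print(new, conforms(new))                      # all-scalar scheme conforms
```

In this model a tail of exactly one work-item would still conform under the mixed scheme, since only the last sub-group is smaller. The rule in the commit message is stricter and simpler: it falls back to the scalar kernel whenever the vector kernel cannot cover all work-items and sub-group size is observable.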
Author
Committer
Colin Davidson