DeepSpeed
2b41d621 - [Bug Fix] Support threads_per_head < 64 for wavefront size of 64 (#6622)

Commit
1 year ago
[Bug Fix] Support threads_per_head < 64 for wavefront size of 64 (#6622)

When launching the apply_rotary_pos_half kernel, only a threads_per_head of 64 was supported for a wavefront size of 64. This change adds support for threads_per_head < 64, such as 4, 8, and 16.

Fixes the issue introduced in https://github.com/microsoft/DeepSpeed/pull/5402

---------

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
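
To illustrate why a threads_per_head smaller than the wavefront size needs explicit handling, here is a minimal host-side sketch in Python of how the lanes of one wavefront could be partitioned into per-head groups. It is not the DeepSpeed kernel or its launch code. The function names `lane_layout` and `heads_per_wavefront` are hypothetical, and the power-of-two restriction is an assumption based on the example values 4, 8, and 16 in the commit message.

```python
def heads_per_wavefront(wavefront_size: int, threads_per_head: int) -> int:
    """Number of head groups that fit in one wavefront (hypothetical helper)."""
    return wavefront_size // threads_per_head


def lane_layout(wavefront_size: int, threads_per_head: int) -> list[tuple[int, int]]:
    """Map every lane to (head_slot, lane_within_head).

    Assumption: threads_per_head is a power of two no larger than the
    wavefront size, so the groups tile the wavefront exactly.
    """
    if threads_per_head <= 0 or threads_per_head & (threads_per_head - 1):
        raise ValueError("threads_per_head must be a positive power of two")
    if threads_per_head > wavefront_size:
        raise ValueError("threads_per_head must not exceed the wavefront size")
    # divmod(lane, n) returns (lane // n, lane % n):
    # the group the lane belongs to and its position inside that group.
    return [divmod(lane, threads_per_head) for lane in range(wavefront_size)]


if __name__ == "__main__":
    layout = lane_layout(64, 16)
    print(heads_per_wavefront(64, 16))  # four groups of 16 lanes each
    print(layout[17])                   # lane 17 is lane 1 of group 1
```

With a wavefront of 64 and threads_per_head of 64 there is a single group, so any intra-group exchange can span the whole wavefront. With smaller values, several groups share one wavefront, and each exchange has to stay inside its own group of threads_per_head lanes.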