DeepSpeed
[Bug Fix] Support threads_per_head < 64 for wavefront size of 64
#6622
Merged
