DeepSpeed
d9e12d3a - Fix attention mask handling in the Hybrid Engine Bloom flow (#5101)

Fix attention mask handling in the Hybrid Engine Bloom flow (#5101)

The Bloom flow in Hybrid Engine re-applies a transformation of the input attention mask that transformers' BloomModel::forward has already performed. This results in non-convergence of scores, specifically in DeepSpeed Chat, on different accelerators including CUDA and HPU. The fix removes the redundant mask transformation and application, producing correct convergence.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
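To illustrate the failure mode, here is a minimal sketch of why applying a boolean-to-additive mask conversion twice corrupts the mask. This is not the actual DeepSpeed or transformers code; `to_additive_mask` is a hypothetical stand-in for the mask preparation that BloomModel::forward performs before the mask reaches the engine.

```python
import torch

def to_additive_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    # Convert a 0/1 padding mask (1 = attend, 0 = masked) into an additive
    # mask: 0.0 where attention is allowed, a very large negative value where
    # it is not, so it can simply be added to the attention scores.
    return (1.0 - attention_mask.float()) * torch.finfo(torch.float32).min

# A batch of two sequences; the second one has one padded position.
raw_mask = torch.tensor([[1, 1, 1, 1],
                         [1, 1, 1, 0]])

once = to_additive_mask(raw_mask)   # correct: 0.0 for valid, ~-3.4e38 for padded
twice = to_additive_mask(once)      # the bug: the transform is applied again

print(once[1])   # ~[0, 0, 0, -3.4e38]           -> expected additive mask
print(twice[1])  # ~[-3.4e38, -3.4e38, -3.4e38, -inf] -> valid positions masked
```

On the second application, the 0.0 entries (positions that should be attended) become fully masked, and the already-masked entry overflows to -inf, which matches the described symptom of scores failing to converge rather than erroring out.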