DeepSpeed
d9e12d3a - Fix attention mask handling in the Hybrid Engine Bloom flow (#5101)

The Bloom flow in Hybrid Engine applies the same transformation to the input mask that is already performed earlier by transformers' BloomModel::forward. This results in non-convergence of scores, specifically in DeepSpeed Chat, on different accelerators including CUDA and HPU. The fix removes the redundant mask transformation and application, producing correct convergence.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
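
For illustration, below is a minimal sketch of the double-transformation problem, not the actual DeepSpeed or transformers code: the helper name, tensor shapes, and values are assumptions, and only PyTorch is required. transformers' BloomModel::forward already converts the 0/1 padding mask into an additive attention bias, so converting it a second time in the Hybrid Engine attention path corrupts the bias; the fix is to apply the already-converted mask as-is.

```python
import torch

def bool_to_additive(mask: torch.Tensor) -> torch.Tensor:
    """Convert a 0/1 padding mask into an additive attention bias:
    0.0 where a token may be attended to, a large negative value otherwise."""
    min_val = torch.finfo(torch.float32).min
    return torch.where(mask.bool(),
                       torch.zeros_like(mask, dtype=torch.float32),
                       torch.full_like(mask, min_val, dtype=torch.float32))

# Upstream step (conceptually what BloomModel.forward already does):
padding_mask = torch.tensor([[1, 1, 1, 0]])     # 1 = real token, 0 = padding
additive_mask = bool_to_additive(padding_mask)  # [[0, 0, 0, -3.4e38]]

# Buggy Hybrid Engine flow: the mask is converted a second time. The additive
# bias is reinterpreted as a 0/1 mask, so the result is inverted: real tokens
# are blocked and the padding position becomes attendable.
double_converted = bool_to_additive(additive_mask)
print(double_converted)  # [[-3.4e38, -3.4e38, -3.4e38, 0.0]]

# Fixed flow: the already-additive mask is applied once, unchanged.
attn_scores = torch.zeros(1, 1, 4, 4)           # dummy (batch, head, query, key) scores
correct = attn_scores + additive_mask           # broadcast over the key dimension
```

In this sketch the second conversion stands in for the redundant transformation and application that the commit removes from the Hybrid Engine Bloom flow.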
Files changed
  • deepspeed/module_inject/containers/bloom.py
  • deepspeed/ops/transformer/inference/config.py
  • deepspeed/ops/transformer/inference/ds_attention.py