Fix attention mask handling in the Hybrid Engine Bloom flow (#5101)
The Bloom flow in Hybrid Engine reapplies a transformation of the input
attention mask that the transformers BloomModel::forward has already
performed earlier.
Because the mask is transformed twice, scores fail to converge,
specifically in DeepSpeed-Chat, on different accelerators, including
CUDA and HPU.
The fix removes the redundant mask transformation and application,
restoring correct convergence.
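
For illustration, a minimal sketch (not the DeepSpeed source; the helper
to_additive is hypothetical) of why the transformation must run exactly
once: converting a 0/1 padding mask into an additive bias is not
idempotent, so a second pass turns positions that should stay attendable
into fully masked ones.

```python
import torch

def to_additive(mask: torch.Tensor) -> torch.Tensor:
    # 1 = attend, 0 = masked  ->  0.0 bias for attend, large negative
    # bias for masked, mirroring the kind of expansion that
    # BloomModel::forward performs internally.
    return (1.0 - mask.float()) * torch.finfo(torch.float32).min

mask = torch.tensor([[1, 1, 0]])  # last position is padding
once = to_additive(mask)          # [0, 0, min]: correct additive mask
twice = to_additive(once)         # [min, min, -inf]: everything masked
```

Applying the transformation a second time assigns the minimum-float bias
to the positions that were supposed to attend, which is enough on its
own to derail the attention scores.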
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>