Update Diffusers attention script to Triton 2.1 (#4573)
deepspeed/ops/transformer/inference/triton_ops.py is updated from the
Triton 2.1 fused-attention tutorial:
https://github.com/openai/triton/blob/release/2.1.x/python/tutorials/06-fused-attention.py
Inference time (text to image) is reduced from 2.6 sec to 2.49 sec on A100.
Model: stabilityai_stable-diffusion-2
@jithunnair-amd @loadams @rraminen
IS_CAUSAL = False gives the same image output as running without the
DeepSpeed inference engine.
IS_CAUSAL = True gives noise as output.
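
For reviewers, a minimal pure-Python sketch of why the flag matters. This is
not the Triton kernel from triton_ops.py; the attention() helper and the toy
inputs are hypothetical and only illustrate the masking. The assumption is
that IS_CAUSAL restricts each query to keys at or before its own position,
which changes the result for models whose attention is meant to see every
token.

```python
import math

def attention(q, k, v, is_causal):
    """Reference scaled dot-product attention on lists of row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # With is_causal, query i may only attend to keys 0..i.
        last = i + 1 if is_causal else len(k)
        scores = [scale * sum(a * b for a, b in zip(qi, k[j]))
                  for j in range(last)]
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights)) / total
            for d in range(len(v[0]))
        ])
    return out

# Toy inputs (hypothetical values): 3 tokens, head dimension 2.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = attention(q, k, v, is_causal=False)
causal = attention(q, k, v, is_causal=True)
```

In this sketch only the last row matches between the two modes, since the
last query sees every key either way. Earlier rows differ, which is
consistent with the degraded output observed with IS_CAUSAL = True.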
---------
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>