DeepSpeed
3b1cf1fd - Diffusers attention script update triton2.1 (#4573)

Commit
2 years ago
Diffusers attention script update triton2.1 (#4573)

deepspeed/ops/transformer/inference/triton_ops.py updated from https://github.com/openai/triton/blob/release/2.1.x/python/tutorials/06-fused-attention.py

Inference time (text to image) reduced 2.6 sec to 2.49 sec on A100
model : stabilityai_stable-diffusion-2

@jithunnair-amd @loadams @rraminen

IS_CAUSAL = False gives same image output as not using deepspeed inference engine, IS_CAUSAL = True gives noise as output

---------

Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
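The IS_CAUSAL observation in the commit message can be illustrated with a plain-Python reference attention. This is a minimal sketch, not the Triton kernel from triton_ops.py: the function name `reference_attention` and the toy inputs are hypothetical, and it assumes IS_CAUSAL has the usual meaning of a lower-triangular mask in which query position i only attends to key positions 0..i. The attention layers in a diffusion model are typically not autoregressive, so a mask like this would hide most of the keys from early positions, which is one plausible reason the causal setting produces noise.

```python
import math


def reference_attention(q, k, v, is_causal=False):
    """Scaled dot-product attention for a single head (illustrative only).

    q, k, v are lists of float vectors with shape (seq_len, head_dim).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, q_i in enumerate(q):
        # With a causal mask, query i only sees keys 0..i; otherwise all keys.
        visible = range(i + 1) if is_causal else range(len(k))
        scores = [scale * sum(a * b for a, b in zip(q_i, k[j])) for j in visible]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v[j][d] for w, j in zip(weights, visible))
            for d in range(len(v[0]))
        ])
    return out


# Toy inputs (hypothetical values, 3 positions, head_dim = 2).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = reference_attention(q, k, v, is_causal=False)
causal = reference_attention(q, k, v, is_causal=True)
```

In this sketch the first causal output row is just `v[0]`, because position 0 can only attend to itself, while the last row matches the unmasked result because it sees every key. All earlier rows differ between the two modes.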