DeepSpeed
3b1cf1fd - Diffusers attention script update triton2.1 (#4573)

Commit

2 years ago

Diffusers attention script update triton2.1 (#4573)

deepspeed/ops/transformer/inference/triton_ops.py was updated from https://github.com/openai/triton/blob/release/2.1.x/python/tutorials/06-fused-attention.py

Inference time (text to image) was reduced from 2.6 sec to 2.49 sec on an A100.
Model: stabilityai_stable-diffusion-2

@jithunnair-amd @loadams @rraminen

IS_CAUSAL = False gives the same image output as running without the DeepSpeed inference engine; IS_CAUSAL = True gives noise as output.

---------

Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
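
The IS_CAUSAL observation in the message can be illustrated with a toy example. The sketch below is not the Triton kernel from triton_ops.py; it is a naive pure-Python scaled dot-product attention, and the function name `attention`, its `is_causal` parameter, and the sample `q`, `k`, `v` values are all hypothetical. It shows only that a causal mask changes the result for every position that would otherwise attend to later positions, which is one plausible reason a model whose attention is meant to see all positions would produce noise when the mask is enabled.

```python
import math


def attention(q, k, v, is_causal=False):
    """Naive scaled dot-product attention over lists of row vectors.

    Hypothetical helper for illustration only; not the DeepSpeed/Triton kernel.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # With is_causal=True, query i may only attend to keys 0..i.
        last = i + 1 if is_causal else len(k)
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(last)]
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights)) / total
            for d in range(len(v[0]))
        ])
    return out


# Made-up inputs: three positions, two features each.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

full = attention(q, k, v, is_causal=False)
causal = attention(q, k, v, is_causal=True)

print("full  :", full)
print("causal:", causal)
```

In this toy, the first position under the causal mask can see only itself, so its output collapses to `v[0]`, while the last position sees every key in both modes and is unchanged. The earlier rows are the ones that diverge.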

References

#4573 - Diffusers attention script update triton2.1

Author

bmedishe

Parents

8ad50cf0
