tl.dot(a, b, trans_b=True) is not supported by Triton 2.0+; update this API call (#4541)
With this update, Stable Diffusion inference with the DeepSpeed inference
engine works with Triton 2.1 on an A100.
---------
Co-authored-by: Logan Adams <loadams@microsoft.com>