DeepSpeed
61095611 - Use fused addmm and eliminate eye allocation in Gram NS

Commit
24 days ago
Use fused addmm and eliminate eye allocation in Gram NS

Replace separate scalar-multiply + matmul + add operations with single torch.addmm calls for the Q and R updates, reducing kernel launch overhead. Remove the torch.eye allocation by using diagonal().add_() instead.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
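The two rewrites described in the message can be illustrated with a minimal sketch. This is not the DeepSpeed code: the function names, the update form `a * Q + b * (R @ Q)`, and the coefficients are hypothetical placeholders chosen only to show the pattern. The sketch assumes square 2-D tensors and uses float64 so the comparison is tight.

```python
import torch


def update_unfused(Q, R, a, b):
    # Hypothetical update form: separate scalar-multiply, matmul, and add,
    # each producing an intermediate tensor.
    return a * Q + b * (R @ Q)


def update_fused(Q, R, a, b):
    # torch.addmm(input, mat1, mat2, beta=, alpha=) computes
    # beta * input + alpha * (mat1 @ mat2) in a single call.
    return torch.addmm(Q, R, Q, beta=a, alpha=b)


def add_scaled_identity_with_eye(M, c):
    # Allocates a full n x n identity matrix just to touch the diagonal.
    return M + c * torch.eye(M.shape[0], dtype=M.dtype, device=M.device)


def add_scaled_identity_in_place(M, c):
    # diagonal() returns a view, so add_ writes into M's diagonal directly
    # and no identity matrix is allocated. Note that this mutates M.
    M.diagonal().add_(c)
    return M


torch.manual_seed(0)
Q = torch.randn(4, 4, dtype=torch.float64)
R = torch.randn(4, 4, dtype=torch.float64)
a, b = 1.5, -0.5  # placeholder coefficients, not taken from the commit

fused_matches = torch.allclose(update_unfused(Q, R, a, b), update_fused(Q, R, a, b))
diag_matches = torch.allclose(
    add_scaled_identity_with_eye(R, 2.0),
    add_scaled_identity_in_place(R.clone(), 2.0),
)
print(fused_matches, diag_matches)
```

Because `diagonal().add_()` modifies its tensor in place, it is only a drop-in replacement where the original value is no longer needed afterwards; the sketch clones `R` before calling it for that reason.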