pytorch
effd2709 - Fuse row-wise sharded linear matmul to increase perf.

Fuse row-wise sharded linear matmul to increase perf.

Instead of looping over the shards and performing a separate matmul for each, we can perform a single matmul, ensuring we launch a single CUDA kernel for this operation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78449
Differential Revision: [D36743354](https://our.internmc.facebook.com/intern/diff/D36743354/)
Approved by: https://github.com/aazzolini, https://github.com/wanchaol
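Below is a minimal sketch of the idea, not the actual code touched by this PR. It assumes a row-wise sharded linear where each shard's weight is stored as (out_features, in_features_shard) and the input is split to match along the contracted dimension; the function names `rowwise_linear_looped` and `rowwise_linear_fused` are hypothetical.

```python
import torch

def rowwise_linear_looped(input_shards, weight_shards):
    # Before: one matmul (one CUDA kernel launch) per shard,
    # with the partial results summed afterwards.
    partials = [x @ w.t() for x, w in zip(input_shards, weight_shards)]
    return torch.stack(partials).sum(dim=0)

def rowwise_linear_fused(input_shards, weight_shards):
    # After: concatenate along the contracted (in_features) dimension
    # and issue a single matmul, i.e. a single kernel launch.
    return torch.cat(input_shards, dim=-1) @ torch.cat(weight_shards, dim=-1).t()

# Equivalence check: sum_i x_i @ w_i.T == cat(x_i) @ cat(w_i).T
# by block matrix multiplication over the contracted dimension.
torch.manual_seed(0)
xs = [torch.randn(8, 16) for _ in range(4)]   # input split along in_features
ws = [torch.randn(32, 16) for _ in range(4)]  # (out_features, in_features shard)
assert torch.allclose(rowwise_linear_looped(xs, ws),
                      rowwise_linear_fused(xs, ws), atol=1e-5)
```

The two paths compute the same result; the fused version avoids per-shard kernel-launch overhead, and a single large GEMM typically utilizes the GPU better than several small ones.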