Fuse the row-wise sharded linear matmuls into a single matmul to improve perf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78449
Instead of looping over the shards and performing a separate matmul for each,
we can perform one fused matmul, so only a single CUDA kernel is launched for
this operation.
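As an illustrative sketch of the fusion (not the actual PR code, which operates on torch ShardedTensor shards; NumPy stands in for torch here, and the shard shapes are hypothetical), the looped per-shard matmuls and the single fused matmul produce the same result:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # input batch

# Two local weight shards (illustrative shapes only)
shards = [rng.standard_normal((8, 3)) for _ in range(2)]

# Before: one matmul (one kernel launch) per shard, results concatenated
looped = np.concatenate([x @ w for w in shards], axis=1)

# After: concatenate the shards once, then a single fused matmul
fused = x @ np.concatenate(shards, axis=1)

assert np.allclose(looped, fused)
```

The fused form trades a one-time concatenation for eliminating a kernel launch per shard, which is where the speedup comes from.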
Differential Revision: [D36743354](https://our.internmc.facebook.com/intern/diff/D36743354/)
Approved by: https://github.com/aazzolini, https://github.com/wanchaol