Sparse CSR CUDA: Add torch.baddbmm and torch.bmm (#68711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68711
This PR adds the ability to multiply a single sparse CSR matrix by a batch of dense matrices on CUDA, exposed through torch.bmm and torch.baddbmm.
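
For reference, the intended semantics can be sketched in pure Python. This is only an illustration of the math, not the CUDA implementation from this PR; the helper name `csr_baddbmm` and its argument layout are hypothetical. It assumes the usual baddbmm definition, out[i] = beta * input[i] + alpha * (A @ batch[i]), with one CSR matrix A shared across the whole batch.

```python
# Reference sketch only (pure Python, hypothetical helper; not the CUDA kernel).
# A is a single (m, k) matrix in CSR form: crow (row pointers), col (column
# indices), vals (non-zero values). batch[i] is a dense (k, n) matrix and
# inp[i] is a dense (m, n) matrix, both given as nested lists.

def csr_baddbmm(inp, crow, col, vals, batch, alpha=1.0, beta=1.0):
    """Return beta * inp[i] + alpha * (A @ batch[i]) for every batch entry i."""
    m = len(crow) - 1
    out = []
    for c_i, b_i in zip(inp, batch):
        n = len(b_i[0])
        # Start from the scaled input term.
        res = [[beta * c_i[r][j] for j in range(n)] for r in range(m)]
        for r in range(m):
            # Walk the stored non-zeros of row r of A.
            for p in range(crow[r], crow[r + 1]):
                a = alpha * vals[p]
                row_b = b_i[col[p]]
                for j in range(n):
                    res[r][j] += a * row_b[j]
        out.append(res)
    return out


# A = [[1, 0], [0, 2]] stored as CSR.
crow, col, vals = [0, 1, 2], [0, 1], [1.0, 2.0]
batch = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
zeros = [[[0.0, 0.0], [0.0, 0.0]] for _ in batch]

# With beta=0 the input term vanishes, which corresponds to a plain bmm.
bmm_result = csr_baddbmm(zeros, crow, col, vals, batch, beta=0.0)
```

The same CSR matrix is reused for every batch entry, which is the "single CSR matrix by a batch of dense matrices" case described above.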
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D33773319
Pulled By: cpuhrsch
fbshipit-source-id: 1623ce9affbc4fdc6d6130a95c5a42022858b62b
(cherry picked from commit 628c8e366d6325fed631edfbe9a35d130c529344)