Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707
This PR adds a block CSR path for `torch.addmm` when `mat1` is sparse. The cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to first get everything working and the tests passing using an unsafe constructor, keeping it all private. Constructors that carry block information can then be discussed and implemented separately, unlocking the functions for wider use. Documentation will come with the constructor update.
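For reviewers unfamiliar with the layout, here is a rough pure-Python reference of what `addmm` computes when `mat1` is stored as block CSR with square blocks. It is only a sketch of the semantics (`beta * input + alpha * (mat1 @ mat2)`), not the cuSPARSE code path from this PR. The function name `bsr_addmm` and its argument names are made up for illustration and are not part of any PyTorch API.

```python
# Illustrative reference only: NOT the implementation in this PR.
# `bsr_addmm` and its parameter names are hypothetical.

def bsr_addmm(inp, crow_indices, col_indices, values, block_size, mat2,
              beta=1.0, alpha=1.0):
    """Return beta * inp + alpha * (mat1 @ mat2) as a list of lists.

    mat1 is given in block CSR form with square blocks:
      crow_indices[i]..crow_indices[i + 1] index the blocks of block row i,
      col_indices[k] is the block column of the k-th stored block,
      values[k] is that block as a block_size x block_size list of lists.
    """
    n_block_rows = len(crow_indices) - 1
    n_cols = len(mat2[0])
    # Start from the scaled input term.
    out = [[beta * x for x in row] for row in inp]
    for bi in range(n_block_rows):
        for k in range(crow_indices[bi], crow_indices[bi + 1]):
            bj = col_indices[k]
            block = values[k]
            for r in range(block_size):
                row = bi * block_size + r
                for c in range(block_size):
                    a = alpha * block[r][c]
                    src = mat2[bj * block_size + c]
                    for j in range(n_cols):
                        out[row][j] += a * src[j]
    return out


# A 4x4 mat1 made of two 2x2 blocks on the block diagonal.
crow_indices = [0, 1, 2]
col_indices = [0, 1]
values = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
zeros = [[0] * 4 for _ in range(4)]

# Multiplying by the identity recovers the dense form of mat1.
dense_mat1 = bsr_addmm(zeros, crow_indices, col_indices, values, 2, identity)
```

In the actual cuSPARSE path the index arrays must additionally fit in 32-bit integers, which this sketch does not model.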
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D32650366
Pulled By: cpuhrsch
fbshipit-source-id: 430a9627901781ee3d2e2496097b71ec17727d98