Sparse CSR CUDA: add `torch.add` with all inputs sparse (#63948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63948
This PR adds a `torch.add(a, b, alpha=1, out=out)` variant in which
`a`, `b`, and `out` are all sparse CSR tensors.
The underlying cuSPARSE function supports only 32-bit indices, so in
the current implementation the result tensor has 32-bit indices. Input
tensors may have either 64-bit or 32-bit indices.
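
A minimal usage sketch of the behavior described above, not taken from
the PR's tests. The helper `make_csr` and the example matrix are
illustrative assumptions; the sparse path runs only when a CUDA device
is available, and the index dtype of the result is printed rather than
asserted because it may differ between PyTorch versions.

```python
import torch


def make_csr(device, index_dtype):
    # Hypothetical helper: the 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
    crow_indices = torch.tensor([0, 2, 3], dtype=index_dtype, device=device)
    col_indices = torch.tensor([0, 2, 1], dtype=index_dtype, device=device)
    values = torch.tensor([1.0, 2.0, 3.0], device=device)
    return torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 3))


# Dense reference for a + alpha * b with a == b and alpha == 2.
dense = torch.tensor([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])
expected = torch.add(dense, dense, alpha=2)

if torch.cuda.is_available():
    # Inputs with 64-bit indices; 32-bit (torch.int32) inputs work the same way.
    a = make_csr("cuda", torch.int64)
    b = make_csr("cuda", torch.int64)
    result = torch.add(a, b, alpha=2)

    # The result stays in sparse CSR layout and matches the dense reference.
    assert result.layout == torch.sparse_csr
    assert torch.equal(result.to_dense().cpu(), expected)

    # Index dtype of the result (32-bit in the implementation described here).
    print(result.crow_indices().dtype)
```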
Fixes https://github.com/pytorch/pytorch/issues/59060
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31909731
Pulled By: cpuhrsch
fbshipit-source-id: 656f523e3947fec56b2f93c474fb6fd49f0360ca