add/add_ for compressed sparse inputs: bypass BLAS in some trivial cases (#95293)
In `add(self, other, out=...)` we can bypass calls to BLAS in two trivial cases: when `self == other == out` (all three are the same tensor), and when only `self == other`.
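To illustrate the idea (this is not the code from this PR), here is a minimal pure-Python sketch. It assumes that when `self` and `other` are the same tensor, `self + alpha * other` reduces to scaling the stored values by `1 + alpha`, with the sparsity pattern unchanged. `ToyCSR`, `general_add`, and the object-identity check are hypothetical stand-ins; `general_add` plays the role of the BLAS path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyCSR:
    """Hypothetical stand-in for a compressed sparse row tensor."""
    crow_indices: List[int]
    col_indices: List[int]
    values: List[float]


def general_add(a: ToyCSR, b: ToyCSR, alpha: float) -> ToyCSR:
    """Placeholder for the general path (the BLAS call in the real code)."""
    crow, col, val = [0], [], []
    for r in range(len(a.crow_indices) - 1):
        row = {}
        for i in range(a.crow_indices[r], a.crow_indices[r + 1]):
            c = a.col_indices[i]
            row[c] = row.get(c, 0.0) + a.values[i]
        for i in range(b.crow_indices[r], b.crow_indices[r + 1]):
            c = b.col_indices[i]
            row[c] = row.get(c, 0.0) + alpha * b.values[i]
        for c in sorted(row):
            col.append(c)
            val.append(row[c])
        crow.append(len(col))
    return ToyCSR(crow, col, val)


def add(self: ToyCSR, other: ToyCSR, alpha: float = 1.0,
        out: Optional[ToyCSR] = None) -> ToyCSR:
    if self is other:
        # Assumption: same operand twice, so only the values change.
        scale = 1.0 + alpha
        if out is self:
            # Case `self == other == out`: scale the values in place.
            self.values[:] = [v * scale for v in self.values]
            return self
        # Case `self == other` only: reuse the indices, scale a copy.
        result = ToyCSR(list(self.crow_indices), list(self.col_indices),
                        [v * scale for v in self.values])
    else:
        result = general_add(self, other, alpha)
    if out is not None:
        # Copy the result into the caller-provided output.
        out.crow_indices[:] = result.crow_indices
        out.col_indices[:] = result.col_indices
        out.values[:] = result.values
        return out
    return result


# Toy analogue of `x.add_(x)`: no call to general_add is made.
x = ToyCSR([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0])
add(x, x, out=x)
print(x.values)  # [2.0, 4.0, 6.0]
```

In this sketch, both shortcut cases skip the index merge entirely, which is why they are cheap compared with the general path.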
This PR fixes the repro from https://github.com/pytorch/pytorch/issues/94966, but the issue is still present when `x.add_(x)` is replaced, say, with `x = x.clone().add_(x)`.
Could that be a synchronization issue? CC @IvanYashchuk.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95293
Approved by: https://github.com/cpuhrsch