bsr_dense_mm Triton kernel: fix out kwarg (#96648)
As per title. The kernel did not handle `out=` correctly: it returned a different tensor that only shared storage with `out`, rather than `out` itself.
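
For illustration only, the difference between "is `out`" and "shares storage with `out`" can be sketched as below. This is not the Triton kernel or the test from this PR; `torch.mm` is used as a stand-in for an op that accepts `out=`, and the variable names are made up for the example.

```python
import torch

a = torch.randn(4, 3)
b = torch.randn(3, 5)
out = torch.empty(4, 5)

# Stand-in op (assumption: torch.mm, not bsr_dense_mm). With `out=`,
# the op is expected to write into `out` and hand back that same tensor.
res = torch.mm(a, b, out=out)
returned_same_object = res is out

# A view of `out` aliases the same memory but is a distinct tensor object.
# Returning something like this is the kind of mismatch described above.
alias = out.view(out.shape)
alias_shares_storage = alias.data_ptr() == out.data_ptr()
alias_is_out = alias is out

print(returned_same_object, alias_shares_storage, alias_is_out)
```

A check on `data_ptr()` alone would not catch this kind of bug, since the alias passes it; an identity check (`res is out`) is what distinguishes the two cases.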
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96648
Approved by: https://github.com/cpuhrsch