(non-batch) BSR/BSC to COO performance improvement. (#91389)
This PR improves the non-batched BSR/BSC to COO conversions by reducing their memory footprint and the number of kernels launched, and by removing the synchronization imposed by `at::where(condition)`.
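
For illustration only, here is a minimal pure-Python sketch of why the data-dependent lookup can be avoided. It is not the code from this PR. The function name `bsr_to_coo_indices` and its list-based inputs are hypothetical. The idea it shows is that the COO indices of a BSR tensor follow directly from the compressed row indices, the block column indices, and the block size, so the output length is known up front and nothing like `where(condition)` has to inspect the values.

```python
def bsr_to_coo_indices(crow_indices, col_indices, blocksize):
    """Expand BSR block indices into per-element COO (row, col) indices.

    Hypothetical helper for illustration; plain lists stand in for tensors.
    The result always has len(col_indices) * bh * bw entries, a size that
    depends only on the index metadata and not on the stored values.
    """
    bh, bw = blocksize
    rows, cols = [], []
    n_block_rows = len(crow_indices) - 1
    for block_row in range(n_block_rows):
        # Blocks of this block row occupy col_indices[start:end].
        start, end = crow_indices[block_row], crow_indices[block_row + 1]
        for k in range(start, end):
            block_col = col_indices[k]
            # Every element of a stored block becomes one COO entry.
            for i in range(bh):
                for j in range(bw):
                    rows.append(block_row * bh + i)
                    cols.append(block_col * bw + j)
    return rows, cols


# A 4x4 matrix with 2x2 blocks: one block at block position (0, 1),
# one at block position (1, 0).
rows, cols = bsr_to_coo_indices([0, 1, 2], [1, 0], (2, 2))
```

The matching COO values would be the stored blocks flattened in the same order. The BSC case is symmetric, with the roles of rows and columns swapped.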
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91389
Approved by: https://github.com/pearu, https://github.com/kit1980