optimize `to_dense` for CSC (#79635)
As per title. Previously, the conversion was done by first converting to COO.
A better approach could be using `dense.out_`, but `sparse_csc` is not yet permitted for it.
Also, are we fine with implementing very critical operations like `add` via transpositions?
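
For illustration only, here is a minimal pure-Python sketch of what a direct CSC-to-dense conversion (without a COO intermediate) looks like. It is not the kernel added in this PR; the function name `csc_to_dense` and the list-based inputs are hypothetical, and the argument names merely mirror the usual CSC components (compressed column pointers, row indices, values).

```python
def csc_to_dense(ccol_indices, row_indices, values, shape):
    """Scatter CSC entries straight into a dense row-major matrix.

    Hypothetical helper for illustration; not part of PyTorch.
    ccol_indices has length n_cols + 1; the entries of column j are
    stored at positions ccol_indices[j] .. ccol_indices[j + 1] - 1.
    """
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for k in range(ccol_indices[col], ccol_indices[col + 1]):
            # Accumulate so that duplicate entries, if any, are summed.
            dense[row_indices[k]][col] += values[k]
    return dense


# 3x2 matrix with column 0 = (1, 0, 3) and column 1 = (0, 2, 0).
ccol = [0, 2, 3]
rows = [0, 2, 1]
vals = [1, 3, 2]
dense = csc_to_dense(ccol, rows, vals, (3, 2))
```

The same three arrays, read as CSR, describe the transpose of the matrix, which is why CSC operations can also be expressed via transpositions.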
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79635
Approved by: https://github.com/cpuhrsch