More efficient indices validations for compressed sparse formats. (#81108)
As per title.
Some of the features:
- native kernels both for the CPU and CUDA without device syncs.
- If needed, invariant checks 5.1 - 5.5 could be improved to utilize vectorization. This will require implementing a conversion `Vectorized -> bool`. That's a follow-up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81108
Approved by: https://github.com/amjames, https://github.com/pearu, https://github.com/cpuhrsch