[Inductor CUTLASS backend] Step 3: autotune_process, and CUDABenchmarkRequest (#107901)
This is the step 3 to add cutlass as an alternative inductor backend.
Full tests can be found from the last PR in the stack.
Feature request: https://github.com/pytorch/pytorch/issues/106991.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107901
Approved by: https://github.com/jansel, https://github.com/aakhundov, https://github.com/kadeng
ghstack dependencies: #107802, #107847