[inductor] Parallelize Max Autotune step 1: refactor autotune_process (#109126)
Summary: Step 1 in revamping subprocess autotune to support multiple GPUs. This diff only refactors autotune_process.py to prepare for the next diff:
* Move all logic for managing the sub-process (such as detecting sub-process crashes) into the TuningProcess class.
* Use log.debug statements instead of print statements.
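
For illustration only, the two points above can be sketched as a small wrapper that owns a worker sub-process, detects when it has crashed, and reports through log.debug rather than print. This is not the code in autotune_process.py: the class name SketchTuningProcess, the child program, and the line-based protocol are all assumptions made up for this example.

```python
import logging
import subprocess
import sys
from typing import Optional

log = logging.getLogger(__name__)

# Hypothetical child program: doubles each integer read from stdin and exits
# with code 3 on the line "crash" to simulate a sub-process crash.
_CHILD_SOURCE = """
import sys
for line in sys.stdin:
    line = line.strip()
    if line == "crash":
        sys.exit(3)
    print(int(line) * 2, flush=True)
"""


class SketchTuningProcess:
    """Illustrative owner of a worker sub-process (not the PyTorch class)."""

    def __init__(self) -> None:
        self._proc: Optional[subprocess.Popen] = None
        self.last_exit_code: Optional[int] = None

    def _ensure_started(self) -> subprocess.Popen:
        # Start the worker lazily, and restart it if it has already exited.
        if self._proc is not None and self._proc.poll() is not None:
            self._reap()
        if self._proc is None:
            log.debug("Starting worker sub-process")
            self._proc = subprocess.Popen(
                [sys.executable, "-c", _CHILD_SOURCE],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
        return self._proc

    def _reap(self) -> None:
        # Wait for the worker to exit, record its exit code, release the pipes.
        assert self._proc is not None
        self.last_exit_code = self._proc.wait()
        self._proc.stdin.close()
        self._proc.stdout.close()
        self._proc = None

    def request(self, payload: str) -> Optional[int]:
        """Send one request; return the reply, or None if the worker crashed."""
        proc = self._ensure_started()
        proc.stdin.write(payload + "\n")
        proc.stdin.flush()
        reply = proc.stdout.readline()
        if reply == "":
            # EOF on stdout means the worker exited without answering.
            self._reap()
            log.debug("Worker crashed with exit code %s", self.last_exit_code)
            return None
        return int(reply)

    def terminate(self) -> None:
        """Shut the worker down by closing its stdin, then reap it."""
        if self._proc is not None:
            self._proc.stdin.close()
            self._reap()
```

Callers only see request() returning None on a crash; the restart on the next call, the exit-code bookkeeping, and the pipe cleanup all stay inside the class, which is the kind of encapsulation the first bullet describes.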
Test Plan: python test/inductor/test_max_autotune.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109126
Approved by: https://github.com/shunting314, https://github.com/eellison