[inductor] Parallelize Max Autotune step 1: Use Popen (#107982)
Summary: Step 1 in revamping subprocess autotune to support multiple GPUs: use Popen to create the new process with an entry point we control, so that the child process does not reinterpret the top-level script.
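
For illustration only, a minimal sketch of the general pattern (not the code in this PR): the parent launches a child with `subprocess.Popen`, passing an explicit entry point, and exchanges pickled messages over pipes. The names `CHILD_SOURCE`, `start_worker`, `run_job`, and `stop_worker` are hypothetical, and the "job" is a placeholder doubling operation rather than a real benchmark.

```python
import pickle
import subprocess
import sys

# Hypothetical child entry point (an assumption for this sketch): it reads
# pickled jobs from stdin and writes pickled results to stdout until it
# receives None. Because the child starts here, it never runs the parent's
# top-level script.
CHILD_SOURCE = r"""
import pickle, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    job = pickle.load(inp)
    if job is None:
        break
    pickle.dump(job * 2, out)  # placeholder for real work
    out.flush()
"""


def start_worker():
    # Launch a fresh interpreter with the entry point we chose.
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )


def run_job(worker, job):
    # Send one pickled job and block until the pickled reply arrives.
    pickle.dump(job, worker.stdin)
    worker.stdin.flush()
    return pickle.load(worker.stdout)


def stop_worker(worker):
    # Send the None sentinel, close the pipes, and return the exit code.
    pickle.dump(None, worker.stdin)
    worker.stdin.flush()
    worker.stdin.close()
    code = worker.wait()
    worker.stdout.close()
    return code
```

A parent would call `start_worker()` once, call `run_job` as many times as needed, and finish with `stop_worker`. Keeping the child alive across jobs avoids paying interpreter startup cost for every request.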
Test Plan: `python test/inductor/test_max_autotune.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107982
Approved by: https://github.com/eellison, https://github.com/shunting314