[inductor] Disable parallel compile (#87048)
https://github.com/pytorch/pytorch/pull/87032 appears to break our benchmark script; the issue may be related to the benchmark script also using subprocesses. As the traceback below shows, a forked compile worker ends up trying to re-initialize CUDA, which is not allowed in a forked subprocess.
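For context, this failure mode is easy to reproduce outside of inductor. On Linux under Python 3.9, `concurrent.futures.ProcessPoolExecutor` forks its workers by default, so any worker that touches CUDA after the parent has already initialized it hits the same error. A minimal sketch (a hypothetical standalone repro, not inductor code):
```python
# Hypothetical repro sketch: initialize CUDA in the parent, then fork a
# worker that touches CUDA again.
import torch
from concurrent.futures import ProcessPoolExecutor


def worker():
    # Touching CUDA here re-initializes it in the forked child, raising
    # "Cannot re-initialize CUDA in forked subprocess".
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    torch.cuda.init()  # parent initializes CUDA first
    # On Linux/Python 3.9, ProcessPoolExecutor forks workers by default.
    with ProcessPoolExecutor(max_workers=1) as pool:
        print(pool.submit(worker).result())
```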
Before this PR:
```
$ ./benchmarks/dynamo/torchbench.py --performance --inductor --raise --training --float16
...
Traceback (most recent call last):
  File "/home/jansel/conda/envs/pytorch/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/jansel/pytorch/torch/_inductor/codecache.py", line 239, in _worker_compile
    kernel = TritonCodeCache.load(source_code)
  File "/home/jansel/pytorch/torch/_inductor/codecache.py", line 234, in load
    mod = PyCodeCache.load(source_code)
  File "/home/jansel/pytorch/torch/_inductor/codecache.py", line 212, in load
    exec(code, mod.__dict__, mod.__dict__)
  File "/tmp/torchinductor_jansel/ij/cij7smji4sw2a56i4yz45bjkrosd2sb2raqnxzsxxpg4kwzuo2ta.py", line 5, in <module>
    from torch._inductor.triton_ops.autotune import reduction
  File "/home/jansel/pytorch/torch/_inductor/triton_ops/__init__.py", line 3, in <module>
    if has_triton():
  File "/home/jansel/pytorch/torch/_inductor/utils.py", line 38, in has_triton
    return triton is not None and torch.cuda.get_device_capability() >= (7, 0)
  File "/home/jansel/pytorch/torch/cuda/__init__.py", line 368, in get_device_capability
    prop = get_device_properties(device)
  File "/home/jansel/pytorch/torch/cuda/__init__.py", line 382, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/jansel/pytorch/torch/cuda/__init__.py", line 228, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
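The error message points at the standard workaround: use the `spawn` start method so workers start in a fresh process instead of inheriting the parent's CUDA state. A hedged sketch of what that looks like with a process pool (illustrative only; this PR instead disables parallel compile rather than changing the start method):
```python
# Illustrative sketch of the workaround the error message suggests: pass a
# 'spawn' multiprocessing context so workers do not fork a CUDA-initialized
# parent. Not the approach taken by this PR.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def worker():
    import torch  # imported in the fresh, spawned child process

    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        print(pool.submit(worker).result())
```
Spawned workers re-import the main module and start with no inherited CUDA context, which avoids the re-initialization error at the cost of slower worker startup.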
cc @zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87048
Approved by: https://github.com/soumith