SemanticDiff

pytorch
2c61313f - [inductor] Parallelize Max Autotune step 2: Use all GPUs (#107983)

Commit View On GitHub

Login via GitHub
Home
Pricing
FAQ
Install

Login via GitHub

Commit

1 year ago

[inductor] Parallelize Max Autotune step 2: Use all GPUs (#107983) Summary: Step 2 in revamping subprocess autotune to support multiple GPUs: use a pool of subprocesses and distribute benchmark calls across them. Test Plan: `python test/inductor/test_max_autotune.py` `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart` `TORCHINDUCTOR_AUTOTUNE_MULTI_DEVICE=1 TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107983 Approved by: https://github.com/eellison, https://github.com/shunting314 ghstack dependencies: #107982

Author

masnesral

masnesral

Committer

pytorchmergebot

pytorchmergebot

Parents

FAQ Terms Privacy Refunds Impressum

Loading