SemanticDiff

pytorch
5d81ade4 - [Inductor max autotune] Multithreaded Precompilation (#119386)

Commit View On GitHub

Login via GitHub
Home
Pricing
FAQ
Install

Login via GitHub

Commit

227 days ago

[Inductor max autotune] Multithreaded Precompilation (#119386) When using the Cutlass backend, the compilation of CUDA source files can totally dominate the runtime required for the benchmarking done as part of Autotuning. This change adds a multithreaded precompilation phase, which serves to pre-populate the compilation cache ( both in-memory, and a possible on-disk sccache ). Also it ensures that no unneccessary compilation and benchmarking steps are performed, which was peviously the case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119386 Approved by: https://github.com/aakhundov

Author

kadeng

kadeng

Committer

pytorchmergebot

pytorchmergebot

Parents

FAQ Terms Privacy Refunds Impressum

Loading