Bump maximum num_warps (#132458)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/129104
Our heuristic for num_warps was already picking the optimal value, but the result was then capped at a maximum of 8. Raising the cap gives a 1% speedup on HF and TIMM in inference and a 2% speedup in TIMM training; it is neutral otherwise.
Ultimately, I think we want live variable analysis for register usage, but this is still worth landing now.
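
For illustration only, a minimal sketch of how a cap can override a heuristic's choice. The names `next_power_of_2` and `clamp_num_warps` and their default values are hypothetical, not the actual inductor code, and the sketch assumes both bounds are powers of two:

```python
def next_power_of_2(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def clamp_num_warps(desired: int, min_num_warps: int = 1, max_num_warps: int = 8) -> int:
    """Clamp a heuristic's desired warp count into [min, max], then round up.

    Hypothetical helper: assumes min_num_warps and max_num_warps are
    powers of two, so rounding up never exceeds the cap.
    """
    clamped = min(max(desired, min_num_warps), max_num_warps)
    return next_power_of_2(clamped)


# With a cap of 8, a heuristic asking for 16 warps is silently cut to 8.
capped = clamp_num_warps(16, max_num_warps=8)
# With a higher cap, the heuristic's choice survives.
uncapped = clamp_num_warps(16, max_num_warps=16)
```

Under these assumptions, the heuristic can be correct while the launch configuration is still suboptimal, because the clamp discards any value above the cap.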
X-link: https://github.com/pytorch/pytorch/pull/132458
Approved by: https://github.com/Chillee, https://github.com/shunting314
Reviewed By: jovianjaison
Differential Revision: D61308271
Pulled By: eellison
fbshipit-source-id: 3ceafd3701ab712693abfdd1ebe40aed845d3e6f