[nnc] Update CUDA codegen to use LLVM for thread and block extent computations (#72040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72040
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D33865041
Pulled By: eellison
fbshipit-source-id: 41b4e648f69a048c7d84410da0f082ec3916f4f9
(cherry picked from commit 6be040e5fe3bb2ab6c0d3226a4c1def9f8a0730d)