[inductor] Use torch.cuda.clock_rate instead of triton.testing.nvsmi (#118662)
`triton.testing.nvsmi` invokes `nvidia-smi` as a subprocess, and Meta
prod usually doesn't make `nvidia-smi` available. Use
`torch.cuda.clock_rate` instead, which is native to torch.
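
For illustration only, a minimal sketch of querying the clock through torch rather than through the `nvidia-smi` binary. The helper names `sm_clock_rate` and `nvidia_smi_on_path` are hypothetical and are not part of this change. The sketch assumes `torch.cuda.clock_rate` can raise when its NVML bindings or the driver are unavailable, and returns `None` in that case.

```python
import shutil


def nvidia_smi_on_path():
    """Report whether the nvidia-smi binary can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


def sm_clock_rate(device=None):
    """Return the value of torch.cuda.clock_rate(device), or None if unavailable.

    Hypothetical helper: it never spawns a subprocess, so it does not
    depend on nvidia-smi being installed.
    """
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # no usable GPU
    try:
        return torch.cuda.clock_rate(device)
    except Exception:
        # Assumption: missing NVML bindings or driver errors surface here.
        return None


if __name__ == "__main__":
    print("nvidia-smi on PATH:", nvidia_smi_on_path())
    print("SM clock from torch:", sm_clock_rate())
```

On a host without `nvidia-smi`, the first line prints `False` while the second can still report a clock, provided torch can reach the GPU.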
Differential Revision: [D53235814](https://our.internmc.facebook.com/intern/diff/D53235814/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118662
Approved by: https://github.com/jansel