[PyTorch] Make c10::irange(x) generate the same assembly as for loop (#86841)
`c10::irange(n)` generated extra `sar` and `andn` instructions compared to a traditional `for` loop. Now it generates the same assembly.
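
For reference, a minimal sketch of the two loop forms being compared. `sketch::irange` is a hypothetical, simplified stand-in written for this example, not the actual `c10::irange` implementation, and its clamping of negative bounds to zero is an assumption. `sum_for` and `sum_irange` are likewise illustrative names. The sketch only shows which two loops are expected to compile to the same code; it does not reproduce the change itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for c10::irange (assumption: NOT the real implementation).
namespace sketch {
struct int_iterator {
  int64_t value;
  int64_t operator*() const { return value; }
  int_iterator& operator++() { ++value; return *this; }
  bool operator!=(const int_iterator& other) const { return value != other.value; }
};

struct int_range {
  int64_t end_;
  int_iterator begin() const { return {0}; }
  int_iterator end() const { return {end_}; }
};

// Assumption: a negative bound yields an empty range, like `i < n` would.
inline int_range irange(int64_t n) { return {n < 0 ? 0 : n}; }
} // namespace sketch

// Traditional index loop.
int64_t sum_for(const std::vector<int64_t>& v) {
  const int64_t n = static_cast<int64_t>(v.size());
  int64_t total = 0;
  for (int64_t i = 0; i < n; ++i) {
    total += v[static_cast<std::size_t>(i)];
  }
  return total;
}

// Range-based loop over the irange-style helper.
int64_t sum_irange(const std::vector<int64_t>& v) {
  const int64_t n = static_cast<int64_t>(v.size());
  int64_t total = 0;
  for (const auto i : sketch::irange(n)) {
    total += v[static_cast<std::size_t>(i)];
  }
  return total;
}
```

Compiling both functions with optimizations enabled and diffing the generated assembly is one way to check whether the two forms match on a given compiler.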
Differential Revision: [D40321009](https://our.internmc.facebook.com/intern/diff/D40321009/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86841
Approved by: https://github.com/r-barnes, https://github.com/malfet