Decouple direct access to native::scalar_tensor from TensorIndexing.h (#48761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48761
Targeting one of the items in https://github.com/pytorch/pytorch/issues/48684. For performance reasons we avoid at::scalar_tensor here. Since scalar_tensor_static is available for CPU, we can use it at least on the CPU path. One uncertainty is CUDA performance, but since native::scalar_tensor has no fast path for CUDA either, performance on CUDA should be unaffected.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25410975
Pulled By: iseeyuan
fbshipit-source-id: 160d21ffeefc9a2e8f00a55043144eebcada2aac