cache tensor scalar_type in OperandInfo (#30065)
Summary:
Caches the result of the `scalar_type` call in TensorIterator and TensorOptions, because the call is expensive.
This removes 120-150 ns of overhead (from 1.25 us to 1.12 us for the out-of-place case, and from 0.86 us to 0.73 us for the in-place case).
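
The caching idea can be illustrated with a minimal, self-contained sketch. It is not the code from this change: `FakeTensor`, `OperandInfoSketch`, `count_matching`, and the local `ScalarType` enum are hypothetical stand-ins, and the lookup counter exists only to make the saving observable.

```cpp
#include <cstdint>

// Hypothetical stand-in for the real dtype enum (assumption, not c10::ScalarType).
enum class ScalarType : std::uint8_t { Float, Double, Long };

// Hypothetical stand-in for a tensor. Each scalar_type() call is counted
// so that repeated lookups can be detected.
struct FakeTensor {
  explicit FakeTensor(ScalarType d) : dtype(d), lookups(0) {}

  ScalarType scalar_type() const {
    ++lookups;  // models the cost of the real lookup
    return dtype;
  }

  ScalarType dtype;
  mutable int lookups;
};

// Sketch of an operand record that stores the dtype once, at construction.
struct OperandInfoSketch {
  explicit OperandInfoSketch(const FakeTensor& t)
      : tensor(&t), dtype(t.scalar_type()) {}  // the only lookup

  const FakeTensor* tensor;
  ScalarType dtype;  // cached copy read by all later checks
};

// Counts operands whose dtype equals `wanted`, reading only the cached field
// and never calling scalar_type() again.
inline int count_matching(const OperandInfoSketch* ops, int n, ScalarType wanted) {
  int hits = 0;
  for (int i = 0; i < n; ++i) {
    if (ops[i].dtype == wanted) {
      ++hits;
    }
  }
  return hits;
}
```

In this sketch, any number of dtype checks over the operands costs one `scalar_type()` call per tensor, made when the operand record is built.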
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30065
Test Plan: Covered by existing tests
Differential Revision: D18576236
Pulled By: ngimel
fbshipit-source-id: 17f63851a911fc572c2146f8a520b7f0dadfd14a