TensorIterator: Reduce serial_for_each static overhead (#58909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58909
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28776507
Pulled By: ngimel
fbshipit-source-id: 4f0283d03b26aa5785b687b78d77e6b0efcbaf65