Replace flatten tensors with flatten loops. (#46539)
Summary:
This diff changes `TensorExprKernel::generateStmt` to flatten loops instead of flattening tensors.
Ran all tests on both CPU and CUDA.
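
For readers unfamiliar with the distinction, here is a minimal, self-contained sketch of loop flattening in general. It is not the code from this diff and does not use the tensor expression APIs; the function names `addNested` and `addFlattened` are hypothetical and exist only for illustration. The idea shown is that the loop nest is collapsed into a single loop and the original indices are recovered by division and modulo, while the body keeps its multi-dimensional indexing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: element-wise add over a rows x cols buffer,
// written as a two-level loop nest.
std::vector<float> addNested(const std::vector<float>& a,
                             const std::vector<float>& b,
                             std::size_t rows,
                             std::size_t cols) {
  std::vector<float> out(rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      out[i * cols + j] = a[i * cols + j] + b[i * cols + j];
    }
  }
  return out;
}

// Same computation after flattening the loop nest into one loop.
// The body still refers to (i, j); they are recomputed from the
// single flat induction variable k.
std::vector<float> addFlattened(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t rows,
                                std::size_t cols) {
  std::vector<float> out(rows * cols);
  for (std::size_t k = 0; k < rows * cols; ++k) {
    const std::size_t i = k / cols;  // original outer index
    const std::size_t j = k % cols;  // original inner index
    out[i * cols + j] = a[i * cols + j] + b[i * cols + j];
  }
  return out;
}
```

Both functions produce the same result for any `rows` and `cols`; only the loop structure differs, which is what makes a single flat loop convenient to map onto a 1-D iteration space.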
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46539
Reviewed By: nickgg
Differential Revision: D24395956
Pulled By: navahgar
fbshipit-source-id: f3792903f2069bda37b571c9f0a840e6fb02f189