Replace flatten tensors with flatten loops. (#46737)
Summary:
This is the second attempt at replacing flatten tensors with flatten loops in `TensorExprKernel::generateStmt`. The first attempt (https://github.com/pytorch/pytorch/pull/46539) resulted in a build failure due to an exception thrown during inlining.
The build failure was caused by an inline step that was supposed to run on the unflattened tensors. This step was necessary earlier because every flattened tensor had a corresponding unflattened tensor that had to be inlined. It is no longer needed now that there is a single tensor instead of two (flattened and unflattened), so this change removes the inline step.
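For readers unfamiliar with the term, the general idea of flattening loops (as opposed to creating a separate flattened tensor) can be sketched in standalone C++. This is only an illustration and not the code in `TensorExprKernel::generateStmt`: the function names `fill_nested` and `fill_flattened`, the row-major layout, and the stored value `i * 10 + j` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: a 2-D loop nest writing into a row-major buffer.
std::vector<int> fill_nested(std::size_t rows, std::size_t cols) {
  std::vector<int> out(rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      out[i * cols + j] = static_cast<int>(i * 10 + j);
    }
  }
  return out;
}

// The same computation with the two loops flattened into one.
// The original indices are recovered from the flat index k.
std::vector<int> fill_flattened(std::size_t rows, std::size_t cols) {
  std::vector<int> out(rows * cols);
  for (std::size_t k = 0; k < rows * cols; ++k) {
    std::size_t i = k / cols;  // row index
    std::size_t j = k % cols;  // column index
    assert(i < rows && j < cols);  // recovered indices stay in bounds
    out[k] = static_cast<int>(i * 10 + j);
  }
  return out;
}
```

Both functions write to the same buffer and produce the same contents; only the loop structure differs, so no second (flattened) copy of the data is involved.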
Checked Python and C++ tests on CPU as well as CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46737
Reviewed By: anjali411, izdeby
Differential Revision: D24534529
Pulled By: navahgar
fbshipit-source-id: 8b131a6be076fe94ed369550d9f54d3879fdfefd