Reimplement torch::flip based on advanced indexing (#56713)
Summary:
## Rationale
This PR improves the performance of `torch::flip` by using `TensorIterator` in the same fashion as `AdvancedIndexing` does. This means that the implementation is semantically equivalent to indexing a tensor with reversed indices along each flipped dimension, `A[dim0_size - 1:0 ..., dimN_size-1:0, ...]`.
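The equivalence stated above can be checked from Python. The snippet below is only an illustrative sketch, not code from this PR: the tensor sizes and the names `rev_h` and `rev_w` are arbitrary choices, and it compares results only, without saying anything about how the kernel is implemented.

```python
import torch

# Small 4D tensor laid out as [B, C, H, W]; the sizes are arbitrary.
x = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).reshape(2, 3, 4, 5)

# Reversed index vectors for the H and W dimensions (hypothetical names).
rev_h = torch.arange(x.size(2) - 1, -1, -1)
rev_w = torch.arange(x.size(3) - 1, -1, -1)

# Flip along H and W with the operator touched by this PR.
flipped = torch.flip(x, dims=[2, 3])

# Same result via advanced indexing, one dimension at a time.
indexed_chained = x[:, :, rev_h][:, :, :, rev_w]

# Same result in a single indexing call; rev_h is reshaped to a column
# so the two index tensors broadcast to an (H, W) grid.
indexed_grid = x[:, :, rev_h[:, None], rev_w]

assert torch.equal(flipped, indexed_chained)
assert torch.equal(flipped, indexed_grid)
```

Both indexing variants produce a copy with the same shape as the input, which matches the behaviour of `torch.flip`.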
## Benchmark results
The following benchmark compares the runtime of this implementation of `flip` against the current implementation, AdvancedIndexing with reversed indices, and the OpenCV implementation. The comparison scenarios consider a 4D tensor `[B, C, H, W]`, where the flipped dimensions are `H` (vertical flip) and `W` (horizontal flip), under the float32 and uint8 datatypes.
The benchmark implementation details can be found in https://github.com/andfoy/flip-benchmarks/blob/main/5_Stable_implementation/benchmarks.py. Additionally, there are correctness tests against the current flip implementation in https://github.com/andfoy/flip-benchmarks/blob/main/5_Stable_implementation/main.cpp, which cover different layouts, datatypes and contiguous/non-contiguous tensors.
The following plots show the mean runtime of each operator over 100 samples. The new implementation of flip has a runtime similar to the indexing one, and the performance gains over the current implementation are up to 6X in some scenarios.
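For readers who want a rough local comparison, a minimal timing sketch is given below. It is not the benchmark script linked above: the helper `mean_runtime`, the tensor sizes, and the use of `timeit` are assumptions made for illustration, and it leaves out the OpenCV baseline. Absolute numbers will depend on the machine and the PyTorch build.

```python
import timeit

import torch


def mean_runtime(fn, samples=100):
    """Mean wall-clock time in seconds over `samples` single calls (hypothetical helper)."""
    times = timeit.repeat(fn, number=1, repeat=samples)
    return sum(times) / len(times)


# [B, C, H, W] inputs for the two datatypes; the sizes are arbitrary.
inputs = {
    "float32": torch.rand(4, 3, 128, 128),
    "uint8": torch.randint(0, 256, (4, 3, 128, 128), dtype=torch.uint8),
}

results = {}
for name, x in inputs.items():
    # Reversed indices along W, used for the advanced-indexing baseline.
    rev_w = torch.arange(x.size(3) - 1, -1, -1)
    results[name] = {
        "flip": mean_runtime(lambda: torch.flip(x, dims=[3])),
        "indexing": mean_runtime(lambda: x[..., rev_w]),
    }

for name, timings in results.items():
    print(name, {op: f"{t * 1e6:.1f} us" for op, t in timings.items()})
```

Replacing `dims=[3]` and the `W` indexing with their `H` counterparts gives the vertical-flip case.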
### Horizontal flip (float)
![bokeh_plot](https://user-images.githubusercontent.com/1878982/115766715-e72a3d80-a36d-11eb-8552-9005028900b1.png)
### Horizontal flip (uint8)
![bokeh_plot(1)](https://user-images.githubusercontent.com/1878982/115766720-e7c2d400-a36d-11eb-822d-44046882c976.png)
### Vertical flip (float)
![bokeh_plot(2)](https://user-images.githubusercontent.com/1878982/115766721-e7c2d400-a36d-11eb-8f4b-d44c8c33d104.png)
### Vertical flip (uint8)
![bokeh_plot(3)](https://user-images.githubusercontent.com/1878982/115766725-e85b6a80-a36d-11eb-907a-cfcddba555ad.png)
cc fmassa vfdev-5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56713
Reviewed By: datumbox
Differential Revision: D28255088
Pulled By: fmassa
fbshipit-source-id: 5b8684812357c331e83a677b99cf0d78f0821678