[iOS GPU][Perf][1/n] Use aten::contiguous instead of permuting weights manually (#57664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57664
Manually permuting weights is slower than using aten::contiguous. This will improve the model loading time at runtime especially on low-end devices. Some numbers from the Unet model. Average 6x faster.
- iPhone 12
- before - 26.252 ms
- after - 4.727 ms
- iPhone 11
- before - 29.638 ms
- after - 5.012 ms
- iPhone X
- before - 33.257 ms
- after - 5.481 ms
- iPhone 8
- before - 33.335 ms
- after - 5.83 ms
- iPhone 7
- before - 36.144 ms
- after - 6.232 ms
- iPhone 6s
- before - 47.977 ms
- after - 6.998 ms
ghstack-source-id: 128338534
Test Plan: - CI
Reviewed By: kimishpatel
Differential Revision: D28087911
fbshipit-source-id: ad0029436e59a0ecc02ce660ed1110dc0b82848c