Replace calls to contiguous with contiguous(suggested memory format) (#38433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38433
Wherever applicable, call contiguous with the appropriate (suggested) memory
format instead of the default one.
In addition, allocate the output with the same memory format as the input when
applicable. Otherwise, convert the output to that format before returning.
This improves perf in some cases: a call to contiguous with a memory format
that differs from the tensor's current layout may involve an allocation and a
memcpy.
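To illustrate the reasoning above, here is a minimal sketch in plain Python. It
is a toy model of 4-D strides, not PyTorch code and not the change made in this
PR. The helper names (strides_for, needs_copy, suggested_format) and the string
format labels are hypothetical, and the model ignores corner cases such as
size-1 dimensions, where the two layouts can share the same strides.

```python
def strides_for(shape, memory_format):
    """Element strides for a 4-D (N, C, H, W) shape in the given layout."""
    n, c, h, w = shape
    if memory_format == "contiguous":      # NCHW: W varies fastest
        return (c * h * w, h * w, w, 1)
    if memory_format == "channels_last":   # NHWC: C varies fastest
        return (h * w * c, 1, w * c, c)
    raise ValueError(f"unknown memory format: {memory_format}")


def needs_copy(shape, strides, memory_format):
    """True if making the data contiguous in `memory_format` would require
    allocating a new buffer and copying into it (toy criterion: the strides
    do not already match the requested layout)."""
    return tuple(strides) != strides_for(shape, memory_format)


def suggested_format(shape, strides):
    """Toy stand-in for a 'suggested memory format' query: keep channels-last
    if the data is already laid out that way, else use the default."""
    if tuple(strides) == strides_for(shape, "channels_last"):
        return "channels_last"
    return "contiguous"


shape = (2, 3, 4, 5)
nhwc_strides = strides_for(shape, "channels_last")

# Default format on channels-last data: layout differs, so a copy is needed.
print(needs_copy(shape, nhwc_strides, "contiguous"))  # True

# Suggested format on the same data: layout already matches, so no copy.
print(needs_copy(shape, nhwc_strides, suggested_format(shape, nhwc_strides)))  # False
```

In this model, asking for the layout the data already has is free, which is the
case the change is meant to hit more often.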
Test Plan: quantization tests
Reviewed By: vkuzo
Differential Revision: D21559301
fbshipit-source-id: 2ed5de05fb627eef1bf5d76fba0387ba67370007