Avoid one unnecessary memory allocation in XNNPACK integration. (#35350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35350
Currently we call input.contiguous() on the input tensor, which results in an
unnecessary allocation and copy whenever the input is not contiguous with
regard to the requested memory format. In that case, the call allocates a new
contiguous tensor and copies the input into it, only for this temporary to
then serve as the source of yet another copy into the final destination. By
copying the non-contiguous input directly into the destination instead, we
save one allocation and one copy.
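
A minimal sketch of the before/after shapes of this pattern (not the actual
patch): the function names are illustrative, and it assumes a 4-D input being
staged into NHWC (ChannelsLast) storage, the layout XNNPACK operates on.

```cpp
#include <ATen/ATen.h>

at::Tensor stage_for_xnnpack_before(const at::Tensor& input) {
  // Before: if `input` is not already NHWC-contiguous, contiguous()
  // allocates a fresh NHWC tensor and copies the input into it...
  at::Tensor staged = input.contiguous(at::MemoryFormat::ChannelsLast);
  // ...and that temporary is then copied again into the destination.
  at::Tensor output = at::empty(staged.sizes(), staged.options(),
                                at::MemoryFormat::ChannelsLast);
  output.copy_(staged);
  return output;  // worst case: two allocations, two copies
}

at::Tensor stage_for_xnnpack_after(const at::Tensor& input) {
  // After: allocate the destination once and copy the (possibly
  // non-contiguous) input straight into it. Tensor::copy_ performs the
  // layout conversion itself, so the temporary is never materialized.
  at::Tensor output = at::empty(input.sizes(), input.options(),
                                at::MemoryFormat::ChannelsLast);
  output.copy_(input);
  return output;  // one allocation, one copy
}
```

When the input is already contiguous in the requested format, contiguous() is
a no-op and the two paths cost the same; the saving applies only on the
non-contiguous path.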
Differential Revision: D20656798
Test Plan: Imported from OSS
Pulled By: AshkanAliabadi
fbshipit-source-id: 3f8c51df4d1fd386fa9473e7024621a7b7c6e86c