[ONNX] Don't duplicate model weights in ONNX export (#101134)
This commit partially fixes an issue where the ONNX exporter always requires about twice as much memory as the model size. The `ONNXTracedModule` class uses a copy of the original weights only when `return_inputs=True`, so this commit makes sure the weights are cloned only in that case.
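A minimal, framework-free sketch of the "copy only when needed" pattern, for illustration only. `TracedWrapper` and `snapshot_weights` are hypothetical names, weights are modeled as a plain dict of lists rather than tensors, and this is not the actual `ONNXTracedModule` code:

```python
import copy


class TracedWrapper:
    """Hypothetical wrapper illustrating conditional weight copying.

    Not the real ONNXTracedModule; weights are modeled as a dict of lists.
    """

    def __init__(self, weights, return_inputs=False):
        self.weights = weights
        self.return_inputs = return_inputs

    def snapshot_weights(self):
        # Only pay for a full copy when the caller asked for the inputs back;
        # otherwise reuse the original weights and avoid doubling memory.
        if self.return_inputs:
            return copy.deepcopy(self.weights)
        return self.weights


weights = {"linear.weight": [0.1, 0.2], "linear.bias": [0.0]}

shared = TracedWrapper(weights).snapshot_weights()
copied = TracedWrapper(weights, return_inputs=True).snapshot_weights()

print(shared is weights)  # True: no extra copy is made
print(copied is weights)  # False: an independent copy is made
```

With the default `return_inputs=False`, the wrapper hands back the same object, so peak memory stays close to one copy of the weights.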
As a side note, I don't think the exporter is ever called with `return_inputs=True`, so maybe this is just some old code that can be removed.
Partially fixes #61263. There are still other places in the exporter which use more memory than they need to. For example, during the shape inference step many intermediate tensors are computed and saved until shape inference on the model is complete. I am working on a fix for that, but that optimization is independent of this one and can be done in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101134
Approved by: https://github.com/BowenBao, https://github.com/osalpekar