Add copy out to the fallback path in SR invocation of composed op (#70871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871
We had previously handled reusing memory in the optimized kernel execution path, but not yet handled it if we hit the unoptimized fallback.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33458652
Pulled By: eellison
fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08