[Functionalization] Slightly improve detach_copy (#4814)
Summary:
Somehow the current detach_copy has greatly increased the memory usage of GPT-2 with FSDP; see #4813. We may not have implemented it correctly. This change improves detach_copy slightly, but it does not fix the memory overhead.
Test Plan:
CI.