[PT2D] Make the speedup benchmark work with DDP + CompiledAutograd (#120454)
With DDP + CompiledAutograd, the speedup benchmark cannot use the same parallelized model for the test, so this PR copies the model.
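
A minimal sketch of the copy-before-wrap idea, not the code in this PR. The helper name `make_independent_models` and its `wrap_fn` parameter are hypothetical; `wrap_fn` stands in for whatever parallelizes the model (for example, wrapping it in DDP).

```python
import copy


def make_independent_models(model, wrap_fn, num_copies=2):
    """Return `num_copies` wrapped models, each built from its own deep copy.

    Assumption: `wrap_fn` is a hypothetical callable that parallelizes a
    model (e.g. a DDP wrapper). Copying first means no wrapped instance
    shares state with another or with the original `model`.
    """
    return [wrap_fn(copy.deepcopy(model)) for _ in range(num_copies)]
```

With a helper like this, a benchmark could obtain one wrapped model for the baseline run and a separate one for the compiled run, e.g. `baseline_model, compiled_model = make_independent_models(model, wrap_fn)`.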
Differential Revision: [D54094257](https://our.internmc.facebook.com/intern/diff/D54094257/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120454
Approved by: https://github.com/yf225, https://github.com/xmfan