[inductor] Fix a cpp_wrapper issue when fx_passes modify the fx graph (#102851)
Summary: Currently, cpp_wrapper for CUDA compiles in two passes, which
means we need to deepcopy the input module so that fx transformations
applied during one pass cannot leak into the other.
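The toy sketch below illustrates why the deepcopy matters. It is not Inductor code: `ToyGraphModule`, `fx_pass`, and `two_pass_compile` are hypothetical stand-ins for an FX `GraphModule`, an in-place fx pass, and the two-pass cpp_wrapper flow. Without the copy, the second pass receives a graph that the first pass already rewrote.

```python
import copy


class ToyGraphModule:
    # Hypothetical stand-in for torch.fx.GraphModule: the "graph" is a list of op names.
    def __init__(self, ops):
        self.ops = list(ops)


def fx_pass(gm):
    # Hypothetical in-place fx pass: fuses an adjacent ("mul", "add") pair.
    # Returns the ops it saw on entry so we can check what each pass observed.
    seen = list(gm.ops)
    out, i = [], 0
    while i < len(gm.ops):
        if gm.ops[i:i + 2] == ["mul", "add"]:
            out.append("fused_mul_add")
            i += 2
        else:
            out.append(gm.ops[i])
            i += 1
    gm.ops = out  # mutates the module in place, like a real fx pass may
    return seen


def two_pass_compile(gm, isolate):
    # Both (hypothetical) passes should start from the same input graph.
    # With isolate=True, the first pass works on a private deep copy.
    first_input = copy.deepcopy(gm) if isolate else gm
    seen_first = fx_pass(first_input)
    seen_second = fx_pass(gm)
    return seen_first, seen_second
```

With `isolate=False`, the first pass sees `["mul", "add", "relu"]` but the second sees `["fused_mul_add", "relu"]`; with `isolate=True`, both passes see the original graph.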
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102851
Approved by: https://github.com/jansel