Fix dynamo issue (#6527)
Dynamo use faketensor to trace tensor ops. In some case, the mechanism
break compiling with deepspeed.
An example could be found at
https://gist.github.com/oraluben/9b8240c2fe482eb4382453d6c97a5f76, to
see issues, install deepspeed==0.14.4 instead of my fork
without this PR, llama cannot be compiled.
Detailed explanation:
1. `ZeROOrderedDict`
dynamo use deepcopy to copy tensors, which will call
`object.__reduce__`. When copying `ZeROOrderedDict`, the default
implementation do not copy its `_parent_module` and will lead to
failure.
2. `param` maybe faketensor and do not have `ds_status` yet, but during
tracing it's ok to just skip the `register_external_parameter`, it
should be done ways before.
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>