DeepSpeed
ecb26a51 - Fix DeepCompile+Z3 on PyTorch v2.9/2.10 (#7951)

Fix DeepCompile+Z3 on PyTorch v2.9/2.10 (#7951)

DeepCompile+Z3 did not work with PyTorch v2.9/2.10 because:

- PyTorch v2.9+ started enforcing stricter TorchDynamo parameter tensor-match guards. During DeepCompile tracing, some ZeRO-3 parameters were temporarily all-gathered, so Dynamo recorded their full sizes, such as 4096.
- By the time guard evaluation ran, DeepSpeed had already released those parameters back to the normal ZeRO-3 partitioned representation, where `param.data` is `empty(0)`. This produced guard failures like `expected 4096, actual 0`.

This PR resolves the issue by:

- Keeping full-shape dummy tensors for symbolic tracing
- Overriding guard size/stride metadata for ZeRO-3 parameters to the stable released representation instead of the transient gathered sizes

This PR also fixes the following bugs:

- On v2.7 and v2.8, the compiled backward graph could hoist `end_backward` ahead of the real `reduce_grad` calls.
- The selective unsharding pass could overcount the persistent memory budget.

Note: DeepCompile is still incompatible with v2.11. That will be addressed in another PR.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
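The guard mismatch described above can be sketched in miniature. This is a toy model, not DeepSpeed's actual code: `Zero3Param`, `record_dynamo_guard`, and all field names here are hypothetical stand-ins that only illustrate how a guard recorded against a transiently gathered size fails once the parameter is released back to the partitioned state.

```python
class Zero3Param:
    """Toy stand-in for a ZeRO-3 managed parameter (hypothetical)."""

    def __init__(self, full_numel):
        self.full_numel = full_numel
        self.data_numel = 0  # released/partitioned state: data is empty(0)

    def all_gather(self):
        # Transient gathered state: full-size data is visible.
        self.data_numel = self.full_numel

    def release(self):
        # Back to the normal ZeRO-3 partitioned representation.
        self.data_numel = 0


def record_dynamo_guard(param):
    """Model Dynamo recording a tensor-size guard at trace time."""
    expected = param.data_numel  # whatever size is visible right now

    def guard(p):
        return p.data_numel == expected

    return guard, expected


p = Zero3Param(full_numel=4096)

# During DeepCompile tracing the parameter happens to be all-gathered,
# so the recorded guard captures the full size (4096)...
p.all_gather()
guard, expected = record_dynamo_guard(p)

# ...but DeepSpeed releases the parameter before guard evaluation runs,
# so the guard sees size 0 and fails: "expected 4096, actual 0".
p.release()
ok = guard(p)
print(expected, p.data_numel, ok)
```

Pinning the guard metadata to the stable released representation (as this PR does) makes the recorded size agree with what guard evaluation later observes.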