pytorch
648202ce - Improve DDPOptimizer by avoiding small preamble graph (#93162)

Commit
1 year ago
This optimizes an edge case in which some compute-only ops (e.g. add) could end up in an orphan graph on the input side because the bucket for the next graph was already full. The fix is to fuse this graph (which is "empty" in parameter count) with the adjoining "full" bucket.

Note: I encountered this while trying to reproduce some suspected duplicate-argument errors, but it is unrelated, and I have not yet reproduced a duplicate-argument issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93162
Approved by: https://github.com/davidberard98
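To make the edge case concrete, here is a minimal toy sketch of the idea, not the actual DDPOptimizer code. It assumes ops are walked from the output side toward the input side and grouped into buckets by parameter size; the names `Bucket`, `split_into_buckets`, and `fuse_empty_preamble` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # Hypothetical container: op names in forward order plus total parameter bytes.
    ops: list = field(default_factory=list)
    param_bytes: int = 0


def split_into_buckets(ops, bucket_cap_bytes, fuse_empty_preamble=True):
    """Toy bucketing. `ops` is a list of (name, param_bytes) in forward order.

    Walks the ops in reverse (output side first) and opens a new bucket
    whenever the current one has reached the cap. Returns buckets in
    forward order.
    """
    buckets = [Bucket()]
    for name, nbytes in reversed(ops):
        if buckets[-1].param_bytes >= bucket_cap_bytes:
            buckets.append(Bucket())
        buckets[-1].ops.insert(0, name)
        buckets[-1].param_bytes += nbytes

    # The last bucket created sits at the input side. If it holds no
    # parameters (compute-only ops such as add), fuse it into the
    # adjoining full bucket instead of leaving an orphan graph.
    if fuse_empty_preamble and len(buckets) > 1 and buckets[-1].param_bytes == 0:
        preamble = buckets.pop()
        buckets[-1].ops = preamble.ops + buckets[-1].ops

    buckets.reverse()
    return buckets


# Assumed example: a parameter-free add feeding two 100-byte linear layers.
example_ops = [("add", 0), ("linear1", 100), ("relu", 0), ("linear2", 100)]
unfused = split_into_buckets(example_ops, 100, fuse_empty_preamble=False)
fused = split_into_buckets(example_ops, 100, fuse_empty_preamble=True)
print([b.ops for b in unfused])
print([b.ops for b in fused])
```

With fusion disabled, the leading `add` lands in its own parameter-free bucket; with fusion enabled, it is folded into the bucket that contains `linear1`, so the split produces one fewer graph.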