pytorch
648202ce - Improve DDPOptimizer by avoiding small preamble graph (#93162)

Commit

1 year ago

Improve DDPOptimizer by avoiding small preamble graph (#93162)

This optimizes an edge case where some compute-only ops (e.g. add) could end up in an orphan graph at the input side because the bucket for the next graph is already full. The fix is to fuse this graph (which is "empty" in parameter count) with the adjoining "full" bucket.

Note: I encountered this while trying to reproduce some suspected duplicate-argument errors, but this is unrelated, and I have not yet reproduced a duplicate-argument issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93162
Approved by: https://github.com/davidberard98
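To illustrate the edge case described in the commit message, here is a minimal sketch in plain Python. It is not the DDPOptimizer code: the `Bucket` class, the `split_into_buckets` function, and the `(name, param_bytes)` node representation are hypothetical stand-ins, and it assumes buckets are filled by walking nodes from the output side toward the input side until a byte cap is reached.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # Hypothetical stand-in for a bucket of graph nodes; not the PyTorch class.
    size: int = 0                       # total parameter bytes in this bucket
    nodes: list = field(default_factory=list)


def split_into_buckets(nodes, bucket_cap_bytes):
    """Group nodes into buckets, walking from the output side to the input side.

    `nodes` is a list of (name, param_bytes) pairs in forward order.
    Compute-only ops such as `add` carry 0 parameter bytes.
    Returns the buckets in forward order.
    """
    buckets = [Bucket()]
    for name, param_bytes in reversed(nodes):
        # Start a new bucket once the current one has reached the cap.
        if buckets[0].size >= bucket_cap_bytes:
            buckets.insert(0, Bucket())
        buckets[0].size += param_bytes
        buckets[0].nodes.insert(0, name)

    # Edge case: the input-side bucket holds only compute-only ops
    # (zero parameter bytes). Fuse it into the adjoining bucket instead
    # of leaving a small orphan graph.
    if len(buckets) > 1 and buckets[0].size == 0:
        buckets[1].nodes = buckets[0].nodes + buckets[1].nodes
        buckets.pop(0)
    return buckets


# Usage: a compute-only `add` precedes two layers that each fill a bucket.
example = [("add", 0), ("linear1", 10), ("linear2", 10)]
result = [b.nodes for b in split_into_buckets(example, bucket_cap_bytes=10)]
print(result)
```

Without the final fusing step, the `add` node would sit alone in a third bucket at the input side; with it, `add` joins the bucket containing `linear1`. Fusing does not change the parameter byte count of the receiving bucket, since the orphan contributes none.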

Author

wconstab

Committer

pytorchmergebot

Parents

f40183d3
