Improvements for DDP Optimizer (#87525)
- adds support for a 'first_bucket_cap' arg, so that bucketing aligns more
precisely with DDP, which may start with a smaller first bucket
- refactors the bucket splitting logic to be cleaner
- adds pretty-printing for bucket info, and a way for a test case or
benchmark to access bucket info from the DDPOptimizer class
- dumps debug logs to stdout
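
As a rough illustration of the 'first_bucket_cap' idea, here is a minimal
standalone sketch. It is not the code from this PR: the names `Bucket`,
`split_into_buckets`, and `format_buckets` are hypothetical, parameters are
modeled as plain (name, size-in-bytes) pairs rather than graph nodes, and the
rule "walk parameters in reverse order and close a bucket once it reaches its
cap" is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    # Hypothetical container: total bytes and names of the params it holds.
    size: int = 0
    params: List[str] = field(default_factory=list)


def split_into_buckets(
    named_sizes: List[Tuple[str, int]],
    bucket_bytes_cap: int,
    first_bucket_cap: Optional[int] = None,
) -> List[Bucket]:
    """Group params into buckets, walking them in reverse order.

    Assumption for this sketch: the first bucket uses `first_bucket_cap`
    (falling back to `bucket_bytes_cap`), and a bucket is closed once its
    size has reached its cap.
    """
    if first_bucket_cap is None:
        first_bucket_cap = bucket_bytes_cap

    buckets = [Bucket()]
    for name, nbytes in reversed(named_sizes):
        # Only the very first bucket is subject to the (smaller) first cap.
        cap = first_bucket_cap if len(buckets) == 1 else bucket_bytes_cap
        if buckets[-1].size >= cap:
            buckets.append(Bucket())
        buckets[-1].size += nbytes
        buckets[-1].params.append(name)
    return buckets


def format_buckets(buckets: List[Bucket]) -> str:
    # Hypothetical pretty-printer: one line per bucket.
    lines = []
    for i, b in enumerate(buckets):
        lines.append(f"bucket {i}: {b.size} bytes, params={', '.join(b.params)}")
    return "\n".join(lines)


# Toy sizes in bytes, chosen only to make the effect visible.
example_params = [
    ("layer0.weight", 40),
    ("layer1.weight", 40),
    ("layer2.weight", 10),
    ("layer3.weight", 10),
]

with_first_cap = split_into_buckets(example_params, 50, first_bucket_cap=10)
without_first_cap = split_into_buckets(example_params, 50)

print(format_buckets(with_first_cap))
```

With the smaller first cap, the last parameter ends up alone in the first
bucket, so in this toy model the bucket boundaries shift relative to the
single-cap case. Keeping the result as a list of `Bucket` objects is what lets
a test or benchmark inspect the split, in the spirit of the bucket-info access
described above.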
cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87525
Approved by: https://github.com/davidberard98