pytorch
bdbfc258 - [Dist Debugality] Log key DDP metrics to stderr under debug mode. (#52957)

Commit
3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52957

This diff:
1. Under TORCH_DISTRIBUTED_DEBUG=INFO or DETAIL, logs DDP information at init time (all stats in ddp_logging_data_).
2. Under TORCH_DISTRIBUTED_DEBUG=DETAIL, logs runtime stats when they are collected (the first 10 iterations, then once every 100 iterations). This avoids logging on every iteration, which would spam the logs.

Verified by inspecting logs:
```
I0226 19:12:47.109243 2818475 logger.cpp:140] [Rank 1]: DDP Initialized with:
world_size: 2 module_name: Linear device_ids: 1 output_device: 1 backend_name: nccl parameter_dtype: float total_parameter_size_in_bytes: 40 num_parameter_tensors: 2 bucket_sizes: 40 CUDA_VISIBLE_DEVICES: N/Abroadcast_buffers: 1 bucket_cap_mb: 25 find_unused_parameters: 0 gradient_as_bucket_view: 0
 Backend Info: nccl_socket_ifname: N/A nccl_blocking_wait: N/A nccl_debug: WARN nccl_nthreads: N/A nccl_ib_timeout: N/A
I0226 19:12:47.109252 2818473 logger.cpp:140] [Rank 0]: DDP Initialized with:
world_size: 2 module_name: Linear device_ids: 0 output_device: 0 backend_name: nccl parameter_dtype: float total_parameter_size_in_bytes: 40 num_parameter_tensors: 2 bucket_sizes: 40 CUDA_VISIBLE_DEVICES: N/Abroadcast_buffers: 1 bucket_cap_mb: 25 find_unused_parameters: 0 gradient_as_bucket_view: 0
 Backend Info: nccl_socket_ifname: N/A nccl_blocking_wait: N/A nccl_debug: WARN nccl_nthreads: N/A nccl_ib_timeout: N/A
```

```
I0226 19:12:48.117936 2818473 logger.cpp:286] [Rank 0 / 2] Training Linear unused_parameter_size=0
 Avg forward compute time: 568944
 Avg backward compute time: 885504
 Avg backward comm. time: 692496
 Avg backward comm/comp overlap time: 113536
I0226 19:12:48.118517 2818475 logger.cpp:286] [Rank 1 / 2] Training Linear unused_parameter_size=0
 Avg forward compute time: 565584
 Avg backward compute time: 876992
 Avg backward comm. time: 201872
 Avg backward comm/comp overlap time: 128624
```

ghstack-source-id: 123171875

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D26708184

fbshipit-source-id: 16defd5610d28bc4cf3fc2a0cc564e84efcfa791
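
The two logging rules in the summary (init-time info under INFO or DETAIL, runtime stats under DETAIL for the first 10 iterations and then once every 100) can be sketched in Python as follows. This is an illustration only, not the code in logger.cpp: the helper names are hypothetical, the iteration count is assumed to be 1-based, and the fallback to "OFF" when the variable is unset is an assumption of this sketch.

```python
import os


def debug_level(env=None):
    """Read TORCH_DISTRIBUTED_DEBUG from a mapping (defaults to os.environ).

    Falling back to "OFF" when the variable is unset is an assumption
    made for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("TORCH_DISTRIBUTED_DEBUG", "OFF")


def logs_init_info(level):
    """Init-time DDP information is logged under INFO or DETAIL."""
    return level in ("INFO", "DETAIL")


def logs_runtime_stats(level, iteration):
    """Runtime stats are logged only under DETAIL, and only for the first
    10 iterations and then once every 100 iterations.

    Assumes `iteration` is 1-based; the real indexing may differ.
    """
    if level != "DETAIL":
        return False
    return iteration <= 10 or iteration % 100 == 0


if __name__ == "__main__":
    level = debug_level({"TORCH_DISTRIBUTED_DEBUG": "DETAIL"})
    logged = [i for i in range(1, 301) if logs_runtime_stats(level, i)]
    print(logged)
```

Under these assumptions, a 300-iteration run would emit runtime stats on iterations 1 through 10 and on iterations 100, 200, and 300, which keeps the log volume small while still capturing warm-up behaviour.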