2aade49a - [PyTorch Distributed] Consolidate NCCL_DESYNC_DEBUG and TORCH_DISTRIBUTED_DEBUG=INFO (#73257)

Commit · 3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73257

Infer desync debug from whether TORCH_DISTRIBUTED_DEBUG >= INFO.

Test Plan:
1. When TORCH_DISTRIBUTED_DEBUG=INFO:
   1.1 Catch mismatched collectives (e.g. broadcast vs. reduce) - passed
   1.2 Catch mismatched collective sizes - passed
2. QPS test: no performance regression - passed

Reviewed By: rohan-varma

Differential Revision: D34232827

fbshipit-source-id: 9cc71a8ab0d416a2037daca08930e590688e1d38
(cherry picked from commit 0322c80560736e173a5868e7077171a410116888)
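The consolidation described above can be illustrated with a minimal sketch: instead of reading a separate NCCL_DESYNC_DEBUG flag, desync debugging is inferred from whether the TORCH_DISTRIBUTED_DEBUG level is at least INFO. The helper names and level table below (`_DEBUG_LEVELS`, `desync_debug_enabled`) are illustrative assumptions, not PyTorch's actual internals; only the environment-variable name and the OFF/INFO/DETAIL levels come from the commit's context.

```python
import os

# Illustrative level ordering; PyTorch's real implementation lives in C++
# and Python internals, not in this exact form.
_DEBUG_LEVELS = {"OFF": 0, "INFO": 1, "DETAIL": 2}

def desync_debug_enabled() -> bool:
    """Hypothetical helper: desync debug is on iff TORCH_DISTRIBUTED_DEBUG >= INFO."""
    level = os.environ.get("TORCH_DISTRIBUTED_DEBUG", "OFF").upper()
    return _DEBUG_LEVELS.get(level, 0) >= _DEBUG_LEVELS["INFO"]

# Usage: with the variable unset or OFF, desync debug stays disabled;
# INFO or DETAIL enables it, so no separate NCCL_DESYNC_DEBUG flag is needed.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "INFO"
print(desync_debug_enabled())  # True
```

This mirrors the commit's intent: one user-facing knob (TORCH_DISTRIBUTED_DEBUG) drives both general distributed debugging and desync detection.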