Make NCCL default logging more friendly. (#105695)
By default, a Python library should print nothing other than errors and warnings. Today, however, any 8-GPU task prints the logs below by default, and they fill more than a full screen. This hurts the user experience most for small workloads that print very little themselves:
```
I0719 10:50:33.485718 219407 ProcessGroupNCCL.cpp:482] [Rank 3] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.485716 219402 ProcessGroupNCCL.cpp:482] [Rank 1] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.485841 220673 ProcessGroupNCCL.cpp:581] [Rank 1] NCCL watchdog thread started!
I0719 10:50:33.485882 220672 ProcessGroupNCCL.cpp:581] [Rank 3] NCCL watchdog thread started!
I0719 105033.485 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 3
I0719 105033.485 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 1
I0719 10:50:33.559300 219400 ProcessGroupNCCL.cpp:482] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.559444 220675 ProcessGroupNCCL.cpp:581] [Rank 0] NCCL watchdog thread started!
I0719 105033.559 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 0
I0719 10:50:33.577245 219415 ProcessGroupNCCL.cpp:482] [Rank 4] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.577381 220676 ProcessGroupNCCL.cpp:581] [Rank 4] NCCL watchdog thread started!
I0719 105033.577 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 4
I0719 10:50:33.583372 219404 ProcessGroupNCCL.cpp:482] [Rank 2] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.583511 220677 ProcessGroupNCCL.cpp:581] [Rank 2] NCCL watchdog thread started!
I0719 105033.583 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 2
I0719 10:50:33.672052 219421 ProcessGroupNCCL.cpp:482] [Rank 5] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.672153 220684 ProcessGroupNCCL.cpp:581] [Rank 5] NCCL watchdog thread started!
I0719 105033.672 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 5
I0719 10:50:33.844262 219427 ProcessGroupNCCL.cpp:482] [Rank 6] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.844411 220687 ProcessGroupNCCL.cpp:581] [Rank 6] NCCL watchdog thread started!
I0719 105033.844 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 6
I0719 10:50:33.853435 219432 ProcessGroupNCCL.cpp:482] [Rank 7] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: WARN
I0719 10:50:33.853551 220688 ProcessGroupNCCL.cpp:581] [Rank 7] NCCL watchdog thread started!
I0719 105033.854 distributed_c10d.py:213] Added key: store_based_barrier_key:1 to store for rank: 7
I0719 105033.854 distributed_c10d.py:247] Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
I0719 105033.854 distributed_c10d.py:247] Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
```
This PR shortens the NCCL init logs from a multi-line format to a single line, and changes the watchdog logs from LOG(INFO) to VLOG so they can be enabled on demand.
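As a rough illustration of the two ideas (not the actual C++ change in ProcessGroupNCCL.cpp), here is a minimal Python sketch. The helper names `format_init_options` and `log_watchdog_started` and the exact one-line layout are assumptions made for this example. Python's DEBUG level stands in for VLOG: silent by default, enabled on demand.

```python
import logging


def format_init_options(rank: int, options: dict) -> str:
    """Render all init options on one line instead of one line per option.

    Hypothetical layout: the real one-line format used by the PR may differ.
    """
    joined = ", ".join(f"{key}: {value}" for key, value in options.items())
    return f"[Rank {rank}] ProcessGroupNCCL initialization options: {joined}"


def log_watchdog_started(logger: logging.Logger, rank: int) -> None:
    """Log the watchdog start message at DEBUG level.

    DEBUG plays the role of VLOG here: nothing is emitted unless the
    caller lowers the logger's level explicitly.
    """
    logger.debug("[Rank %d] NCCL watchdog thread started!", rank)


if __name__ == "__main__":
    # Option names and values copied from the log excerpt above.
    options = {
        "NCCL_ASYNC_ERROR_HANDLING": 0,
        "NCCL_BLOCKING_WAIT": 0,
        "TIMEOUT(ms)": 1800000,
        "USE_HIGH_PRIORITY_STREAM": 0,
        "NCCL_DEBUG": "WARN",
    }
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("nccl_logging_sketch")  # hypothetical name

    logger.info(format_init_options(3, options))  # one line per rank
    log_watchdog_started(logger, 3)               # suppressed at INFO

    logger.setLevel(logging.DEBUG)                # opt in on demand
    log_watchdog_started(logger, 3)               # now emitted
```

With eight ranks, this style yields eight init lines instead of 48, and the watchdog lines appear only when verbose logging is requested.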
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105695
Approved by: https://github.com/fduwjj