Add size info to collective logs (#100413)
The previous timeout log did not print size info, making it hard to debug hangs caused by message size mismatches.
(The reason is that when the `WorkNCCL` object is copied during work enqueue, `outputs_` is not copied due to reference concerns, so `output.size()` is never triggered.)
This PR logs sizes using separate fields, so the log no longer relies on `outputs_`.
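
The idea can be illustrated with a simplified Python model. This is not the C++ `WorkNCCL` code: the class `Work`, the fields `numel_in` and `numel_out`, and the method `enqueue_copy` are hypothetical names chosen for this sketch. Element counts are recorded as plain integers when the work is created, so they survive a copy that drops the output tensors.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Work:
    """Hypothetical stand-in for a collective work object (not PyTorch's class)."""
    seq_num: int
    op_type: str
    numel_in: int    # element count of the input, stored as a plain integer
    numel_out: int   # element count of the output, stored as a plain integer
    outputs: Optional[List[list]] = None  # stand-in for the output tensors

    def enqueue_copy(self) -> "Work":
        # Mimic the enqueue-time copy: drop the reference to the outputs
        # but keep the scalar size fields.
        return replace(self, outputs=None)

    def describe(self) -> str:
        # Build the description from scalar fields only; outputs is never read.
        return (f"Work(SeqNum={self.seq_num}, OpType={self.op_type}, "
                f"NumelIn={self.numel_in}, NumelOut={self.numel_out})")


work = Work(seq_num=1, op_type="_ALLGATHER_BASE",
            numel_in=4, numel_out=32, outputs=[[0] * 32])
queued = work.enqueue_copy()
print(queued.describe())
```

Because `describe` reads only the integer fields, the copy can still report sizes even though it no longer holds the outputs.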
New timeout log:
```
[Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=_ALLGATHER_BASE, NumelIn=209715200, NumelOut=1677721600, Timeout(ms)=10000) ran for 10957 milliseconds before timing out.
```
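
When comparing timeout logs across ranks, it can help to pull the size fields out programmatically. The following is a minimal sketch, assuming the log line has exactly the shape shown above; the helper name `parse_timeout_line` and the regular expression are illustrative and not part of PyTorch.

```python
import re
from typing import Dict, Optional, Union

# Pattern written against the example line above; other log variants may differ.
_TIMEOUT_RE = re.compile(
    r"\[Rank (?P<rank>\d+)\].*?WorkNCCL\("
    r"SeqNum=(?P<seq_num>\d+), "
    r"OpType=(?P<op_type>\w+), "
    r"NumelIn=(?P<numel_in>\d+), "
    r"NumelOut=(?P<numel_out>\d+), "
    r"Timeout\(ms\)=(?P<timeout_ms>\d+)\)"
)


def parse_timeout_line(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Return the fields of a watchdog timeout line, or None if it does not match."""
    match = _TIMEOUT_RE.search(line)
    if match is None:
        return None
    # Every captured field is numeric except the operation type.
    return {
        key: (value if key == "op_type" else int(value))
        for key, value in match.groupdict().items()
    }


example = (
    "[Rank 0] Watchdog caught collective operation timeout: "
    "WorkNCCL(SeqNum=1, OpType=_ALLGATHER_BASE, NumelIn=209715200, "
    "NumelOut=1677721600, Timeout(ms)=10000) ran for 10957 milliseconds "
    "before timing out."
)
info = parse_timeout_line(example)
print(info)
```

In the example line, `NumelOut` is exactly 8 times `NumelIn`, which would be consistent with an all-gather across 8 ranks. A rank reporting a different `NumelIn` or `NumelOut` for the same `SeqNum` is a likely candidate for the size mismatch.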
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100413
Approved by: https://github.com/kumpera