[FSDP][Docs] Per-device NCCL stream is per PG (#95705)
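The per-device NCCL stream belongs to each process group (PG) rather than being shared across PGs. See `ProcessGroupNCCL.hpp`:
https://github.com/pytorch/pytorch/blob/71ad1005f66c9a53a2fe28d24b95c4e828aa944e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp#L647-L649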
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95705
Approved by: https://github.com/fegin