[FSDP] Set `NCCL_DESYNC_DEBUG=0` for FSDP unit tests (#99916)
This should fix https://github.com/pytorch/pytorch/issues/99011.
With `NCCL_DESYNC_DEBUG=0`, we can run 100 iterations of
```
CUDA_LAUNCH_BLOCKING=1 NCCL_DESYNC_DEBUG=1 CUDA_VISIBLE_DEVICES=0,7 numactl -C 2 python test/distributed/fsdp/test_fsdp_core.py -v -k test_transformer_no_grad --repeat 100 2>&1 | tee out
```
without error, whereas with `NCCL_DESYNC_DEBUG=1`, the error reproduces at a high failure rate (usually within 10 iterations).
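To run the comparison above under both settings in one go, a minimal sketch follows. `compare_desync_debug` is a hypothetical helper and is not part of this PR. It changes only `NCCL_DESYNC_DEBUG` for each run, leaves the rest of the environment untouched, and decides pass/fail from the command's exit status alone.

```shell
#!/usr/bin/env bash
# Hypothetical A/B helper (not part of this PR): run the same command once
# with NCCL_DESYNC_DEBUG=0 and once with NCCL_DESYNC_DEBUG=1, and report
# pass/fail for each based only on the command's exit status.
compare_desync_debug() {
  for v in 0 1; do
    # A prefix assignment applies to this invocation only and overrides any
    # NCCL_DESYNC_DEBUG already exported in the calling shell.
    if NCCL_DESYNC_DEBUG="$v" "$@"; then
      echo "NCCL_DESYNC_DEBUG=$v: pass"
    else
      echo "NCCL_DESYNC_DEBUG=$v: fail"
    fi
  done
}
```

Usage (assumed): export the other variables from the repro command first, for example `export CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0,7`, then run `compare_desync_debug numactl -C 2 python test/distributed/fsdp/test_fsdp_core.py -v -k test_transformer_no_grad --repeat 100`.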
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99916
Approved by: https://github.com/zhaojuanmao