[nccl] Remove lock for nccl collective launch for 2.0+ (#97904)
Summary: It looks like NCCL 2.0+ no longer needs a lock to keep collective launches from running concurrently with cudaFree, so this removes that lock for NCCL 2.0 and later.
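For illustration only, a minimal sketch of version-gated locking around a collective launch. It is not the actual diff: `SKETCH_NCCL_MAJOR`, `sketch_free_mutex`, `SketchLaunchGuard`, and `sketch_launch_collective` are hypothetical names standing in for the NCCL major version, the mutex that serializes cudaFree, and the launch path.

```cpp
#include <mutex>

// Hypothetical stand-in for the NCCL major version, so the sketch
// compiles without NCCL headers. Assumed to default to a 2.x build.
#ifndef SKETCH_NCCL_MAJOR
#define SKETCH_NCCL_MAJOR 2
#endif

// Hypothetical stand-in for the mutex that serializes cudaFree calls.
inline std::mutex& sketch_free_mutex() {
  static std::mutex m;
  return m;
}

// RAII guard for a collective launch. It takes the free mutex only on
// pre-2.0 builds; on 2.0+ it is a no-op and the launch runs unlocked.
class SketchLaunchGuard {
 public:
  SketchLaunchGuard() : lock_(sketch_free_mutex(), std::defer_lock) {
#if SKETCH_NCCL_MAJOR < 2
    lock_.lock();
#endif
  }

  // True if this guard currently owns the free mutex.
  bool holds_lock() const { return lock_.owns_lock(); }

 private:
  // unique_lock releases the mutex on destruction only if it owns it.
  std::unique_lock<std::mutex> lock_;
};

// Hypothetical launch wrapper: reports whether the launch held the mutex.
inline bool sketch_launch_collective() {
  SketchLaunchGuard guard;
  // A real implementation would enqueue the NCCL collective here.
  return guard.holds_lock();
}
```

With the assumed default of `SKETCH_NCCL_MAJOR` set to 2, the guard never acquires the mutex, which mirrors the behavior this change is aiming for on NCCL 2.0+.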
Test Plan: sandcastle + OSS CI
Differential Revision: D44514446
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97904
Approved by: https://github.com/malfet, https://github.com/kwen2501