NCCL: Re-enable parallel builds (#83696)
Since #83173 was merged I have noticed some CI being slowed down by
the nccl building step. e.g. if there are no C++ changes then sccache
compiles everything else very quickly and nccl becomes the limiting
factor.
This re-enables parallel builds with some safeguards to protect
against oversubscription. When `make` is the parent build system, we
can use `$(MAKE)` and the `make` jobserver will coordinate job
allocation with the sub-process. For other build systems, this calls
`make` with the `-l` flag which should prevent it launching jobs when
the system load average is already too high.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83696
Approved by: https://github.com/malfet