Bring fast_nvcc.py to PyTorch OSS (#48934)
Summary:
This PR adds `tools/fast_nvcc/fast_nvcc.py`, a mostly transparent wrapper around `nvcc` that parallelizes the compilation of CUDA files when building for multiple architectures at once.
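To illustrate the general idea (this is not the code in `fast_nvcc.py`, whose internals are not described here), the sketch below runs one independent command per architecture concurrently. The helper names `per_arch_commands` and `run_parallel`, the output naming scheme, and the choice to emit device-only `.cubin` files are assumptions made for the example; combining the per-architecture outputs into a single object file is left out.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def per_arch_commands(source, archs):
    """Build one device-only nvcc command per architecture (hypothetical layout)."""
    return [
        ["nvcc", f"-arch={arch}", "-cubin", source, "-o", f"{source}.{arch}.cubin"]
        for arch in archs
    ]


def run_parallel(commands):
    """Run independent commands concurrently; return exit codes in input order."""
    def run(cmd):
        return subprocess.run(cmd).returncode

    # Threads are enough here: each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Show the commands that would be run for two example architectures.
    for cmd in per_arch_commands("foo.cu", ["sm_70", "sm_80"]):
        print(" ".join(cmd))
    # Stand-in commands so the sketch runs on a machine without nvcc.
    stand_ins = [[sys.executable, "-c", "pass"] for _ in range(2)]
    print(run_parallel(stand_ins))
```

Because the per-architecture commands do not depend on each other, total wall-clock time in a setup like this would be bounded by the slowest architecture rather than the sum of all of them, given enough cores.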
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48934
Test Plan: This script is not yet used in PyTorch OSS; usage is coming soon.
Reviewed By: walterddr
Differential Revision: D25286030
Pulled By: samestep
fbshipit-source-id: 971a404cf57f5694dea899a27338520d25191706