Remove `__nv_relfatbin` section from nccl_static library (#35843)
Summary:
The NCCL library is built using [CUDA separate compilation](https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/), which builds intermediate CUDA binaries and then links them into GPU code that can be executed on the device. The intermediate CUDA code is stored in the `__nv_relfatbin` section, and the code that can be launched is stored in `.nv_fatbin`. When `nvcc` is used to link an executable or shared library, it removes the intermediate binaries, but the default host linker is not aware of them and therefore keeps them inside the host executable. Help the host linker by removing the `__nv_relfatbin` sections from the object files inside `libnccl_static.a`.
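For illustration only, the section removal can be sketched as a small Python helper. This is not necessarily how the change itself is wired into the build. It assumes GNU `objcopy` is on `PATH` and relies on its `--remove-section` option, which is applied to every member when the input is a static archive. The helper names `build_strip_command` and `strip_relfatbin` are hypothetical.

```python
import shutil
import subprocess
from typing import List

# Section holding the intermediate CUDA code, as named in this summary.
SECTION = "__nv_relfatbin"


def build_strip_command(archive: str, objcopy: str = "objcopy") -> List[str]:
    """Return the objcopy invocation that drops SECTION from `archive` in place.

    Hypothetical helper: with a single file argument, objcopy rewrites that
    file, and for an archive it processes each object member.
    """
    return [objcopy, f"--remove-section={SECTION}", archive]


def strip_relfatbin(archive: str, objcopy: str = "objcopy") -> None:
    """Run objcopy on `archive`; raises if the tool is missing or exits non-zero."""
    if shutil.which(objcopy) is None:
        raise FileNotFoundError(f"{objcopy} not found on PATH")
    subprocess.run(build_strip_command(archive, objcopy), check=True)
```

Calling `strip_relfatbin("libnccl_static.a")` would rewrite the archive in place. Listing section headers with `objdump -h` before and after is one way to check that `__nv_relfatbin` is gone while `.nv_fatbin` remains.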
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35843
Test Plan: Build PyTorch with CUDA and run `test_distributed.py`
Differential Revision: D20882224
Pulled By: malfet
fbshipit-source-id: f23dd4aa416518324cb38b9bd6846e73a1c7dd21