Fix NCCL compilation so it builds for the same architectures PyTorch compiles for (#18739)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/18704 with additional fixes.
Fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18739
Differential Revision: D14737274
Pulled By: soumith
fbshipit-source-id: cfbbbf68b098594bd045861d1b2c085da693ea51