Compress fatbin to fit into 32bit indexing (#43074)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39968
Tested with `TORCH_CUDA_ARCH_LIST='3.5 5.2 6.0 6.1 7.0 7.5 8.0+PTX'`: the build failed before this PR and succeeds with it.
With `TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0+PTX'`, the size of `libtorch_cuda.so` (with symbols) drops from 2.9GB to 2.2GB.
cc: ptrblck mcarilli jjsjann123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43074
Reviewed By: mrshenli
Differential Revision: D23176095
Pulled By: malfet
fbshipit-source-id: 7b3e6d049fc080e519f21e80df05ef68e7bea57e