[Build] update cuda 13 package: fatbin compress mode and cuda archs (#26516)
### Changes
Update the CUDA 13 Python packaging pipeline:
(1) Use fatbin compress mode = size, which can significantly reduce
package size.
(2) Update CMAKE_CUDA_ARCHITECTURES for CUDA 13. Since the package size
is reduced, we are able to add more architectures.
(3) Fix the CUDA 13 packaging pipeline:
- Use the correct manylinux docker image (cuda13 instead of cuda12). The
new Linux docker image has CUDA 13.0.2 and cuDNN 9.14.
- Pass the CUDA version properly when running
build_linux_python_package.sh in docker. (CUDA_VERSION in docker was
12.8.1, and now we pass "12.8" from yml to be consistent).
Note that the compress mode and CUDA architecture settings are not changed
for CUDA 12.8, so the CUDA 12 wheel is larger than the CUDA 13 wheel. We
can update them in a separate PR if needed.
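
For illustration, the settings in (1) and (2) can be sketched as CMake defines. This is a minimal sketch, not the pipeline's actual wiring: `to_cmake_archs` and `cmake_defines` are hypothetical helpers, the `A;B+P` notation is assumed to mean "native code for A and B, plus PTX for P" (as the table heading in this description suggests), and forwarding `--compress-mode` through `CMAKE_CUDA_FLAGS` is an assumption about how the nvcc option could be passed.

```python
# Hypothetical helpers: not part of the ONNX Runtime build scripts.

def to_cmake_archs(spec: str) -> str:
    """Translate 'A;B;C+P' into CMake's CUDA_ARCHITECTURES syntax.

    Entries before '+' become '-real' (SASS only); the entry after '+'
    becomes '-virtual' (PTX only). This reading of '+' is an assumption.
    """
    real_part, _, ptx_part = spec.partition("+")
    entries = [f"{arch}-real" for arch in real_part.split(";") if arch]
    if ptx_part:
        entries.append(f"{ptx_part}-virtual")
    return ";".join(entries)


def cmake_defines(spec: str, compress_mode: str = "size") -> list[str]:
    """Build the two -D arguments this sketch assumes are passed to CMake."""
    return [
        f"-DCMAKE_CUDA_ARCHITECTURES={to_cmake_archs(spec)}",
        # Assumes a toolkit whose nvcc accepts --compress-mode.
        f"-DCMAKE_CUDA_FLAGS=--compress-mode={compress_mode}",
    ]


# Architecture list reported for the CUDA 13 wheels in this PR.
CUDA13_ARCHS = "75;80;86;89;90a;100a;120a+120"

if __name__ == "__main__":
    for define in cmake_defines(CUDA13_ARCHS):
        print(define)
```

Using `-real` for every native target and a single `-virtual` entry keeps one PTX fallback for newer GPUs without embedding PTX for each architecture.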
The nuget pipeline for CUDA 13 needs extra code changes, and this PR only
fixes the Python packaging pipeline.
### Python GPU Wheel Size (CUDA Architectures + PTX)
CUDA | Windows | Linux
----|---|---
12.8 | 221 MB (52;61;75;86;89+90) | 271 MB (60;70;75;80;86;90a+90)
13.0 | 186 MB (75;80;86;89;90a;100a;120a+120) | 191 MB (75;80;86;89;90a;100a;120a+120)
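
As a quick check of the numbers in the table, the relative size difference between the CUDA 12.8 and CUDA 13.0 wheels can be computed as below. The helper name `reduction_percent` is hypothetical. Note that the two builds target different architecture lists, so this compares the shipped wheels rather than isolating the effect of the compress mode alone.

```python
# Wheel sizes in MB, copied from the table: (CUDA 12.8, CUDA 13.0).
WHEEL_SIZES_MB = {
    "Windows": (221, 186),
    "Linux": (271, 191),
}


def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Relative size reduction from before_mb to after_mb, in percent."""
    return (before_mb - after_mb) / before_mb * 100


if __name__ == "__main__":
    for platform, (cuda12_mb, cuda13_mb) in WHEEL_SIZES_MB.items():
        pct = reduction_percent(cuda12_mb, cuda13_mb)
        print(f"{platform}: {cuda12_mb} MB -> {cuda13_mb} MB ({pct:.1f}% smaller)")
```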