Reduce Python and NuGet GPU package size (#26002)
### Description
The package size limits for PyPI and NuGet are:
- Python package: under 300 MB
- NuGet package: under 250 MB
To meet these limits, this PR first removes support for some older GPU architectures from `CMAKE_CUDA_ARCHITECTURES`. Second, it removes `FPA_INTB_GEMM` kernel support from the Linux Python wheel.
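The architecture lists in the tables use CMake's `CMAKE_CUDA_ARCHITECTURES` syntax: an entry with a `-real` suffix compiles device code (SASS) only for that architecture, a `-virtual` suffix embeds PTX only, and a bare number produces both. The sketch below shows how trimming older architectures from such a list could be expressed. The helper names (`parse_cuda_architectures`, `arch_number`, `drop_archs_below`) are hypothetical and are not part of this repository's build scripts; special CMake values such as `all` or `native` are deliberately not handled.

```python
from __future__ import annotations

import re


def parse_cuda_architectures(value: str) -> list[tuple[str, str]]:
    """Split a CMAKE_CUDA_ARCHITECTURES string into (arch, kind) pairs.

    kind is "real" (SASS only), "virtual" (PTX only) or "both" (no suffix).
    """
    entries = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        arch, sep, kind = item.partition("-")
        entries.append((arch, kind if sep else "both"))
    return entries


def arch_number(arch: str) -> int:
    """Leading numeric part of an architecture, e.g. 90 for "90a"."""
    match = re.match(r"\d+", arch)
    if match is None:
        # Hypothetical helper: values like "all" or "native" are rejected.
        raise ValueError(f"not a numeric CUDA architecture: {arch!r}")
    return int(match.group())


def drop_archs_below(value: str, min_arch: int) -> str:
    """Hypothetical helper: keep only entries whose architecture is >= min_arch."""
    kept = [
        arch if kind == "both" else f"{arch}-{kind}"
        for arch, kind in parse_cuda_architectures(value)
        if arch_number(arch) >= min_arch
    ]
    return ";".join(kept)


original = "60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual"
print(drop_archs_below(original, 75))  # 75-real;80-real;86-real;90a-real;90a-virtual
```

Calling `drop_archs_below(original, 75)` yields the third Linux wheel configuration, and a minimum of 80 yields the fourth.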
#### Python wheel
| OS | `CMAKE_CUDA_ARCHITECTURES` | CUDA kernel removal | Package size | Under 300 MB |
|---------|--------------------------------------------------------|-----------------|--------------|---|
| Linux | `60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual` | | 341 MB | No (original) |
| Linux | `70-real;75-real;80-real;86-real;90a-real;90a-virtual` | | 329 MB | No |
| Linux | `75-real;80-real;86-real;90a-real;90a-virtual` | | 319 MB | No |
| Linux | `80-real;86-real;90a-real;90a-virtual` | | 304 MB | No |
| Linux | `60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual` | `FPA_INTB_GEMM` | 287 MB | Yes |
| Windows | `52-real;61-real;75-real;86-real;89-real;90a-virtual` | | 272 MB | Yes (original) |
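The Linux wheel rows can be compared with a little arithmetic against the original 341 MB build. This is only a sketch over the measured sizes in the table; the change labels are descriptive text, not build flags.

```python
LIMIT_MB = 300     # PyPI limit for the Python package
BASELINE_MB = 341  # original Linux wheel, all seven architecture entries

# (change applied to the original build, measured wheel size in MB)
LINUX_WHEEL = [
    ("drop 60-real", 329),
    ("drop 60-real and 70-real", 319),
    ("drop 60-real, 70-real and 75-real", 304),
    ("remove FPA_INTB_GEMM kernels", 287),
]


def evaluate(rows, baseline_mb, limit_mb):
    """Return (change, MB saved vs. baseline, fits under limit) for each row."""
    return [(change, baseline_mb - size, size < limit_mb) for change, size in rows]


for change, saved, fits in evaluate(LINUX_WHEEL, BASELINE_MB, LIMIT_MB):
    print(f"{change:36} saves {saved:2} MB  {'fits' if fits else 'too large'}")
```

Dropping three architectures saves 37 MB and still leaves the wheel 4 MB over the limit, whereas removing `FPA_INTB_GEMM` alone saves 54 MB while keeping every architecture.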
#### NuGet
| OS | `CMAKE_CUDA_ARCHITECTURES` | CUDA kernel removal | Package size | Under 250 MB |
|---------|--------------------------------------------------------|-----------------|--------------|---|
| Linux | `60-real;70-real;75-real;80-real;90a-real;90a-virtual` | | 276 MB | No (original) |
| Linux | `75-real;80-real;90a-real;90a-virtual` | | 253 MB | No |
| Linux | `60-real;70-real;75-real;80-real;90a-real;90a-virtual` | `FPA_INTB_GEMM` | 230 MB | Yes |
| Windows | `52-real;61-real;75-real;86-real;89-real;90a-virtual` | | 264 MB | No (original) |
| Windows | `61-real;75-real;86-real;89-real;90a-virtual` | | 254 MB | No |
| Windows | `75-real;86-real;89-real;90a-virtual` | | 242 MB | Yes |
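The same check against the 250 MB NuGet limit, again only a sketch over the measured numbers (the `NUGET` mapping and `fitting_configs` helper are hypothetical): on Linux only the `FPA_INTB_GEMM` removal fits, and on Windows only the list that drops both `52-real` and `61-real` does.

```python
LIMIT_MB = 250  # NuGet limit

# OS -> (CMAKE_CUDA_ARCHITECTURES, kernels removed or None, measured size in MB)
NUGET = {
    "Linux": [
        ("60-real;70-real;75-real;80-real;90a-real;90a-virtual", None, 276),
        ("75-real;80-real;90a-real;90a-virtual", None, 253),
        ("60-real;70-real;75-real;80-real;90a-real;90a-virtual", "FPA_INTB_GEMM", 230),
    ],
    "Windows": [
        ("52-real;61-real;75-real;86-real;89-real;90a-virtual", None, 264),
        ("61-real;75-real;86-real;89-real;90a-virtual", None, 254),
        ("75-real;86-real;89-real;90a-virtual", None, 242),
    ],
}


def fitting_configs(rows, limit_mb):
    """Return the measured configurations whose package size is under limit_mb."""
    return [(archs, removed, size) for archs, removed, size in rows if size < limit_mb]


for os_name, rows in NUGET.items():
    for archs, removed, size in fitting_configs(rows, LIMIT_MB):
        print(f"{os_name}: {archs} (removed: {removed or 'none'}) -> {size} MB")
```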