[CUDA] Replace 90a-virtual with 90-virtual for forward compatibility (#26230)
Users with RTX 5090 GPUs are experiencing runtime errors when using
onnxruntime-gpu:
```
[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Slice node.
Name:'Slice_34' Status Message: CUDA error cudaErrorNoKernelImageForDevice:
no kernel image is available for execution on the device
```
This occurs because the RTX 5090 uses CUDA compute capability 12.0 (SM
12.0), while `onnxruntime-gpu` 1.23 was built with `90a-virtual`. The
`90a` architecture is a specialized, non-forward-compatible variant of
the Hopper architecture, so code built for it cannot run on later GPU
generations such as Blackwell.
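As a rough illustration of the compatibility rules involved, the sketch
below models which build targets a device can run. It is a simplified
model, not code from onnxruntime or the CUDA toolkit: `parse_target` and
`can_run` are hypothetical helpers, target strings follow the CMake
`CUDA_ARCHITECTURES` style (`90-virtual`, `90a-real`), and family-specific
`f` targets are not handled.

```python
import re

def parse_target(target):
    """Split a target such as '90a-virtual' into (cc, arch_specific, kind)."""
    m = re.fullmatch(r"(\d+)(a?)-(real|virtual)", target)
    if m is None:
        raise ValueError(f"unsupported target: {target!r}")
    digits, suffix, kind = m.groups()
    major, minor = divmod(int(digits), 10)  # '90' -> (9, 0), '120' -> (12, 0)
    return (major, minor), suffix == "a", kind

def can_run(target, device_cc):
    """Return True if code built for `target` can run on `device_cc`.

    Simplified rules (assumed for this sketch):
      * 'a' targets run only on exactly the same compute capability.
      * virtual (PTX) targets can be JIT-compiled for any newer device.
      * real (SASS) targets run on the same major version, same or newer minor.
    """
    cc, arch_specific, kind = parse_target(target)
    if arch_specific:
        return device_cc == cc
    if kind == "virtual":
        return device_cc >= cc
    return device_cc[0] == cc[0] and device_cc[1] >= cc[1]

rtx_5090 = (12, 0)  # Blackwell, compute capability 12.0
h100 = (9, 0)       # Hopper, compute capability 9.0

print(can_run("90a-virtual", rtx_5090))  # arch-specific PTX: no fallback
print(can_run("90-virtual", rtx_5090))   # generic PTX: forward compatible
print(can_run("90a-virtual", h100))      # still runs on Hopper itself
```

Under this model only the generic `90-virtual` PTX gives a Blackwell
device something it can JIT-compile, which matches the failure described
above.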
This change reverts `90a-virtual` to `90-virtual`, as used in 1.22,
which restores compatibility with Blackwell GPUs.
FPA_INTB_GEMM is disabled by default. It needs some extra work to be
compatible with builds that target 90-virtual without 90a-real.
Related:
https://github.com/microsoft/onnxruntime/pull/26002
https://github.com/microsoft/onnxruntime/pull/26226
https://github.com/microsoft/onnxruntime/issues/26181