[BUILD] Fix Build Errors and Warnings in CUDA Providers (#27276)
## Description
A user reported a build error in
https://github.com/microsoft/onnxruntime/issues/27269.
This PR addresses several build errors and compilation warnings in the
CUDA provider and associated contrib ops. These fixes ensure a clean
build and improve compatibility with different CUDA versions
(specifically CUDA 13.1) and compilers.
## Changes
### 1. Fix ShardedMoE Compilation Error
- Resolved a "no matching function for call to `CheckInputs`" error in
`sharded_moe.cc`.
- Updated the `moe_helper::CheckInputs` call to provide the required
`zero_points` arguments (passing `nullptr`), aligning with the updated
function signature.
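
The shape of this fix can be sketched as follows. This is a minimal illustration, not the actual onnxruntime code: `Tensor`, the `moe_helper_sketch` namespace, the three-parameter list, and `ValidateShardedMoeInputs` are hypothetical stand-ins, and the real `moe_helper::CheckInputs` takes a different, longer argument list.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tensor type; not the onnxruntime Tensor class.
struct Tensor {
  std::size_t num_elements;
};

namespace moe_helper_sketch {
// Assumed "updated" signature: an optional zero-points tensor was added as a
// pointer parameter. A caller that omits it no longer matches any overload,
// which is what produces a "no matching function for call" error.
inline bool CheckInputs(const Tensor* input,
                        const Tensor* weights,
                        const Tensor* weights_zero_points) {
  if (input == nullptr || weights == nullptr) return false;
  // Zero points are optional; when present they must not be empty.
  if (weights_zero_points != nullptr && weights_zero_points->num_elements == 0) {
    return false;
  }
  return true;
}
}  // namespace moe_helper_sketch

// Call site after the fix: pass nullptr explicitly for the argument that
// this op does not use.
inline bool ValidateShardedMoeInputs(const Tensor& input, const Tensor& weights) {
  return moe_helper_sketch::CheckInputs(&input, &weights, nullptr);
}

// Small self-check showing that the updated call site passes validation.
inline void SelfCheck() {
  Tensor input{4};
  Tensor weights{8};
  assert(ValidateShardedMoeInputs(input, weights));
}
```

Passing `nullptr` keeps the existing behavior for ops that have no zero points while matching the wider signature.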
### 2. Suppress CUDA 13.1 System Header Warnings
- Added GCC/Clang diagnostic pragmas to suppress `-Wunused-parameter`
warnings in `cuda_fp4.h`.
- These warnings were causing build failures in environments where
warnings are treated as errors.
- Affected files:
  - `onnxruntime/core/providers/cuda/cuda_common.h`
  - `onnxruntime/core/providers/cuda/cuda_type_conversion.h`
  - `onnxruntime/contrib_ops/cuda/llm/cutlass_type_conversion.h`
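
The suppression pattern looks roughly like the sketch below. It is an assumption-laden illustration rather than the exact code in this PR: the `vendor_stub` namespace stands in for the contents of `cuda_fp4.h` (in the real files the pragmas would wrap the `#include`), and `StubFromBits` and `WidenNibble` are hypothetical names.

```cpp
#include <cassert>

// Save the current diagnostic state and silence -Wunused-parameter only for
// the third-party code. Clang also honors the "GCC diagnostic" pragmas.
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif

// Stand-in for a system header such as cuda_fp4.h: an inline function with a
// parameter it never uses, which would warn under -Wextra.
namespace vendor_stub {
inline float StubFromBits(unsigned bits, int rounding_mode) {
  return static_cast<float>(bits);
}
}  // namespace vendor_stub

// Restore the previous diagnostic state so our own code is still checked.
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic pop
#endif

// Our own code, outside the suppressed region.
inline float WidenNibble(unsigned bits) {
  assert(bits < 16u);  // a 4-bit value
  return vendor_stub::StubFromBits(bits, 0);
}
```

Scoping the suppression with `push`/`pop` keeps `-Wunused-parameter` active for the rest of the translation unit, so only the header's warnings are hidden.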
### 3. Resolve Sign-Comparison Warnings
- Fixed several `-Wsign-compare` warnings that were being treated as
errors:
  - **Pad Op:** Changed loop variable type to `size_t` in
    `onnxruntime/core/providers/cuda/tensor/pad.cc`.
  - **Distributed Reshape:** Added explicit casts to `size_t` for
    `int64_t` comparisons in
    `onnxruntime/contrib_ops/cuda/collective/distributed_reshape.cc`.
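
Both kinds of fix follow a common pattern, sketched here with hypothetical helpers (`SumPads` and `RankMatches` are illustrative names, not functions from `pad.cc` or `distributed_reshape.cc`):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Loop-variable fix: the index has the same unsigned type as
// std::vector::size(), so the comparison no longer mixes signedness.
inline std::int64_t SumPads(const std::vector<std::int64_t>& pads) {
  std::int64_t total = 0;
  for (std::size_t i = 0; i < pads.size(); ++i) {
    total += pads[i];
  }
  return total;
}

// Explicit-cast fix: a signed int64_t is compared with an unsigned size.
// The cast is only meaningful for non-negative values, hence the assert.
inline bool RankMatches(std::int64_t expected_rank,
                        const std::vector<std::int64_t>& shape) {
  assert(expected_rank >= 0);
  return static_cast<std::size_t>(expected_rank) == shape.size();
}
```

The explicit cast silences the warning but does not validate the value, so a negative `int64_t` should be ruled out before casting.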
## Verification
- The build now completes without errors or warnings when configured with
`--cmake_extra_defines onnxruntime_USE_NCCL=ON`.
- Builds were tested with CUDA 12.8, 13.0, and 13.1.1.