[CUDA] Fix build for sm<53 (#24582)
### Description
The build fails when configured with
`--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=52`.
Some half2 intrinsics, such as `__hfma2`, used in the 8-bit MatMul kernel are
not defined for sm < 53. Add an implementation that does not use half2 for
those older GPUs.
Also fix a separate build error with CUDA 12.5, caused by an extra `const`
in the MoE code for sm < 53.
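
For reviewers unfamiliar with the pattern, the sm < 53 fallback can be sketched roughly as below. This is an illustration only, not the code in this PR: the helper name `fma2_compat` is hypothetical, and unlike the PR's implementation this sketch still uses the `half2` type for storage. It only avoids half2 *arithmetic* on older architectures by widening to `float`, relying on the conversion intrinsics from `cuda_fp16.h`.

```cuda
#include <cuda_fp16.h>

// Hypothetical helper (name is illustrative, not taken from the PR).
// Computes a * b + c element-wise on a pair of half values.
__device__ __forceinline__ half2 fma2_compat(half2 a, half2 b, half2 c) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 530
  // sm < 53: no native half2 arithmetic, so widen each lane to float,
  // do the fused multiply-add in float, then round back to half.
  float2 fa = __half22float2(a);
  float2 fb = __half22float2(b);
  float2 fc = __half22float2(c);
  return __floats2half2_rn(fmaf(fa.x, fb.x, fc.x),
                           fmaf(fa.y, fb.y, fc.y));
#else
  // sm >= 53: use the native half2 fused multiply-add.
  return __hfma2(a, b, c);
#endif
}
```

Because the fallback computes in `float` and rounds once at the end, its results may differ slightly from the native `__hfma2` path.
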
### Motivation and Context
Fix the NuGet packaging pipeline, which builds with
`CMAKE_CUDA_ARCHITECTURES=52-real;61-real;75-real;86-real;89-real;90-virtual`.