[CUDA] Support volumetric (3-D) grid sampling in the CUDA GridSample operator (#27201)
### Description
1. Supports volumetric grid sampling in the CUDA EP `GridSample`
operator, i.e. 5-D input tensors (3-D spatial data)
2. Registers the CUDA `GridSample` operator for opsets 20 and 22
3. Supports both NCHW and NHWC layouts for volumetric inputs
4. Does not support `cubic` mode for volumetric inputs for now. This is
consistent with the CPU implementation, so it is not a functional
regression: `cubic` mode for 3-D spatial data is unsupported on both CPU
and CUDA before and after this change. Adding it is a TODO for the
future.
5. The unit tests in `grid_sample_test.cc` cover the volumetric input
case and run in both NCHW (NCDHW for the volumetric case) and NHWC
(NDHWC for the volumetric case) layouts for the CUDA EP
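
For reviewers unfamiliar with the volumetric case, here is a minimal pure-Python sketch of what sampling one output value from one channel involves in `linear` mode with zeros padding. It is an illustration only, not the CUDA kernel in this PR. The helper names (`denormalize`, `voxel_or_zero`, `sample_trilinear`) are hypothetical, and the sketch assumes the grid's last dimension is ordered `(x, y, z)`, i.e. width, height, depth.

```python
import math


def denormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a voxel index along one axis."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last voxels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last voxels.
    return ((coord + 1.0) * size - 1.0) / 2.0


def voxel_or_zero(volume, z, y, x):
    """Read volume[z][y][x]; return 0.0 when out of bounds (zeros padding)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if 0 <= z < depth and 0 <= y < height and 0 <= x < width:
        return volume[z][y][x]
    return 0.0


def sample_trilinear(volume, gx, gy, gz, align_corners=False):
    """Trilinearly interpolate one D x H x W channel at normalized (gx, gy, gz)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    x = denormalize(gx, width, align_corners)
    y = denormalize(gy, height, align_corners)
    z = denormalize(gz, depth, align_corners)
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - x0, y - y0, z - z0
    result = 0.0
    # Blend the 8 voxels surrounding the sample point.
    for dz, wz in ((0, 1.0 - fz), (1, fz)):
        for dy, wy in ((0, 1.0 - fy), (1, fy)):
            for dx, wx in ((0, 1.0 - fx), (1, fx)):
                result += wz * wy * wx * voxel_or_zero(volume, z0 + dz, y0 + dy, x0 + dx)
    return result


# 2 x 2 x 2 volume where volume[z][y][x] = x + 2*y + 4*z.
volume = [[[x + 2 * y + 4 * z for x in range(2)] for y in range(2)] for z in range(2)]
center = sample_trilinear(volume, 0.0, 0.0, 0.0)
corner = sample_trilinear(volume, 1.0, 1.0, 1.0, align_corners=True)
```

In NCDHW versus NDHWC layouts the arithmetic is the same; only the memory offset used to read each of the 8 neighbouring voxels differs.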
### Motivation and Context
Resolve https://github.com/microsoft/onnxruntime/issues/21382
Resolve https://github.com/microsoft/onnxruntime/issues/18942
Resolve https://github.com/microsoft/onnxruntime/issues/16581
Resolve https://github.com/microsoft/onnxruntime/issues/18313
Related CPU PRs (for opset 20 and opset 22):
https://github.com/microsoft/onnxruntime/pull/17744 and
https://github.com/microsoft/onnxruntime/pull/23344