[CUDA] Fix clip operator (#25057)
### Description
The CUDA Clip op returns an incorrect result when min > max. The operator
spec sets every element to max in that case. This fixes the implementation
to align with the operator spec:
https://onnx.ai/onnx/operators/onnx__Clip.html
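
For illustration, here is a minimal pure-Python sketch of the semantics the spec describes, `Min(max, Max(input, min))`, next to a branch-based clamp that goes wrong when min > max. The helper names, the input values, and the branch order in `clip_branchy` are assumptions chosen to be consistent with the failure log in this description; they are not taken from the actual CUDA kernel.

```python
def clip_spec(x, lo, hi):
    # Spec-style composition: apply the lower bound first, then the upper
    # bound. When lo > hi, every input ends up equal to hi.
    return min(max(x, lo), hi)


def clip_branchy(x, lo, hi):
    # Hypothetical branch-based clamp (an assumption, not the real kernel).
    # It is equivalent to clip_spec only when lo <= hi.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


xs = [-2.0, 0.0, 6.0]  # assumed inputs, picked for illustration
lo, hi = 2.0, 1.0      # min greater than max

spec_result = [clip_spec(x, lo, hi) for x in xs]
branchy_result = [clip_branchy(x, lo, hi) for x in xs]
```

With these assumed inputs, `spec_result` is all ones, while `branchy_result` returns min for any element below min and so differs from the spec.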
### Motivation and Context
ONNX backend test failure with onnx 1.18:

    python onnx_backend_test_series.py
    ======================================================================
    FAIL: test_clip_min_greater_than_max_cuda (__main__.OnnxBackendNodeModelTest)
    ----------------------------------------------------------------------
    DESIRED: array([1., 1., 1.], dtype=float32)
    ACTUAL: array([2., 2., 1.], dtype=float32)