Add RotaryEmbedding(23) - CUDA (#25178)
Follow-up to #24980
Fix https://github.com/microsoft/onnxruntime/issues/24556
Add the ONNX RotaryEmbedding(23) op for CUDA, following the spec at
https://github.com/onnx/onnx/blob/main/docs/Operators.md#RotaryEmbedding.
This PR reuses the contrib op RotaryEmbedding implementation under the hood.
The main difference between this op and the contrib op is that
position_ids is optional in ONNX RotaryEmbedding. When position_ids is
not provided, cos_cache and sin_cache must be 3D.
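
For illustration, here is a minimal pure-Python sketch of the optional position_ids behaviour described above. It is not the CUDA kernel from this PR. The helper names (rank, resolve_cache, rotate) are hypothetical, and the shapes are assumptions based on my reading of the ONNX spec: a 2D cache of (max_position, rotary_dim / 2) when position_ids is given, and a 3D cache of (batch, sequence, rotary_dim / 2) when it is not. The rotation shown is the non-interleaved form.

```python
def rank(t):
    """Count the nesting depth of a rectangular nested list."""
    depth = 0
    while isinstance(t, list):
        depth += 1
        if not t:
            break
        t = t[0]
    return depth


def resolve_cache(cache, position_ids=None):
    """Return cache rows laid out as (batch, sequence, rotary_dim / 2).

    Hypothetical helper: mirrors the rule that the cache must be 3D
    when position_ids is omitted, and 2D (indexed by position) otherwise.
    """
    if position_ids is None:
        if rank(cache) != 3:
            raise ValueError("without position_ids, the cache must be 3D")
        return cache
    if rank(cache) != 2:
        raise ValueError("with position_ids, the cache must be 2D")
    # Gather one cache row per (batch, sequence) position.
    return [[cache[p] for p in row] for row in position_ids]


def rotate(x, cos, sin):
    """Apply a non-interleaved rotation to one token vector x.

    x is split into two halves (x1, x2); the output is
    [x1 * cos - x2 * sin, x1 * sin + x2 * cos].
    """
    half = len(cos)
    x1, x2 = x[:half], x[half:2 * half]
    real = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    imag = [a * s + b * c for a, b, c, s in zip(x1, x2, cos, sin)]
    return real + imag
```

With a 2D cache, resolve_cache(cache, position_ids) gathers rows by position. With a 3D cache and no position_ids, the cache is used as-is, which is the case this op adds relative to the contrib op.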