Add CUDA 12 ORT training for stable and on-device scenarios (#19439)
### Description
Adds CUDA 12 ORT training for stable and on-device scenarios. Also,
CUDA 12 was removed in this
[PR](https://github.com/microsoft/onnxruntime/pull/19342); this change fixes that.
### Motivation and Context