ORT 1.23.1 cherrypick 1 [REDO] (#26140)
### Description
Cherry-pick the following PRs into the ORT 1.23.1 branch:
- Fix Attention GQA implementation on CPU
  - **MANUAL MERGE**: see https://github.com/microsoft/onnxruntime/pull/26057
  - main merge date: Sept 15, 11:33am
  - pr: https://github.com/microsoft/onnxruntime/pull/25966
  - commit: d530b290647ddaa7a7645accb858febf744d5c20
- Address GetMemInfo edge cases
  - main merge date: Sept 16, 10:32am
  - pr: https://github.com/microsoft/onnxruntime/pull/26021
  - commit: d251f3a443b6e6afe78c1353b0a1d4769f740e4b
- Implement new Python APIs
  - main merge date: Sept 17, 11:44am
  - pr: https://github.com/microsoft/onnxruntime/pull/25999
  - commit: abc63e8f705f59340a5647cd7193cbea63f5c29a
- MemcpyFromHost and MemcpyToHost support for plugin EPs
  - **MERGE CONFLICT** on file
    `onnxruntime/test/optimizer/transpose_optimizer_test.cc`; conflicts with
    https://github.com/microsoft/onnxruntime/pull/25689
  - main merge date: Sept 23, 10:42am
  - pr: https://github.com/microsoft/onnxruntime/pull/26088
  - commit: 45457323a31f40022edddc778872e265702e3531
- [TRT RTX EP] Fix bug for generating the correct subgraph in
  GetCapability #26132
  - main merge date: Sept 23, 8:54pm
  - pr: https://github.com/microsoft/onnxruntime/pull/26132
  - commit: 72e56e76dbc45013cda07eda2e827942cc9e6679
---------
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>