Fix GatherCopyData Integer Truncation Leading to Heap Out-of-Bounds Read/Write (#27444)
### Description
This pull request fixes an integer truncation in `GatherCopyData`, in the
CPU implementation of the Gather operator in ONNX Runtime, that could lead
to heap out-of-bounds reads and writes. The changes prevent integer
overflow in parallel processing and output shape calculations, and add
test coverage to verify these safeguards.
Enhancements to overflow handling and parallel processing:
* Changed the lambda function in `GatherCopyData` to take a `ptrdiff_t`
index instead of `int64_t`, and explicitly cast the `batch` and `i`
variables, so that index arithmetic stays correct for large tensor sizes.
* Updated the parallel loop in `GatherCopyData` to iterate using
`ptrdiff_t` indices, preventing potential overflow when processing large
tensors.
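
The effect of narrowing a flat loop index can be reproduced in isolation.
The following is a minimal sketch, not the ONNX Runtime code: `BatchItem`,
`SplitIndex`, and `SplitIndexNarrowed` are hypothetical names, and only the
decomposition of a flat index into `batch` and `i` mirrors the description
above. It assumes a 64-bit platform where `ptrdiff_t` is 64 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration; not part of ONNX Runtime.
struct BatchItem {
  int64_t batch;
  int64_t i;
};

// Splits a flat work-item index into (batch, i), given N items per batch.
// The explicit cast keeps all arithmetic in 64 bits.
inline BatchItem SplitIndex(std::ptrdiff_t index, int64_t N) {
  assert(N > 0);
  const int64_t idx = static_cast<int64_t>(index);
  return BatchItem{idx / N, idx % N};
}

// Same split, but the index is first narrowed to 32 bits, which discards
// the high bits of any index above INT32_MAX.
inline BatchItem SplitIndexNarrowed(std::ptrdiff_t index, int64_t N) {
  assert(N > 0);
  const int32_t narrowed = static_cast<int32_t>(index);
  return BatchItem{narrowed / N, narrowed % N};
}
```

For an index of `4294967301` (2^32 + 5) and `N = 10`, `SplitIndex` yields
batch `429496730` and item `1`, while the narrowed variant sees index `5`
and yields batch `0` and item `5`. A copy driven by the narrowed values
would therefore touch the wrong region of the buffers.
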
Testing improvements:
* Added a new unit test `Gather_overflow_check` in `gather_op_test.cc`
to verify that the Gather operator correctly handles very large output
shapes without overflowing, specifically testing dimensions that exceed
the 32-bit integer limit.
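
To make the 32-bit limit concrete, the sketch below computes the element
count of a shape with overflow-checked 64-bit multiplication and reports
whether the result still fits in a 32-bit integer. The helper names and
the example shape are hypothetical and are not taken from
`gather_op_test.cc`; the code assumes C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper: product of all dims, or std::nullopt if a dim is
// negative or the product does not fit in int64_t.
inline std::optional<int64_t> ElementCount(const std::vector<int64_t>& dims) {
  int64_t count = 1;
  for (int64_t d : dims) {
    if (d < 0) return std::nullopt;
    if (d != 0 && count > std::numeric_limits<int64_t>::max() / d) {
      return std::nullopt;  // multiplying would overflow int64_t
    }
    count *= d;
  }
  return count;
}

// True if the value can be stored in a 32-bit signed integer unchanged.
inline bool FitsInInt32(int64_t value) {
  return value >= std::numeric_limits<int32_t>::min() &&
         value <= std::numeric_limits<int32_t>::max();
}
```

A shape of `{65537, 65537}` has 4295098369 elements, which fits in
`int64_t` but not in a 32-bit integer, so any loop counter of type `int`
over that range would be truncated.
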
### Motivation and Context