Cleanup: Consolidate `OpKernel::UseSharedPrePackedBuffers_V2` and `OpKernel::UseSharedPrePackedBuffers` (#27924)
### Description
Consolidate `OpKernel::UseSharedPrePackedBuffers` and
`OpKernel::UseSharedPrePackedBuffers_V2` into a single virtual method,
resolving the TODO in `op_kernel.h`.
#### Background
The `OpKernel` class previously had two virtual methods for consuming
shared pre-packed weight buffers:
- **`UseSharedPrePackedBuffers`** (V1) — 3 params: `prepacked_buffers`,
`input_idx`, `used_shared_buffers`
- **`UseSharedPrePackedBuffers_V2`** — 4 params: added
`prepacked_buffer_sizes` (a `gsl::span<const size_t>`)
V2 was introduced to pass buffer sizes alongside the buffers. Its
default implementation forwarded to V1 for backward compatibility. The
framework (`session_state.cc`) only ever called V2.
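To make the old arrangement concrete, here is a minimal sketch of the V1/V2 forwarding pattern described above. It is not the real ONNX Runtime code: `Status`, `BufferUniquePtr`, and `SizeSpan` are simplified stand-ins (a `const std::vector<size_t>&` replaces `gsl::span<const size_t>`), and `OldOpKernel`, `LegacyKernel`, and `CallThroughV2` are hypothetical names chosen for illustration. The default V1 body shown here (set `used_shared_buffers` to `false`, return OK) is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Simplified stand-ins for illustration only (not the real ONNX Runtime types).
using BufferUniquePtr = std::unique_ptr<char[]>;
using SizeSpan = const std::vector<std::size_t>&;  // stands in for gsl::span<const size_t>
struct Status {
  bool ok = true;
};

// Sketch of the base class before this change: two virtual entry points.
class OldOpKernel {
 public:
  virtual ~OldOpKernel() = default;

  // V1: buffers only, no size information. Default body is an assumption.
  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           int /*input_idx*/,
                                           /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;
    return Status{};
  }

  // V2: adds sizes; the default drops them and forwards to V1.
  virtual Status UseSharedPrePackedBuffers_V2(std::vector<BufferUniquePtr>& prepacked_buffers,
                                              SizeSpan /*prepacked_buffer_sizes*/,
                                              int input_idx,
                                              /*out*/ bool& used_shared_buffers) {
    return UseSharedPrePackedBuffers(prepacked_buffers, input_idx, used_shared_buffers);
  }
};

// Hypothetical kernel that only overrides V1.
class LegacyKernel : public OldOpKernel {
 public:
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                   int input_idx,
                                   /*out*/ bool& used_shared_buffers) override {
    // Accept shared buffers only for input 1 (an arbitrary choice for this sketch).
    used_shared_buffers = (input_idx == 1) && !prepacked_buffers.empty();
    return Status{};
  }
};

// Mimics the framework side, which only ever called V2.
bool CallThroughV2(OldOpKernel& kernel, int input_idx) {
  std::vector<BufferUniquePtr> buffers;
  buffers.emplace_back(new char[16]);
  std::vector<std::size_t> sizes{16};
  bool used = false;
  Status status = kernel.UseSharedPrePackedBuffers_V2(buffers, sizes, input_idx, used);
  assert(status.ok);
  return used;
}
```

In this sketch a V1-only kernel is still reached through the V2 default, which is why the two entry points could be merged without changing kernel behavior.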
#### Changes
Merged both methods into a single `UseSharedPrePackedBuffers` using the
V2 signature:
```cpp
virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
gsl::span<const size_t> prepacked_buffer_sizes,
int input_idx,
/*out*/ bool& used_shared_buffers);
```
Updated **27 files** across the codebase:
| Category | Files | Change |
|----------|-------|--------|
| Base class | `op_kernel.h` | Removed V1 + V2; single 4-param method |
| Framework | `session_state.cc` | Renamed `_V2` call |
| Plugin EP bridge | `ep_kernel_registration.cc` | Renamed override |
| QMoECPU | `moe_quantization_cpu.h/.cc` | Renamed V2 override + template instantiations |
| CPU provider (8 kernels) | `gemm`, `matmul`, `conv_transpose`, `fp16_conv`, `qlinearconv`, `matmul_integer_base`, `deep_cpu_lstm`, `deep_cpu_gru` | Added `prepacked_buffer_sizes` param |
| ACL provider (2 kernels) | `acl/conv`, `acl/matmul` | Added param |
| Contrib ops (4 kernels) | `matmul_nbits`, `dynamic_quantize_lstm`, `attention_quant`, `bert/attention` | Added param |
| Tests | `session_state_test.cc` | Updated test kernel override |
#### Notes
- Kernels that previously overrode V1 now take the new
`prepacked_buffer_sizes` parameter as **unnamed/unused**
(`/*prepacked_buffer_sizes*/`); no logic changed in those kernels.
- The C API (`SetSharedPrePackedWeight` in `onnxruntime_ep_c_api.h`)
already passes buffer sizes, so **no C API changes** were needed.
- Private helper functions (e.g., `UseSharedPrePackedBuffersImpl` in
LSTM/GRU) are not virtual overrides and were **not modified**.
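As an illustration of the first note, the following is a minimal sketch of what a former V1 override might look like after the consolidation. It uses the same simplified stand-in types rather than the real ONNX Runtime headers; `MigratedKernel`, `HasPackedB`, `packed_b_`, and `OfferSharedBuffer` are hypothetical names, and the base-class default body is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only (not the real ONNX Runtime types).
using BufferUniquePtr = std::unique_ptr<char[]>;
using SizeSpan = const std::vector<std::size_t>&;  // stands in for gsl::span<const size_t>
struct Status {
  bool ok = true;
};

// Sketch of the base class after this change: one 4-parameter virtual method.
class OpKernel {
 public:
  virtual ~OpKernel() = default;

  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           SizeSpan /*prepacked_buffer_sizes*/,
                                           int /*input_idx*/,
                                           /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;  // Default body is an assumption for this sketch.
    return Status{};
  }
};

// Hypothetical kernel migrated from a V1 override: the size parameter is
// accepted but left unnamed, so the body is unchanged from its V1 form.
class MigratedKernel : public OpKernel {
 public:
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                   SizeSpan /*prepacked_buffer_sizes*/,
                                   int input_idx,
                                   /*out*/ bool& used_shared_buffers) override {
    used_shared_buffers = false;
    if (input_idx == 1 && !prepacked_buffers.empty()) {
      // Take ownership of the shared pre-packed buffer for input 1.
      packed_b_ = std::move(prepacked_buffers[0]);
      used_shared_buffers = true;
    }
    return Status{};
  }

  bool HasPackedB() const { return packed_b_ != nullptr; }

 private:
  BufferUniquePtr packed_b_;
};

// Mimics the framework call site, which now uses the single method name.
bool OfferSharedBuffer(OpKernel& kernel, int input_idx) {
  std::vector<BufferUniquePtr> buffers;
  buffers.emplace_back(new char[8]);
  std::vector<std::size_t> sizes{8};
  bool used = false;
  Status status = kernel.UseSharedPrePackedBuffers(buffers, sizes, input_idx, used);
  assert(status.ok);
  return used;
}
```

Leaving the parameter unnamed keeps the diff per kernel to a signature change and avoids unused-parameter warnings.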
### Motivation and Context
Addresses the TODO at
`include/onnxruntime/core/framework/op_kernel.h:139`:
> TODO: Consolidate UseSharedPrePackedBuffers and
UseSharedPrePackedBuffers_V2 into a single function, which will require
updating kernel-based provider-bridge EPs (cpu, cuda, webgpu).