onnxruntime
f7113bdc - [CUDA EP Plugin] ResourceAccountant integration (#28028)

18 days ago
This pull request introduces several enhancements and refactorings to the resource accounting and execution provider (EP) infrastructure, with a focus on better support for plugin-based CUDA execution providers. The most significant changes are type-erased arithmetic for resource accounting, improved handling of resource budgets for plugin EPs, and more robust device matching logic. These updates improve maintainability, enforce stricter type safety, and ensure correct resource tracking across both in-tree and plugin-based EPs.

**Resource accounting improvements:**

* Added type-erased arithmetic functions (`AddResourceCounts`, `ResourceCountExceeds`, `FormatResourceCount`) for `ResourceCount` to enforce exhaustive handling of variant types and improve type safety.
* Refactored the `IResourceAccountant` interface: replaced `ResetPendingWeights` with `ResetForNewPass`, which resets both the stop flag and pending weights, and introduced a protected `ResetPendingWeightsImpl` for subclass-specific cleanup.
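
To illustrate what "type-erased arithmetic with exhaustive handling" over a variant can look like, here is a minimal sketch. It is not the code from this commit: the two-alternative `ResourceCount`, the function signatures, the mismatch behavior, and the `sketch_count` namespace are all assumptions made for illustration. Only the three function names come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

namespace sketch_count {

// Assumption: two alternatives, chosen only to make exhaustiveness visible.
using ResourceCount = std::variant<std::size_t, double>;

// Dependent false value, so the static_assert below fires only when a
// newly added alternative reaches the unhandled branch.
template <typename T>
inline constexpr bool kUnhandledAlternative = false;

// Adds two counts that hold the same alternative; throws on a mismatch.
inline ResourceCount AddResourceCounts(const ResourceCount& a, const ResourceCount& b) {
  return std::visit(
      [](auto lhs, auto rhs) -> ResourceCount {
        using L = decltype(lhs);
        if constexpr (std::is_same_v<L, decltype(rhs)>) {
          if constexpr (std::is_integral_v<L>) {
            // Debug-only guard against unsigned wraparound.
            assert(lhs <= std::numeric_limits<L>::max() - rhs);
          }
          return lhs + rhs;
        } else {
          throw std::invalid_argument("mismatched ResourceCount alternatives");
        }
      },
      a, b);
}

// True when `value` is strictly greater than `limit`; throws on a mismatch.
inline bool ResourceCountExceeds(const ResourceCount& value, const ResourceCount& limit) {
  return std::visit(
      [](auto v, auto l) -> bool {
        if constexpr (std::is_same_v<decltype(v), decltype(l)>) {
          return v > l;
        } else {
          throw std::invalid_argument("mismatched ResourceCount alternatives");
        }
      },
      value, limit);
}

// Formats a count; adding a new alternative without a branch here
// becomes a compile-time error instead of a silent fallthrough.
inline std::string FormatResourceCount(const ResourceCount& count) {
  return std::visit(
      [](auto v) -> std::string {
        using T = decltype(v);
        if constexpr (std::is_same_v<T, std::size_t>) {
          return std::to_string(v);
        } else if constexpr (std::is_same_v<T, double>) {
          return std::to_string(v);
        } else {
          static_assert(kUnhandledAlternative<T>, "unhandled ResourceCount alternative");
        }
      },
      count);
}

}  // namespace sketch_count
```

Callers in this sketch never touch `std::get` or the concrete alternative: they pass `ResourceCount` values to the three helpers, so a change to the variant's alternatives is confined to one place.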

**Plugin CUDA EP and resource budget enforcement:**

* Added `kCudaPluginExecutionProvider` constant and updated logic to ensure plugin EPs correctly map to their in-tree accountant counterparts and are included in device matching and partitioning.
* Updated plugin EP infrastructure to pass and utilize resource accountant pointers, enabling host-side resource budget enforcement for plugin EPs and ensuring correct node assignment.
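
The accountant refactor and the host-side budget enforcement described above can be pictured with a small sketch. Everything here is hypothetical apart from the names `ResetForNewPass` and `ResetPendingWeightsImpl`: the EP name strings, `AccountantKeyFor`, `AccountantBase`, `TryConsume`, and `AssignWithinBudget` are invented for illustration, and the real interface may differ substantially.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

namespace sketch_budget {

// Hypothetical EP names; the real string constants are not quoted in the
// commit message, so these are placeholders.
inline constexpr std::string_view kInTreeCudaEp = "InTreeCudaEp";
inline constexpr std::string_view kPluginCudaEp = "PluginCudaEp";

// A plugin EP name resolves to its in-tree counterpart, so both variants
// are tracked by the same accountant entry.
inline std::string_view AccountantKeyFor(std::string_view ep_name) {
  return ep_name == kPluginCudaEp ? kInTreeCudaEp : ep_name;
}

class AccountantBase {
 public:
  explicit AccountantBase(std::size_t budget) : budget_(budget) {}
  virtual ~AccountantBase() = default;

  // Single public entry point: the stop flag is always cleared here, and
  // subclass-specific cleanup is delegated to the protected hook.
  void ResetForNewPass() {
    stop_issued_ = false;
    ResetPendingWeightsImpl();
  }

  // Commits `cost` if it fits in the remaining budget; otherwise raises
  // the stop flag so the caller stops assigning nodes in this pass.
  bool TryConsume(std::size_t cost) {
    assert(consumed_ <= budget_);
    if (stop_issued_ || cost > budget_ - consumed_) {
      stop_issued_ = true;
      return false;
    }
    consumed_ += cost;
    return true;
  }

  bool IsStopIssued() const { return stop_issued_; }
  std::size_t Consumed() const { return consumed_; }

 protected:
  // Default: nothing pending to discard. Subclasses override as needed.
  virtual void ResetPendingWeightsImpl() {}

 private:
  std::size_t budget_;
  std::size_t consumed_ = 0;
  bool stop_issued_ = false;
};

// Host-side enforcement: the host, not the plugin, decides how many nodes
// fit. Returns the indices of the nodes assigned before the budget ran out.
inline std::vector<std::size_t> AssignWithinBudget(const std::vector<std::size_t>& node_costs,
                                                   AccountantBase& accountant) {
  accountant.ResetForNewPass();
  std::vector<std::size_t> assigned;
  for (std::size_t i = 0; i < node_costs.size(); ++i) {
    if (!accountant.TryConsume(node_costs[i])) break;
    assigned.push_back(i);
  }
  return assigned;
}

}  // namespace sketch_budget
```

Splitting the reset into a non-virtual public method and a protected hook means a subclass cannot forget to clear the stop flag, which is one plausible motivation for the rename.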

**Device matching and partitioning:**

* Improved device matching heuristics to consider both in-tree and plugin CUDA EPs, and updated logic to prefer runtime device ordinals for more reliable device selection.

Other minor changes include code style cleanups and additional includes for completeness.
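
As a rough picture of the device matching change, the sketch below treats both CUDA EP flavors as candidates and prefers a runtime-reported ordinal over a host-side enumeration index when one is available. The commit message does not describe the actual heuristic, so `DeviceCandidate`, `IsCudaEp`, `EffectiveOrdinal`, `FindCudaDevice`, and the EP name strings are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

namespace sketch_match {

// Hypothetical EP names, used only as placeholders.
inline constexpr std::string_view kInTreeCudaEp = "InTreeCudaEp";
inline constexpr std::string_view kPluginCudaEp = "PluginCudaEp";

struct DeviceCandidate {
  std::string_view ep_name;
  int enumerated_index;                // position in a host-side device list
  std::optional<int> runtime_ordinal;  // ordinal reported by the runtime, if known
};

// Both the in-tree and the plugin CUDA EP count as CUDA for matching.
inline bool IsCudaEp(std::string_view ep_name) {
  return ep_name == kInTreeCudaEp || ep_name == kPluginCudaEp;
}

// Prefer the runtime ordinal; fall back to the enumeration index.
inline int EffectiveOrdinal(const DeviceCandidate& device) {
  return device.runtime_ordinal.value_or(device.enumerated_index);
}

// Returns the index of the first CUDA candidate whose effective ordinal
// equals `wanted_ordinal`, or std::nullopt when none matches.
inline std::optional<std::size_t> FindCudaDevice(const std::vector<DeviceCandidate>& devices,
                                                 int wanted_ordinal) {
  assert(wanted_ordinal >= 0);
  for (std::size_t i = 0; i < devices.size(); ++i) {
    if (IsCudaEp(devices[i].ep_name) && EffectiveOrdinal(devices[i]) == wanted_ordinal) {
      return i;
    }
  }
  return std::nullopt;
}

}  // namespace sketch_match
```

The point of the fallback is that an enumeration index can disagree with the ordinal the runtime actually uses, for example when device visibility is restricted, so the runtime value is taken as authoritative whenever it is present.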