onnxruntime
50ee1b05 - [DML EP] Improve memory usage and fix memory leak in graph capture (#20879)

Commit
1 year ago
Phi-3 vision loads 3 models in memory, which means that 3 different sessions, 3 different execution providers and 3 different allocators are all loaded at the same time. Since the DML EP uses a bucketized allocator, this results in a lot of memory fragmentation across all 3 models, because the memory pooled by each allocator can only be reused by its own model.

To fix that, we can disable the memory arena (ORT's term for any kind of allocator that reuses memory) as an opt-in option. In the case of LLMs, we essentially never need to reallocate memory after the initial graphs have been captured, which means that we gain nothing by using the bucketized allocator, and it causes unnecessary fragmentation.

---------

Co-authored-by: Patrice Vignola <pavignol@microsoft.com>
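The fragmentation argument above can be illustrated with a toy model. This is only a sketch and not the DML EP's allocator: the power-of-two rounding, the 64 KiB minimum bucket, the class names and the request sizes are all assumptions made for illustration. It compares how many bytes stay reserved after every buffer has been released, once with one pooling allocator per model and once with plain allocations that are returned to the device.

```python
from collections import defaultdict

MIN_BUCKET = 1 << 16  # hypothetical minimum bucket size (64 KiB)


def bucket_size(nbytes):
    """Round a request up to the next power-of-two bucket (assumed policy)."""
    size = MIN_BUCKET
    while size < nbytes:
        size *= 2
    return size


class ToyBucketAllocator:
    """Pools released blocks per bucket; never returns them to the device."""

    def __init__(self):
        self.free_blocks = defaultdict(int)  # bucket size -> pooled block count
        self.reserved = 0                    # bytes held from the device

    def alloc(self, nbytes):
        size = bucket_size(nbytes)
        if self.free_blocks[size] > 0:
            self.free_blocks[size] -= 1      # reuse a pooled block
        else:
            self.reserved += size            # grow the pool
        return size

    def release(self, size):
        self.free_blocks[size] += 1          # block stays reserved by this pool


class ToyDirectAllocator:
    """No arena: exact-size allocations, handed back to the device on release."""

    def __init__(self):
        self.reserved = 0

    def alloc(self, nbytes):
        self.reserved += nbytes
        return nbytes

    def release(self, size):
        self.reserved -= size


def retained_after_release(allocator_cls, requests_per_model):
    """Bytes still reserved across all models once every block is released."""
    total = 0
    for requests in requests_per_model:
        allocator = allocator_cls()          # one allocator per model/session
        blocks = [allocator.alloc(n) for n in requests]
        for block in blocks:
            allocator.release(block)
        total += allocator.reserved
    return total


# Made-up request sizes for three models loaded side by side.
REQUESTS = [[70_000, 200_000], [300_000], [1_000_000]]

print("with arenas:  ", retained_after_release(ToyBucketAllocator, REQUESTS))
print("without arena:", retained_after_release(ToyDirectAllocator, REQUESTS))
```

In this toy model the pooled blocks of one allocator are invisible to the other two, so each model keeps its own rounded-up reservation even when it no longer needs it. When buffers are allocated once and never reallocated, as described for LLMs after graph capture, the pool provides no reuse to offset that cost.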