[DML EP] Improve memory usage and fix memory leak in graph capture (#20879)
Phi-3 vision loads 3 models in memory, which means that 3 different
sessions, 3 different execution providers and 3 different allocators
are all loaded at the same time. Since the DML EP uses a bucketized
allocator, each model accumulates fragmented memory that only that
model can reuse, so the waste adds up across all 3 models.
To fix that, we can disable the memory arena (ORT's term for any kind
of allocator that reuses memory) as an opt-in option. In the case of
LLMs, we essentially never need to reallocate memory after the initial
graphs have been captured, which means that we gain nothing by using
the bucketized allocator, and it causes unnecessary fragmentation.
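
As a rough illustration of the fragmentation described above, here is a
toy model in Python. It is not the DML EP's allocator: the class names,
the power-of-two bucket rounding and the buffer sizes are all
assumptions made for the sketch. Each "session" makes one temporary
allocation during setup, frees it, then makes one allocation that lives
for the rest of the session.

```python
# Toy model only. The classes, the power-of-two rounding and the sizes
# below are hypothetical and are not taken from the DML EP source.

class BucketizedAllocator:
    """Rounds requests up to a bucket size and caches freed blocks."""

    def __init__(self):
        self.free_blocks = {}  # bucket size -> number of cached free blocks
        self.reserved = 0      # bytes currently held from the device

    @staticmethod
    def bucket_size(n):
        # Smallest power of two that is >= n.
        size = 1
        while size < n:
            size *= 2
        return size

    def alloc(self, n):
        size = self.bucket_size(n)
        if self.free_blocks.get(size, 0) > 0:
            self.free_blocks[size] -= 1  # reuse a cached block
        else:
            self.reserved += size        # grow the pool
        return size

    def free(self, size):
        # The block goes back to this allocator's cache, not to the device.
        self.free_blocks[size] = self.free_blocks.get(size, 0) + 1


class DirectAllocator:
    """No arena: every free returns the memory to the device."""

    def __init__(self):
        self.reserved = 0

    def alloc(self, n):
        self.reserved += n
        return n

    def free(self, size):
        self.reserved -= size


def run_session(allocator, temp=300, persistent=700):
    """One temporary buffer during setup, then one long-lived buffer."""
    block = allocator.alloc(temp)
    allocator.free(block)
    allocator.alloc(persistent)
    return allocator.reserved


# Three independent sessions, each with its own allocator.
bucketized_total = sum(run_session(BucketizedAllocator()) for _ in range(3))
direct_total = sum(run_session(DirectAllocator()) for _ in range(3))
print(bucketized_total, direct_total)
```

In this model each bucketized session keeps a cached 512-byte block that
no other session can use, on top of the rounding overhead for the
long-lived buffer. When nothing is reallocated after setup, that cache
is never reused, which is the situation the opt-in targets.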
---------
Co-authored-by: Patrice Vignola <pavignol@microsoft.com>