Fix DML regression from allocator refactor and enable unrounded weight allocation in ORT API (#17030)
This addresses a DML performance regression from the following PR
resulting in allocations not being rounded and pooled in the DML
execution provider.
https://github.com/microsoft/onnxruntime/pull/15833
This also fixes a pre-existing limitation that allocations during
session initialization (primarily large weights and persistent
resources) only bypassed rounding and pooling while using the Winml API.
The allocator now also respects a caller's rounding mode parameter when
provided.