Address performance regression with duplicate initializers across DML partitions (#16087)
This addresses a DML performance regression introduced by the constant
sharing pass.
The constant sharing pass identifies small initializer tensors that
contain identical values and merges them. This can cause DML to treat
those tensors as non-constant and skip certain optimizations.
To prevent this, there is now an element count threshold below which the
DML EP still applies these optimizations to a shared tensor, even though
doing so duplicates the work of uploading and pre-processing the common
tensor for each operator that uses it.
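
As a rough illustration of the threshold check described above, here is a
minimal Python sketch. It is not the DML EP implementation: the names
`SMALL_INITIALIZER_ELEMENT_THRESHOLD`, `element_count`, and
`keep_constant_when_shared` are hypothetical, and the threshold value is a
placeholder, since the actual value is not stated in this description.

```python
from math import prod

# Hypothetical placeholder; the real threshold used by the DML EP is not
# given in this description.
SMALL_INITIALIZER_ELEMENT_THRESHOLD = 16


def element_count(shape):
    """Return the number of elements in a tensor with the given shape.

    An empty shape denotes a scalar, which has exactly one element.
    """
    return prod(shape)


def keep_constant_when_shared(shape, threshold=SMALL_INITIALIZER_ELEMENT_THRESHOLD):
    """Decide how a shared initializer is handled in this sketch.

    Returns True when the tensor is strictly below the threshold, meaning
    each consuming operator keeps treating it as a constant and pays for
    its own upload and pre-processing. Returns False for larger tensors,
    where duplicating that work is assumed to be too costly.
    """
    return element_count(shape) < threshold


if __name__ == "__main__":
    for shape in [(), (4,), (16,), (1024, 1024)]:
        print(shape, element_count(shape), keep_constant_when_shared(shape))
```

The comparison is strict because the description says "below" the
threshold; whether the real check is inclusive is an assumption here.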