DML EP and MLAS buffer allocator - increase alignment to 64 bytes for AVX-512 processing (#15141)
Fixes the top concerns of #13119 by:
* using `onnxruntime::AllocatorDefaultAlloc` instead of `malloc`
* setting `MLAS_DEFAULT_PREFERRED_BUFFER_ALIGNMENT=64`, which cascades that value to several members and functions not directly related to MLAS.
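As a rough illustration of what a 64-byte-aligned allocation looks like, here is a minimal sketch using standard C++17 aligned `operator new`. It is not the code of `onnxruntime::AllocatorDefaultAlloc`. The names `kPreferredAlignment`, `AlignedAlloc`, `AlignedFree` and `IsAligned` are hypothetical, and the constant only mirrors the idea behind `MLAS_DEFAULT_PREFERRED_BUFFER_ALIGNMENT`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical constant mirroring the role of MLAS_DEFAULT_PREFERRED_BUFFER_ALIGNMENT:
// 64 bytes = 512 bits, the width of one AVX-512 register.
constexpr std::size_t kPreferredAlignment = 64;

// Hypothetical helper: allocate `size` bytes on a kPreferredAlignment boundary
// using C++17 aligned operator new. Throws std::bad_alloc on failure.
void* AlignedAlloc(std::size_t size) {
  return ::operator new(size, std::align_val_t{kPreferredAlignment});
}

// Memory from aligned operator new must be released with the matching aligned delete.
void AlignedFree(void* p) noexcept {
  ::operator delete(p, std::align_val_t{kPreferredAlignment});
}

// True if `p` lies on a multiple of `alignment` bytes (alignment must be a power of two).
bool IsAligned(const void* p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Aligned `operator new` is used here because it is portable across compilers. Note that `malloc`, by contrast, typically guarantees only 8- or 16-byte alignment.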
### Motivation and Context
* Fixes the top concerns of #13119. Without this change, buffers are aligned to only 16 bytes, a circa-1990s value 👴
* Does not yet enable flexible alignment. Instead, alignment is fixed at 64 bytes (64 x 8 bits = 512 bits) to suit modern NN hardware such as AVX-512.
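The arithmetic behind the fixed value can be checked at compile time. The sketch below also shows that an address aligned to 16 bytes is not necessarily aligned to 64. It is only an illustration: the names are hypothetical, and a 32-bit `float` is assumed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// 64 bytes x 8 bits = 512 bits, i.e. one AVX-512 register.
constexpr std::size_t kAlignmentBytes = 64;
constexpr std::size_t kAlignmentBits = kAlignmentBytes * CHAR_BIT;
static_assert(kAlignmentBits == 512, "64 bytes should cover one 512-bit register");

// Assumes 32-bit floats: one 512-bit vector holds 16 fp32 lanes.
constexpr std::size_t kFloatLanes = kAlignmentBits / (sizeof(float) * CHAR_BIT);
static_assert(kFloatLanes == 16, "expected 16 fp32 lanes per 512-bit vector");

// Hypothetical helper: round `addr` up to the next multiple of `alignment`
// (alignment must be a power of two).
std::uintptr_t RoundUp(std::uintptr_t addr, std::size_t alignment) {
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment - 1);
  return (addr + mask) & ~mask;
}
```

For example, `0x1010` is already 16-byte aligned (`RoundUp(0x1010, 16) == 0x1010`). It is not 64-byte aligned, however: the next 64-byte boundary is `0x1040`.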