Fix overflow in DmlGraphFusionHelper::ProcessInputData (#27815)
### Description
This change fixes an out-of-bounds read in the DML EP. AlignToPow2
rounded tensorByteSize up to a 4-byte boundary before the data was read
from the source buffer, so CreateCpuResource, CreateResource,
WriteToFile, and the inputRawData vector construction could read 1–3 bytes
past the end of the original tensor data.
CreateResource and CreateCpuResource already independently align the
D3D12 resource descriptor size, so they work correctly with the original
(unaligned) byte count. The fix is to move the alignment to the location
where it's needed.
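
To illustrate the general pattern (not the exact code in this change), here is a
minimal sketch that reads from the source using the unaligned byte count and
applies the 4-byte alignment only to the destination. `AlignUpPow2` and
`CopyThenPad` are hypothetical names introduced for this example; they are not
the EP's actual helpers, and the real fix may place the alignment elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an alignment helper: rounds `value` up to the
// next multiple of `alignment`, which must be a power of two.
constexpr std::size_t AlignUpPow2(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Copies exactly `byteSize` bytes from `src`, then zero-pads the copy up to a
// 4-byte boundary. The source buffer is never read past `byteSize`.
std::vector<std::uint8_t> CopyThenPad(const std::uint8_t* src, std::size_t byteSize) {
    std::vector<std::uint8_t> copy(src, src + byteSize);  // read: unaligned size
    copy.resize(AlignUpPow2(byteSize, 4), 0);             // pad: aligned size
    return copy;
}
```

For a 5-byte tensor, the aligned size is 8. Using 8 as the read length would
touch 3 bytes beyond the source, whereas the sketch reads 5 bytes and pads the
remaining 3 in the destination.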
### Motivation and Context
This change is required because reading past the end of the tensor data
can cause a crash or incorrect behavior in the DML EP.