[MIGraphX EP] Fix CopyTensorAsync and add guards for stream sync CopyTensors (#16787)
Add compile guards that gate the following changes on MIGRAPHX_STREAM_SYNC:
- Remove the excess hipStreamSynchronize to the null stream on CopyTensor calls
- Add a proper stream-synchronized CopyTensorAsync call for the
DeviceToHost case
Without this change, subsequent CopyTensorAsync() calls will fail on
cards that don't use pinned memory, causing hipMemcpy() calls to
occur before certain kernel operations have run.
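
To illustrate the ordering problem described above, here is a minimal toy model in plain C++. It is not the HIP API and not the code in this PR: `ToyStream` and `read_back` are hypothetical names, and the "stream" is just a queue that runs its work on `synchronize()`. It only sketches why a copy that is not ordered on the same stream can observe data from before the kernel ran.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Toy stand-in for a GPU stream: work is queued and only runs on synchronize().
// This is NOT the HIP API; every name here is hypothetical.
struct ToyStream {
    std::queue<std::function<void()>> pending;

    void enqueue(std::function<void()> work) { pending.push(std::move(work)); }

    // Run all queued work in submission order.
    void synchronize() {
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
    }
};

// Queues a "kernel" that writes 42 into a device buffer, then copies the
// buffer to the host.
//   stream_ordered == true : the copy is queued behind the kernel
//   stream_ordered == false: the copy runs immediately, ahead of the kernel
int read_back(bool stream_ordered) {
    int device_value = 0;  // stands in for device memory
    int host_value = -1;   // stands in for the host destination

    ToyStream stream;
    stream.enqueue([&] { device_value = 42; });  // queued kernel

    if (stream_ordered) {
        // Copy is submitted to the same stream, so it runs after the kernel.
        stream.enqueue([&] { host_value = device_value; });
    } else {
        // Copy bypasses the stream and overtakes the queued kernel.
        host_value = device_value;
    }

    stream.synchronize();
    return host_value;
}
```

In this model `read_back(true)` returns the kernel's result (42), while `read_back(false)` returns the stale initial value (0), which mirrors the failure mode the DeviceToHost change is meant to avoid.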

---------
Co-authored-by: Ted Themistokleous <tthemist@amd.com>