[MPS] Fix copy_cast_mps() on tensors with storage offset (#95093)
- The copy_cast path requires storage_offset to be applied before casting
- This should fix some of the correctness issues seen with transformer models on MPS
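
To illustrate the class of bug, here is a minimal sketch in plain Python. It is a toy model and not the actual MPS code: `storage`, `cast_ignoring_offset`, and `cast_applying_offset` are hypothetical names introduced for illustration only. It assumes a view is described by a shared storage, a storage offset, and an element count, and shows how a cast that ignores the offset reads the wrong elements.

```python
from array import array

# Toy stand-in for a tensor's backing storage (float32 elements).
storage = array("f", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# A view over storage[2:5]: same storage, plus an offset and a length.
storage_offset = 2
numel = 3


def cast_ignoring_offset(storage, storage_offset, numel):
    """Buggy pattern: the cast starts at element 0 of the storage."""
    return array("i", [int(v) for v in storage[:numel]])


def cast_applying_offset(storage, storage_offset, numel):
    """Intended pattern: skip storage_offset elements, then cast."""
    start = storage_offset
    return array("i", [int(v) for v in storage[start:start + numel]])


wrong = cast_ignoring_offset(storage, storage_offset, numel)
right = cast_applying_offset(storage, storage_offset, numel)

print(list(wrong))  # [0, 1, 2]
print(list(right))  # [2, 3, 4]
```

In PyTorch terms, a comparable check would cast a sliced tensor (one whose `storage_offset()` is nonzero) to another dtype on the MPS device and compare the result against the same cast on CPU.
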
Fixes #94980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95093
Approved by: https://github.com/kulinseth