Do not insert to_dtype for memory-copy-only buffers (#97147)
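Remove the redundant to_dtype conversions for buffers that are only copied (loaded and then stored, with no computation in between), for example:
`load_bf16 + to_fp32 + to_bf16 + store_bf16` => `load_bf16 + store_bf16`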
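
The conversion pair can be dropped because widening bf16 to fp32 is exact, so narrowing straight back yields the original bits. Below is a minimal pure-Python sketch of that argument at the bit level. It is not the Inductor code, and the helper names (`bf16_to_fp32_bits`, `fp32_bits_to_bf16`, `copy_with_casts`, `copy_direct`) are hypothetical. It assumes round-to-nearest-even narrowing and does not model converters that canonicalize NaN payloads.

```python
def bf16_to_fp32_bits(b: int) -> int:
    """Widen a bfloat16 bit pattern to float32 bits (exact: append 16 zero bits)."""
    return (b & 0xFFFF) << 16


def fp32_bits_to_bf16(bits: int) -> int:
    """Narrow float32 bits to bfloat16 using round-to-nearest-even."""
    lsb = (bits >> 16) & 1                      # lowest bit that survives
    rounded = (bits + 0x7FFF + lsb) & 0xFFFFFFFF
    return rounded >> 16


def copy_with_casts(buf):
    """load_bf16 + to_fp32 + to_bf16 + store_bf16 (illustrative only)."""
    return [fp32_bits_to_bf16(bf16_to_fp32_bits(b)) for b in buf]


def copy_direct(buf):
    """load_bf16 + store_bf16 (illustrative only)."""
    return list(buf)


# Check every possible 16-bit pattern.
all_patterns = list(range(1 << 16))
roundtrip_is_identity = copy_with_casts(all_patterns) == copy_direct(all_patterns)
print(roundtrip_is_identity)
```

Under these assumptions both copies produce identical buffers, so skipping the casts changes no stored value and saves two conversions per element.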
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97147
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel