fix memory issue when running with multiple streams (#12913)
* set the correct stream when copying graph inputs
* fix training break
* a temporary hack
* when splitting a chunk, set the correct stream and timestamp
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>