[ROCm] HIP Lazy Streams (#119996)
For ROCm/HIP, each stream is lazily initialized rather than creating all streams when the first stream is requested. HIP streams are not as lightweight as CUDA streams; the pooling strategy can affect performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119996
Approved by: https://github.com/ezyang