[Nested Tensor] do not use at::cuda::getDefaultCUDAStream() (#84134)
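Launch the add/remove padding kernels on at::cuda::getCurrentCUDAStream() instead of at::cuda::getDefaultCUDAStream().
Otherwise, when the current stream is not the default stream, the padding kernels are not synchronized with work queued on the current stream, which results in flaky unit tests in test_nestedtensor.py.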
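
For illustration only, here is a toy model of the ordering problem in plain C++ with no CUDA or ATen dependency. ToyStream, ToyDevice, launch, and padding_is_ordered_after_producer are hypothetical names invented for this sketch and are not part of PyTorch. The model assumes that work is ordered only within a single stream's queue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a CUDA stream: an in-order queue of kernel names.
// Assumption of this sketch: nothing orders work across two different queues.
struct ToyStream {
    int id;
    std::vector<std::string> queue;
};

// Toy device with a default stream and one side stream (hypothetical).
struct ToyDevice {
    ToyStream default_stream{0, {}};
    ToyStream side_stream{1, {}};
    ToyStream* current = &default_stream;  // a stream guard would change this

    ToyStream& getDefaultStream() { return default_stream; }
    ToyStream& getCurrentStream() { return *current; }
};

// Enqueue a kernel name on the chosen stream.
inline void launch(ToyStream& stream, const std::string& kernel) {
    stream.queue.push_back(kernel);
}

// Returns true if the padding kernel ends up queued behind the producer
// kernel on the same stream, i.e. it is guaranteed to see the producer's output.
inline bool padding_is_ordered_after_producer(bool use_current_stream) {
    ToyDevice dev;
    dev.current = &dev.side_stream;  // caller is running on a non-default stream

    launch(dev.getCurrentStream(), "producer");

    ToyStream& target = use_current_stream ? dev.getCurrentStream()
                                           : dev.getDefaultStream();
    launch(target, "add_padding");

    const std::vector<std::string>& q = dev.side_stream.queue;
    return q.size() == 2 && q[0] == "producer" && q[1] == "add_padding";
}
```

In this model, padding_is_ordered_after_producer(false) corresponds to the old behavior (padding kernel on the default stream, unordered relative to the producer), and padding_is_ordered_after_producer(true) corresponds to the behavior after this change.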
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84134
Approved by: https://github.com/drisspg