[Nested Tensor] do not use at::cuda::getDefaultCUDAStream(), again (#91180)
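Otherwise, NestedTensor kernels are launched on the default CUDA stream instead of the current stream. They then don't synchronize with work queued on the current stream, which makes the unit tests in test_nestedtensor.py flaky.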
This is the second time the wrong stream has been used in NestedTensor code. See #84134 for the other occurrence.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91180
Approved by: https://github.com/mikaylagawarecki