Fix test TestCuda.test_streams_multi_gpu_query (#23912)
Summary:
This is a similar issue to TestCuda.test_events_wait.
The PyTorch TestCase sets its policy() method to assertLeaksNoCudaTensors.
Whenever a test runs, assertLeaksNoCudaTensors is called, which in turn
creates a CudaMemoryLeakCheck, which calls initialize_cuda_context_rng.
That function executes torch.randn on each device, launching a kernel on
each one. Since the kernel on device 0 may not have finished by the time
the test body starts, the first assertion, self.assertTrue(s0.query()),
fails.
The fix is to insert

    torch.cuda.synchronize(d0)
    torch.cuda.synchronize(d1)

at the beginning of the test so that previously launched kernels finish
before the actual test begins.
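To illustrate the race without a GPU, here is a toy, pure-Python model. It is not the CUDA API: ToyStream, launch and run_test are hypothetical names, and a background thread stands in for an asynchronous kernel. As with torch.cuda.Stream.query(), query() reports True only once all previously launched work has completed, and synchronizing first makes the assertion deterministic.

```python
import threading
import time


class ToyStream:
    """Hypothetical stand-in for a CUDA stream (not a real PyTorch class)."""

    def __init__(self):
        self._pending = []  # threads modelling kernels still in flight

    def launch(self, duration):
        # Like a kernel launch, this returns immediately; the "work"
        # (a sleep on a background thread) completes later.
        t = threading.Thread(target=time.sleep, args=(duration,))
        t.start()
        self._pending.append(t)

    def query(self):
        # True only if every previously launched piece of work has finished.
        return all(not t.is_alive() for t in self._pending)

    def synchronize(self):
        # Block until all launched work is done (cf. torch.cuda.synchronize).
        for t in self._pending:
            t.join()


def run_test(sync_first):
    s0 = ToyStream()
    # Models the leak check launching work (torch.randn in
    # initialize_cuda_context_rng) before the test body runs.
    s0.launch(duration=0.5)
    if sync_first:
        s0.synchronize()  # the fix: drain earlier work before asserting
    return s0.query()


# Without the fix, the harness's work is typically still in flight.
print(run_test(sync_first=False))
# With the fix, query() sees an idle stream.
print(run_test(sync_first=True))
```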
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23912
Differential Revision: D16688599
Pulled By: ezyang
fbshipit-source-id: 3de2b555e99f5bbd05727835b9d7c93a026a0519