[CUDA graphs] Don't sync between replays for CUDA driver version 11.4+ (#61063)
Summary:
The bug in libcuda.so that required https://github.com/pytorch/pytorch/pull/57556 is fixed for libcuda.so versions >= 11.4.
This PR changes replay() to sync after each launch only if the process's in-use libcuda.so is < 11.4.
With all the "enhanced" and "forward" compatibility promises flying around, and because "driver" sometimes means the kernel-mode driver and sometimes the user-mode driver (libcuda.so), I wasn't sure whether this PR's check suffices to trigger the sync if and only if the in-use libcuda.so is older than 11.4. The CUDA folks say what I wrote is reasonable.
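For illustration, here is a minimal sketch of the version gate in plain C++, not the actual PyTorch code. It assumes the driver version is obtained the way `cudaDriverGetVersion()` reports it (1000 * major + 10 * minor, so 11.4 is 11040). The names `encode_cuda_version`, `needs_sync_after_replay`, and the `replay` stand-in are hypothetical. In real code, `launch` would wrap `cudaGraphLaunch` and `sync` would wrap a sync call such as `cudaDeviceSynchronize`. Here they are injected as callables so the logic compiles and runs without a GPU.

```cpp
#include <functional>

// Encode a CUDA version as cudaDriverGetVersion() reports it:
// 1000 * major + 10 * minor (e.g. 11.4 -> 11040).
constexpr int encode_cuda_version(int major, int minor) {
  return 1000 * major + 10 * minor;
}

// First libcuda.so version in which the back-to-back replay bug is fixed.
constexpr int kFixedDriverVersion = encode_cuda_version(11, 4);

// True if replay() must still synchronize after launching the graph.
constexpr bool needs_sync_after_replay(int driver_version) {
  return driver_version < kFixedDriverVersion;
}

// Hypothetical stand-in for replay(): always launch the graph, then
// sync only when the in-use driver predates the fix.
// `launch` would wrap cudaGraphLaunch, `sync` a device/stream sync.
void replay(int driver_version,
            const std::function<void()>& launch,
            const std::function<void()>& sync) {
  launch();
  if (needs_sync_after_replay(driver_version)) {
    sync();
  }
}
```

Because the driver version cannot change while the process runs, a real implementation could query it once and cache the result instead of querying on every replay.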
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61063
Reviewed By: mruberry
Differential Revision: D29600907
Pulled By: ngimel
fbshipit-source-id: 71bf0bcbde43091e29f3812440abeb7a95d161e2