[CUDA graphs] CUDA RNG-safe graph capture and replay bindings (#48875)
Summary:
Part 2 of the https://github.com/pytorch/pytorch/pull/46148 refactor. (Part 1 was https://github.com/pytorch/pytorch/pull/48694.)
Contains:
- A few more CUDAGeneratorImpl diffs to clean up its interaction with graph capture.
- Capture and replay bindings that interact correctly with CUDAGeneratorImpl.
- Tests.
The diffs compile and the tests pass on my machine (Ubuntu 20.04, CUDA 11.0), but they need fine-tuning for many CI builds.
See [Note [CUDA Graph-safe RNG states]](https://github.com/pytorch/pytorch/blob/02d89f9f1d7f32ebf7ec509d5c14b2f39690997a/aten/src/ATen/CUDAGeneratorImpl.h#L13-L85) for the strategy, based on https://github.com/pytorch/pytorch/pull/46148#issuecomment-724414794.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48875
Reviewed By: zou3519
Differential Revision: D25482654
Pulled By: ngimel
fbshipit-source-id: 634dbc4c6c9d7d0d9a62dc81a52d430561f905fe