pytorch
dbedb1fa - [CUDA graphs] Sync after replay (#57556)


Commit

3 years ago

[CUDA graphs] Sync after replay (#57556)

Summary: Right now** there's a bug in libcuda.so that triggers sometimes when graphs with certain topologies are replayed back to back without a sync in between. Replays that hit this bug turn into spaghetti: kernels reordered ignoring dependencies, kernels elided, corrupted results. Currently, the only workaround I know that fixes all our repros is a manual sync between replays. I'll remove the sync (or special case it based on cuda version) in a later PR, as soon as a fixed libcuda.so is available.

The only substantive change is the cudaDeviceSynchronize, other lines changed are de-indenting an unneeded scope.

** The bug is in current and semi-recent public versions of libcuda.so. We discovered the bug recently and we're not sure yet which public release was first affected. The version that ships with 11.3 is definitely affected, versions that shipped with 11.1 and earlier are likely not affected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57556
Reviewed By: mruberry
Differential Revision: D28343043
Pulled By: ngimel
fbshipit-source-id: 3b907241aebdb8ad47ae96a6314a8b02de7bfa77
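The workaround described in the summary (a device sync after every replay, so that two replays never run back to back) can be illustrated with a minimal Python sketch. This is not the code from the PR, which adds a `cudaDeviceSynchronize` on the C++ side. The names `Replayable` and `replay_synced` are hypothetical and exist only for this illustration; the graph object and the sync function are passed in, so the sketch itself does not need a GPU.

```python
from typing import Callable, Protocol


class Replayable(Protocol):
    """Anything with a replay() method (hypothetical interface for this sketch)."""

    def replay(self) -> None: ...


def replay_synced(
    graph: Replayable,
    synchronize: Callable[[], None],
    times: int = 1,
) -> None:
    """Replay `graph` `times` times, synchronizing after each replay.

    The sync after every replay guarantees that the next replay cannot
    start while the previous one is still in flight, which is the
    condition the commit message says triggers the libcuda.so bug.
    """
    for _ in range(times):
        graph.replay()
        synchronize()  # workaround: never let two replays run back to back
```

With a recent PyTorch build, one would presumably pass a captured `torch.cuda.CUDAGraph` as `graph` and `torch.cuda.synchronize` as `synchronize`. Since this commit performs the sync inside replay itself, a caller-side wrapper like this would only matter on a build without that change.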

Author

mcarilli

Committer

facebook-github-bot

Parents

565550d8
