ce4900f3 - [cuDNN][cuDNN V8 API] Fix `benchmark_limit` ignoring failed kernels in FIND (#91032)

Currently the `torch.backends.cudnn.benchmark_limit` setting ignores the validity/status of proposed cuDNN frontend execution plans, because we do not know whether a plan will complete successfully until execution is attempted. However, there are rare cases where the majority of execution plans fail and a fallback plan is needed (e.g., when the input tensors have extremely small pointer alignment). If the limit is too small to include a working fallback plan, we currently bail out prematurely without checking the plans exhaustively.

The fix is to defer applying the `benchmark_limit` setting until we are sure that plans will execute successfully, but this requires changes to the cuDNN frontend timing function. This PR adds a hacked version of the cuDNN frontend timing function for now, with the intent of switching to the upstream cuDNN frontend implementation once this functionality is added there.

CC @ptrblck @ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91032
Approved by: https://github.com/ngimel
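For context, a minimal sketch of the behavior this change targets. The `torch.backends.cudnn.benchmark` and `torch.backends.cudnn.benchmark_limit` settings are the real Python-side knobs; `pick_plan`, `candidate_plans`, and `plan.execute_and_time()` are hypothetical stand-ins for the C++ cuDNN frontend machinery, used only to illustrate how the limit is now applied after a plan is known to execute successfully:

```python
import torch

# User-facing knobs affected by this change. With benchmark mode enabled,
# `benchmark_limit` caps how many cuDNN execution plans are timed during
# the FIND step (0 means "try every plan cuDNN proposes").
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.benchmark_limit = 10


def pick_plan(candidate_plans, benchmark_limit):
    """Illustrative-only selection loop (hypothetical `plan` objects, not the
    real C++ frontend API). The point of the fix: a plan counts toward
    `benchmark_limit` only after it executes successfully, so a run of
    failing plans can no longer exhaust the budget before a working
    fallback plan is reached."""
    timed = []
    for plan in candidate_plans:
        try:
            elapsed = plan.execute_and_time()  # may raise for unsupported plans
        except RuntimeError:
            continue  # failed plans no longer consume the budget
        timed.append((elapsed, plan))
        if benchmark_limit and len(timed) >= benchmark_limit:
            break  # limit applied only to plans known to work
    if not timed:
        raise RuntimeError("no working execution plan found")
    return min(timed, key=lambda t: t[0])[1]
```

Before this fix, the loop would (in effect) stop after `benchmark_limit` candidates regardless of whether they ran, which could skip the only working fallback plan.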
Author: eqy