[RELAND] [cuDNN] Add a new optimized cuDNN RNN algorithm for small RNN hidden_size (#73211)
Summary:
https://github.com/pytorch/pytorch/pull/62143 was reverted (https://github.com/pytorch/pytorch/pull/72089) because, when native tests were run internally with cuDNN on GPUs where `CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H` was selected, we hit `CUDNN_STATUS_NOT_SUPPORTED` errors.
Based on https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#features-of-rnn-functions and experiments, I strongly suspect the errors were because `CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H` doesn't support variable sequence lengths in the batch.
This PR restores https://github.com/pytorch/pytorch/pull/62143 and adds a bailout condition if the input is a packed batch that might have different sequence lengths per element.
Question for review: Do we also need to add a bailout condition if the input is double precision?
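To illustrate the bailout logic described above, here is a minimal Python sketch of the dispatch heuristic. This is not the actual implementation (which lives in PyTorch's C++ cuDNN bindings), and the function name and the `SMALL_H_LIMIT` threshold are hypothetical; it only shows the shape of the check: prefer the small-hidden-size persistent algorithm, but fall back whenever the batch is packed with non-uniform sequence lengths.

```python
def use_persist_static_small_h(hidden_size: int,
                               seq_lengths: list[int],
                               is_packed: bool) -> bool:
    """Hypothetical eligibility check for CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H.

    Illustrative only -- the real selection logic and thresholds are in
    PyTorch's C++ cuDNN RNN code, not here.
    """
    SMALL_H_LIMIT = 128  # assumed cutoff for a "small" hidden_size
    if hidden_size > SMALL_H_LIMIT:
        return False
    # Bailout added by this PR: the SMALL_H algorithm does not support
    # variable sequence lengths, so a packed batch whose elements may
    # have different lengths must use the standard algorithm instead.
    if is_packed and len(set(seq_lengths)) > 1:
        return False
    return True
```

For example, a packed batch with lengths `[5, 3, 2]` would bail out to the standard algorithm, while a packed batch of uniform lengths `[5, 5, 5]` would remain eligible.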
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73211
Reviewed By: ejguan
Differential Revision: D34688016
Pulled By: ngimel
fbshipit-source-id: e7335c4701dabc7d0b36ebdb6414c4353a71ee91
(cherry picked from commit b9023bfd1c31eb9a38bf0552a20412e9a4e60b91)