pytorch
965b9f48 - [cuDNN] Add a new optimized cuDNN RNN algorithm for small RNN hidden_size (#62143)

Commit

2 years ago

[cuDNN] Add a new optimized cuDNN RNN algorithm for small RNN hidden_size (#62143)

Summary:
This PR enables a new cuDNN RNN/LSTM algorithm `CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H` when the hidden_size is small. Operator benchmark observes 10x performance improvement in some shapes.

- [X] forward https://github.com/xwang233/code-snippet/tree/master/cudnn-rnn-bench-62143/forward
- [X] backward https://github.com/xwang233/code-snippet/tree/master/cudnn-rnn-bench-62143/backward
- [X] end-to-end model: benchmark looks good

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62143

Reviewed By: anjali411

Differential Revision: D33771442

Pulled By: ngimel

fbshipit-source-id: 0640abc6b90ebd2428c3182ce03bf0b9c30a2ec9
(cherry picked from commit 73b153a528fb9b64b994c1174882bc2f64b1ed47)
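For readers who want to try the kind of operator benchmark the summary mentions, the following is a minimal sketch and not the benchmark script used in the PR. The helper name `time_lstm_forward` and all shapes are hypothetical, chosen only for illustration. The sketch times the forward pass of `torch.nn.LSTM` with a small `hidden_size`; it cannot tell which cuDNN algorithm was actually selected, since that presumably depends on the PyTorch and cuDNN versions, the GPU, and the tensor shapes. It falls back to the CPU when CUDA is unavailable, in which case cuDNN is not involved at all.

```python
import time

import torch
import torch.nn as nn


def time_lstm_forward(hidden_size, input_size=16, seq_len=32, batch=8, iters=10):
    """Return (average seconds per forward pass, output shape) for an LSTM.

    Hypothetical helper for illustration; the shapes are arbitrary assumptions.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    lstm = nn.LSTM(input_size, hidden_size).to(device)
    # Default layout is (seq_len, batch, input_size) because batch_first=False.
    x = torch.randn(seq_len, batch, input_size, device=device)

    with torch.no_grad():
        # Warm-up runs so one-time setup cost is not included in the timing.
        for _ in range(3):
            lstm(x)
        if device == "cuda":
            torch.cuda.synchronize()

        start = time.perf_counter()
        for _ in range(iters):
            output, _state = lstm(x)
        if device == "cuda":
            # CUDA kernels run asynchronously; wait before stopping the clock.
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start

    return elapsed / iters, tuple(output.shape)


if __name__ == "__main__":
    for hidden in (16, 32, 64):
        avg, shape = time_lstm_forward(hidden)
        print(f"hidden_size={hidden}: {avg * 1e3:.3f} ms per forward, output {shape}")
```

Comparing these timings across PyTorch builds from before and after this commit, on the same GPU, is one way to check whether a given shape benefits; the linked forward and backward benchmark directories remain the authoritative numbers.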

References

#72894 - Merge pytorch master into lazy_tensor_staging

Author

xwang233

Committer

pytorchmergebot

Parents

358b5078
