enable double backward for non-cudnn LSTM and GRU (#26660)
Summary:
An attempt to enable double backward for non-cudnn LSTM and GRU (see https://github.com/pytorch/pytorch/issues/25315, https://github.com/pytorch/pytorch/issues/20449). RNN already works because it does not rely on fused kernels.
This does not implement the double backward function itself, because that is pretty hard to spell out. Instead, it implements backward using differentiable operations, so that double backward can be derived automatically.
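For illustration, here is a rough sketch of what "backward written with differentiable ops" means for a single LSTM cell. This is not the actual ATen code; the function name and signature are made up. The point is that every op below is itself differentiable, so autograd can take a second derivative through it.

```python
import torch

# Hypothetical sketch only -- names and signature are illustrative, not the
# actual ATen implementation.
def lstm_cell_backward(d_hy, d_cy, cx, cy, ingate, forgetgate, cellgate, outgate):
    tanh_cy = torch.tanh(cy)

    d_outgate = d_hy * tanh_cy
    # tanh_backward spelled out as ordinary ops
    d_cy_total = d_cy + d_hy * outgate * (1 - tanh_cy * tanh_cy)

    d_ingate = d_cy_total * cellgate
    d_forgetgate = d_cy_total * cx
    d_cellgate = d_cy_total * ingate
    d_cx = d_cy_total * forgetgate

    # sigmoid_backward / tanh_backward for the gate nonlinearities,
    # written out-of-place as ordinary tensor ops
    d_ingate = d_ingate * ingate * (1 - ingate)
    d_forgetgate = d_forgetgate * forgetgate * (1 - forgetgate)
    d_cellgate = d_cellgate * (1 - cellgate * cellgate)
    d_outgate = d_outgate * outgate * (1 - outgate)

    # the "cat in the end" mentioned below: gate gradients are concatenated
    # into the gradient w.r.t. the pre-activation gate buffer
    d_gates = torch.cat([d_ingate, d_forgetgate, d_cellgate, d_outgate], dim=1)
    return d_gates, d_cx
```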
The good: it seems to work, and there is no effect on performance in the usual case without double backward, because the fused LSTM backward is still used.
The bad: performance of backward and, especially, double backward is pretty bad. Scripting would still be the preferred way if we want a performant solution. Performance and/or memory use could be slightly improved if in-place variants of sigmoid_backward and tanh_backward could be used to avoid the cat at the end, but I'm not yet sure that is possible, and in any case it would be only a slight improvement.
The ugly: I could not figure out a way to reuse the workspace that contains the sum of the gates with the sigmoid and tanh activations applied, so that's probably another perf and memory hit.
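Concretely, and again as a hypothetical sketch with made-up names, the fused kernels keep the post-activation gates in that workspace, while the differentiable backward has to recompute them from the saved pre-activation gates:

```python
import torch

# Hypothetical sketch of what the differentiable backward has to redo because
# the fused kernels' workspace (post-activation gates) cannot be reused.
# igates/hgates: (batch, 4 * hidden_size) input/hidden gate pre-activations.
def recompute_gates(igates, hgates):
    ingate, forgetgate, cellgate, outgate = (igates + hgates).chunk(4, dim=1)
    return (torch.sigmoid(ingate), torch.sigmoid(forgetgate),
            torch.tanh(cellgate), torch.sigmoid(outgate))
```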
cc soumith, albanD. If you think this approach is viable, I can extend it to GRU and RNN.
Thanks to mcarilli, whose approach to double backward in weight norm I copied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26660
Test Plan: added tests to check gradgrad for GRU and LSTM with cudnn disabled.
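Roughly along the lines of the following sketch (sizes and setup are illustrative, not the exact test code added in the PR):

```python
import torch
from torch.autograd import gradgradcheck

# Illustrative only. Disabling cudnn forces the non-cudnn CUDA path whose
# backward this change makes differentiable; gradgradcheck then exercises
# double backward numerically.
with torch.backends.cudnn.flags(enabled=False):
    lstm = torch.nn.LSTM(3, 5).to(device="cuda", dtype=torch.double)
    inp = torch.randn(4, 2, 3, device="cuda", dtype=torch.double, requires_grad=True)

    def fn(x):
        out, _ = lstm(x)
        return out

    assert gradgradcheck(fn, (inp,))
```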
Differential Revision: D17581489
Pulled By: ngimel
fbshipit-source-id: efd204289e9a0e94d94896a0b3bff5cf6246cafa