flax
1b2754a9 - Split the attention softmax so that the expensive elementwise division happens on an array that's O(N) rather than O(N^2)

Commit
5 years ago
Split the attention softmax so that the expensive elementwise division happens on an array that's O(N) rather than O(N^2)

PiperOrigin-RevId: 317793206
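To illustrate the idea in the commit message, here is a minimal JAX sketch, not the actual diff from this commit. It assumes single-head attention with a sequence length N and a feature size d. The function names `attention_reference` and `attention_split_softmax` are hypothetical. The baseline divides every entry of the [N, N] weight matrix. The split version leaves the exponentials unnormalized, multiplies them by the values, and divides the [N, d] result by the per-row sum, so the division is linear in N.

```python
import jax
import jax.numpy as jnp


def attention_reference(q, k, v):
    """Baseline: normalize the full [N, N] weight matrix, then apply it."""
    logits = q @ k.T / jnp.sqrt(q.shape[-1])
    weights = jax.nn.softmax(logits, axis=-1)  # division touches N * N entries
    return weights @ v


def attention_split_softmax(q, k, v):
    """Hypothetical split form: defer the division until after the matmul."""
    logits = q @ k.T / jnp.sqrt(q.shape[-1])
    # Subtract the row-wise max for numerical stability, as a softmax would.
    logits = logits - jnp.max(logits, axis=-1, keepdims=True)
    unnormalized = jnp.exp(logits)                          # shape [N, N]
    denom = jnp.sum(unnormalized, axis=-1, keepdims=True)   # shape [N, 1]
    # Division now touches only the N * d entries of the output.
    return (unnormalized @ v) / denom


# Small random inputs (assumed shapes: N = 8, d = 4).
kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(kq, (8, 4))
k = jax.random.normal(kk, (8, 4))
v = jax.random.normal(kv, (8, 4))

out_reference = attention_reference(q, k, v)
out_split = attention_split_softmax(q, k, v)
```

The two forms are algebraically identical because the per-row denominator is a scalar that factors out of the weighted sum over values. They can differ slightly in floating-point rounding.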