onnxruntime
414b012f - Add memory efficient attention from CUTLASS (#14343)

Commit
3 years ago
### Description
Add memory efficient attention from CUTLASS.

TODO (in next pull request):
(1) Run performance tests on different GPUs, then add a sequence length threshold so that it is only activated for long sequences.
(2) Merge changes from https://github.com/NVIDIA/cutlass/pull/773 when it is in cutlass master.