onnxruntime
414b012f - Add memory efficient attention from CUTLASS (#14343)

Commit

3 years ago

Add memory efficient attention from CUTLASS (#14343)

### Description

Add memory efficient attention from CUTLASS.

TODO (in next pull request):

(1) Need performance tests on different GPUs, then add a sequence length threshold (only activate it for long sequence length).
(2) Merge changes from https://github.com/NVIDIA/cutlass/pull/773 when it is in cutlass master.

References

#14343 - Add memory efficient attention from CUTLASS

Author

tianleiwu

Parents

e64f357a
