onnxruntime
a997bb46
- Refactor rocm attention (#14688)
3 years ago
Refactor rocm attention (#14688)
Extract the QKV projection and the attention computation into pipelines (each composed of GEMMs and kernel launches). This will allow us to introduce CK flash attention in the next PR.
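As a rough illustration of the structure the message describes, the following is a minimal sketch in plain Python, not the ROCm code from this commit. It assumes single-head attention with no mask and no bias, and every function name here (`gemm`, `qkv_projection_pipeline`, `attention_pipeline`) is hypothetical. The point is only that each stage is a small composition of matrix multiplies plus one extra "kernel" step (softmax), so a stage can later be swapped for a fused implementation without touching the other.

```python
import math


def gemm(a, b):
    """Plain matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(r) for r in zip(*m)]


def softmax_rows(m):
    """Numerically stable row-wise softmax (stands in for a kernel launch)."""
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def qkv_projection_pipeline(x, w_q, w_k, w_v):
    """Stage 1 (hypothetical name): three GEMMs projecting the input into Q, K, V."""
    return gemm(x, w_q), gemm(x, w_k), gemm(x, w_v)


def attention_pipeline(q, k, v):
    """Stage 2 (hypothetical name): GEMM for Q K^T, softmax, GEMM for P V."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in gemm(q, transpose(k))]
    probs = softmax_rows(scores)
    return gemm(probs, v)


def attention(x, w_q, w_k, w_v):
    """Compose the two stages; either one could be replaced independently."""
    q, k, v = qkv_projection_pipeline(x, w_q, w_k, w_v)
    return attention_pipeline(q, k, v)
```

With this split, a fused attention implementation would only need to replace `attention_pipeline`, leaving the projection stage unchanged.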
References
#14688 - Refactor rocm attention
Author
cloudhan
Parents
f3b66643