onnxruntime
a997bb46 - Refactor rocm attention (#14688)

Commit
3 years ago
Refactor rocm attention (#14688)

Extract the QKV projection and the attention computation into pipelines, each composed of GEMMs and kernel launches. This will allow us to introduce CK flash attention in the next PR.
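To illustrate the structure described above, here is a minimal sketch in plain Python. It is not the ROCm code from this commit: every function name is hypothetical, a naive matrix multiply stands in for the GEMM calls, and ordinary functions stand in for GPU kernel launches. It only shows how the QKV projection and the attention computation can be separated into two composable pipelines, so that the second stage could later be swapped for a different implementation.

```python
import math


def gemm(a, b):
    """Naive matrix multiply standing in for a GEMM call: (m x k) @ (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a row-major matrix."""
    return [list(r) for r in zip(*m)]


def softmax_rows(m):
    """Numerically stable row-wise softmax, standing in for a softmax kernel."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def qkv_projection_pipeline(x, w_q, w_k, w_v):
    """Hypothetical stage 1: project the input into Q, K, and V with three GEMMs."""
    return gemm(x, w_q), gemm(x, w_k), gemm(x, w_v)


def attention_pipeline(q, k, v):
    """Hypothetical stage 2: softmax(Q K^T / sqrt(d)) V as GEMM, kernel, GEMM."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = gemm(q, transpose(k))                          # GEMM
    scores = [[s * scale for s in row] for row in scores]   # scaling step
    probs = softmax_rows(scores)                            # softmax "kernel"
    return gemm(probs, v)                                   # GEMM


def attention(x, w_q, w_k, w_v):
    """Compose the two stages; stage 2 can be replaced without touching stage 1."""
    q, k, v = qkv_projection_pipeline(x, w_q, w_k, w_v)
    return attention_pipeline(q, k, v)
```

Because `attention` only composes the two stages, a different attention implementation with the same `(q, k, v)` interface could replace `attention_pipeline` without changing the projection stage.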