onnxruntime
8c59cd4f
- [js/webgpu] Support GroupQueryAttention (#20237)
Commit
1 year ago
[js/webgpu] Support GroupQueryAttention (#20237)

TODOs:
1. Handle H * params.kvNumHeads greater than work group size limit.
2. Support BNSH kv cache.
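
For readers unfamiliar with the terms in the commit message, here is a minimal TypeScript sketch of two ideas it touches on: how grouped-query attention shares one key/value head among several query heads, and how the BSNH and BNSH tensor layouts differ in flat indexing. The function names and parameters below are hypothetical and chosen for illustration; they are not taken from the onnxruntime sources, and the sketch does not reflect how the commit itself implements the operator.

```typescript
// Hypothetical helpers for illustration only; not part of onnxruntime.

// In grouped-query attention, numHeads query heads share kvNumHeads
// key/value heads. Each group of (numHeads / kvNumHeads) consecutive
// query heads reads the same key/value head.
function kvHeadForQueryHead(queryHead: number, numHeads: number, kvNumHeads: number): number {
  if (numHeads % kvNumHeads !== 0) {
    throw new Error('numHeads must be a multiple of kvNumHeads');
  }
  const groupSize = numHeads / kvNumHeads;
  return Math.floor(queryHead / groupSize);
}

// Flat offset into a BSNH tensor: [batch, sequence, numHeads, headSize].
function offsetBSNH(
  b: number, n: number, s: number, h: number,
  numHeads: number, seqLen: number, headSize: number,
): number {
  return ((b * seqLen + s) * numHeads + n) * headSize + h;
}

// Flat offset into a BNSH tensor: [batch, numHeads, sequence, headSize].
function offsetBNSH(
  b: number, n: number, s: number, h: number,
  numHeads: number, seqLen: number, headSize: number,
): number {
  return ((b * numHeads + n) * seqLen + s) * headSize + h;
}
```

The same logical element (b, n, s, h) lands at different flat offsets in the two layouts, which is why a kernel written for one layout of the key/value cache needs separate indexing to support the other.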
References
#20237 - [js/webgpu] Support GroupQueryAttention
Author
axinging
Parents
90d49ccb