onnxruntime
8c59cd4f - [js/webgpu] Support GroupQueryAttention (#20237)

Commit
1 year ago
TODOs:
1. Handle H * params.kvNumHeads greater than work group size limit.
2. Support BNSH kv cache.