onnxruntime
96926a07 - [webgpu] Fused CopyKVCache and SplitPackedQKVWithRotaryEmbedding as SplitPackedQKVWithRotaryEmbeddingAndCopyKV (#26563)

Commit
73 days ago
### Description

Create a fully fused path, SplitPackedQKVWithRotaryEmbeddingAndCopyKV, which fuses SplitPackedQKVWithRotaryEmbedding and CopyKVCache. It runs when flash attention is used and the static KV cache is enabled.

We did the following:

- Added components support to the existing SplitPackedQKVWithRotaryEmbedding.
- Fused it with CopyKVCache into the new SplitPackedQKVWithRotaryEmbeddingAndCopyKV.

### Motivation and Context

On NV5080, token generation speed improves by ~4%.

| Device (generation tps) | Before | After |
|--------|--------|-------|
| NV5080 | 135 | **141** |
| Intel | 15.3 | 15.4 |
| Mac | 71.2 | 71.8 |
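
For readers unfamiliar with the fusion described above, here is a minimal CPU-side sketch of the idea in plain Python. It is not the WebGPU shader from this change. The function names, the packed layout `[Q | K | V]`, the cache layout `[kv_head][position][head_size]`, the half-split rotary pairing, and the base of 10000 are all assumptions made for illustration. The unfused version materializes K and V and then copies them into the cache in a second pass; the fused version writes rotated K and raw V straight into the preallocated (static) cache.

```python
import math


def rotate(head, pos, base=10000.0):
    """Rotary embedding on one head vector (assumed half-split pairing)."""
    half = len(head) // 2
    out = list(head)
    for i in range(half):
        theta = pos / (base ** (i / half))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = head[i] * c - head[i + half] * s
        out[i + half] = head[i] * s + head[i + half] * c
    return out


def split_heads(vec, num_heads, head_size):
    """Cut a flat vector into per-head slices."""
    return [vec[h * head_size:(h + 1) * head_size] for h in range(num_heads)]


def split_rotary_then_copy(packed, pos, num_heads, kv_num_heads, head_size,
                           key_cache, value_cache):
    """Unfused reference: split + rotary first, then a separate cache copy."""
    q_end = num_heads * head_size
    k_end = q_end + kv_num_heads * head_size
    # Pass 1: split the packed vector and rotate Q and K (intermediates kept).
    q = [rotate(h, pos) for h in split_heads(packed[:q_end], num_heads, head_size)]
    k = [rotate(h, pos) for h in split_heads(packed[q_end:k_end], kv_num_heads, head_size)]
    v = split_heads(packed[k_end:], kv_num_heads, head_size)
    # Pass 2: copy the intermediates into the static cache at position `pos`.
    for h in range(kv_num_heads):
        key_cache[h][pos] = k[h]
        value_cache[h][pos] = v[h]
    return q


def split_rotary_and_copy_fused(packed, pos, num_heads, kv_num_heads, head_size,
                                key_cache, value_cache):
    """Fused sketch: K and V go directly into the cache, no intermediates."""
    q_end = num_heads * head_size
    k_end = q_end + kv_num_heads * head_size
    q = [rotate(packed[h * head_size:(h + 1) * head_size], pos)
         for h in range(num_heads)]
    for h in range(kv_num_heads):
        k_off = q_end + h * head_size
        v_off = k_end + h * head_size
        key_cache[h][pos] = rotate(packed[k_off:k_off + head_size], pos)
        value_cache[h][pos] = list(packed[v_off:v_off + head_size])
    return q
```

Both functions should produce the same query output and the same cache contents; the fused one simply avoids the intermediate K/V buffers and the second pass, which is the kind of saving the change targets on the GPU.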