onnxruntime
9b2b2ee8 - [webgpu] Use components for VxAttentionScore (#23726)

Commit
338 days ago
[webgpu] Use components for VxAttentionScore (#23726)

For phi3.5-gqa-static sum_long (>1000 tokens) on Meteor Lake.

Before: 300 tokens in 27.0 sec, e2e: 11.1 tps, prompt: 212.4 tps, gen: 14.2 tps, ttft: 5.85 sec
After: 300 tokens in 23.0 sec, e2e: 13.0 tps, prompt: 248.9 tps, gen: 16.6 tps, ttft: 4.99 sec
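To put the before/after figures in relative terms, here is a minimal sketch that computes the percent change for each reported metric. The dictionary keys and the helper names (`relative_change`, `summarize`) are illustrative and not part of the commit; the numbers are copied from the message above.

```python
# Figures copied from the commit message
# (phi3.5-gqa-static, sum_long, Meteor Lake). Key names are illustrative.
before = {
    "total_sec": 27.0,
    "e2e_tps": 11.1,
    "prompt_tps": 212.4,
    "gen_tps": 14.2,
    "ttft_sec": 5.85,
}
after = {
    "total_sec": 23.0,
    "e2e_tps": 13.0,
    "prompt_tps": 248.9,
    "gen_tps": 16.6,
    "ttft_sec": 4.99,
}


def relative_change(metric: str) -> float:
    """Percent change from before to after; positive means the value went up."""
    return (after[metric] - before[metric]) / before[metric] * 100.0


def summarize() -> dict[str, float]:
    """Percent change for every metric, rounded to one decimal place."""
    return {metric: round(relative_change(metric), 1) for metric in before}


if __name__ == "__main__":
    for metric, pct in summarize().items():
        print(f"{metric}: {pct:+.1f}%")
```

On these numbers the throughput metrics (e2e, prompt, gen) each rise by roughly 17%, while total time and time to first token each drop by roughly 15%. For the `_sec` metrics a negative change is the improvement.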