onnxruntime
15f6bde9 - [webgpu] Make the GQA's intermediate buffer static (#25091)

Commit
312 days ago
### Description

This PR makes the intermediate buffers generated in GQA static when the static KV cache is used, so that graph capture can be used with LLMs. The change may improve the buffer cache hit rate but also slightly increase average GPU memory usage.
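The trade-off described above can be illustrated with a toy model. The sketch below is not ONNX Runtime code: `BufferCache`, `run_decode`, and the sizes used are hypothetical names and values chosen for illustration, and it assumes a cache that reuses a released buffer only when the requested size matches exactly. Under that assumption, sizing the intermediate buffer to the maximum sequence length makes every request after the first a cache hit, at the cost of a larger buffer being in use at each step.

```python
class BufferCache:
    """Toy size-keyed buffer cache (hypothetical, not the ONNX Runtime implementation)."""

    def __init__(self):
        self.free = {}   # buffer size -> count of released buffers available for reuse
        self.hits = 0
        self.misses = 0

    def acquire(self, size):
        # Reuse a released buffer only if one of exactly this size exists.
        if self.free.get(size, 0) > 0:
            self.free[size] -= 1
            self.hits += 1
        else:
            self.misses += 1
        return size

    def release(self, size):
        self.free[size] = self.free.get(size, 0) + 1


def run_decode(steps, max_len, static):
    """Simulate `steps` decode steps, requesting one intermediate buffer per step.

    static=False: the buffer is sized by the current total sequence length.
    static=True:  the buffer is always sized by max_len (the static KV cache size).
    Returns (cache hits, cache misses, average in-use buffer size per step).
    """
    cache = BufferCache()
    in_use_total = 0
    for total_len in range(1, steps + 1):
        size = max_len if static else total_len
        cache.acquire(size)
        in_use_total += size
        cache.release(size)
    return cache.hits, cache.misses, in_use_total / steps


dynamic = run_decode(steps=8, max_len=16, static=False)
static = run_decode(steps=8, max_len=16, static=True)
print("dynamic:", dynamic)  # (0, 8, 4.5)
print("static: ", static)   # (7, 1, 16.0)
```

A fixed buffer size also means the same allocation can be reused on every step, which is the kind of stability a captured graph generally relies on when it is replayed.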