onnxruntime
14f540b7 - [webgpu] Simplify copyKVCache (#26371)

Commit
106 days ago
This pull request refactors the handling of the past key/value (KV) cache in the Flash Attention implementation. The main focus is to simplify and clarify how the code determines when the past KV cache is used, and to remove redundant code paths in the shader. The motivation is to remove the CPU-side dependency on parameters.total_sequence_length_, in preparation for registering total_seqlen_tensor on the GPU when graph capture is enabled.