[webgpu] Simplify copyKVCache (#26371)

Commit

194 days ago

This pull request refactors the logic for handling the past key/value (KV) cache in the Flash Attention implementation. The main focus is to simplify and clarify how the code determines whether a past KV cache is in use, and to remove redundant code paths in the shader. The motivation is to remove the CPU-side dependency on parameters.total_sequence_length_, in preparation for registering total_seqlen_tensor on the GPU when graph capture is enabled.
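The general idea of a KV-cache copy with a single code path can be illustrated with a small CPU-side sketch. This is not the onnxruntime WGSL shader, and the function name copy_kv_cache and its parameters are hypothetical. It assumes each cache row is a list of floats and that the past sequence length is derived from a total sequence length value (which, in the GPU setting described above, would be read from a tensor) rather than passed in as a separate host-side parameter. "Past KV is used" then simply means the derived past length is greater than zero, so no separate branch is needed.

```python
from typing import List, Sequence

Row = List[float]


def copy_kv_cache(past: Sequence[Row], new: Sequence[Row], total_seqlen: int) -> List[Row]:
    """Hypothetical sketch: build a present-KV buffer of length total_seqlen.

    Positions [0, past_len) are copied from the past cache and positions
    [past_len, total_seqlen) from the new tokens. past_len is derived from
    total_seqlen instead of being supplied separately.
    """
    new_len = len(new)
    past_len = total_seqlen - new_len
    # The past buffer may be larger than past_len (e.g. preallocated),
    # but it must hold at least past_len rows.
    if past_len < 0 or past_len > len(past):
        raise ValueError("total_seqlen is inconsistent with the inputs")

    present: List[Row] = []
    # One uniform loop, analogous to one shader invocation per position.
    # When there is no past cache, past_len is 0 and every row comes from `new`.
    for pos in range(total_seqlen):
        if pos < past_len:
            present.append(list(past[pos]))
        else:
            present.append(list(new[pos - past_len]))
    return present
```

For example, with two past rows, one new row, and a total sequence length of 3, the result is the two past rows followed by the new row; with an empty past and total length equal to the number of new rows, the same loop just copies the new rows.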

References

#26371 - [webgpu] Simplify copyKVCache

Author

qjia7

Parents

5ef8ce21

onnxruntime 14f540b7 - [webgpu] Simplify copyKVCache (#26371)