[webgpu] Simplify copyKVCache (#26371)
This pull request refactors the handling of the past key/value (KV)
cache in the Flash Attention implementation. The main goals are to
simplify and clarify how the code determines when the past KV cache is
used, and to remove redundant code paths in the shader. The motivation
is to remove the CPU-side dependency on
parameters.total_sequence_length_, in preparation for registering
total_seqlen_tensor on the GPU when graph capture is enabled.