onnxruntime
37750574 - use correct total length to fix static kv_cache performance (#23615)

Commit
333 days ago
use correct total length to fix static kv_cache performance (#23615)

When using a static kv_cache, past_sequence_length is the max sequence length of the kv_cache.

issue1: total_sequence_length will be larger than the cache entry.
issue2: we do far more calculations than needed, so things are noticeably slower.
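The two issues can be illustrated with a little arithmetic. The following is a minimal sketch, not the code from the commit: the function names, the `valid_past_tokens` counter, and the example sizes are all hypothetical, and it assumes the total length is derived by adding the number of new tokens to a past length.

```python
# Hypothetical illustration only; none of these names come from onnxruntime.

def naive_total_length(cache_capacity: int, new_tokens: int) -> int:
    # With a static kv_cache, the reported past_sequence_length equals the
    # buffer capacity (max sequence length), not the number of tokens stored.
    return cache_capacity + new_tokens


def actual_total_length(valid_past_tokens: int, new_tokens: int) -> int:
    # Assumed fix: count only the tokens actually written to the cache.
    return valid_past_tokens + new_tokens


cache_capacity = 4096    # assumed max sequence length of the static cache
valid_past_tokens = 10   # assumed number of tokens generated so far
new_tokens = 1           # one decoding step

naive = naive_total_length(cache_capacity, new_tokens)
actual = actual_total_length(valid_past_tokens, new_tokens)

# issue1: the naive total exceeds the cache entry size.
exceeds_cache = naive > cache_capacity

# issue2: attention work grows with the total length, so the naive value
# processes far more positions than the tokens that actually exist.
wasted_positions = naive - actual

print(naive, actual, exceeds_cache, wasted_positions)
```

Under these assumed numbers, the naive total points past the end of the cache buffer, and the attention step would cover thousands of positions that hold no real tokens.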