transformers
14b89fed - fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194)

Committed 187 days ago
* Fix typings that did not match the FA function signature, renaming `cumulative_seqlens_q/k` -> `cu_seq_lens_q/k`:
  - in `FlashAttentionKwargs` in `modeling_flash_attention_utils`
  - in `TransformersKwargs` in `generic`
  - in `PagedAttentionArgs` in `continuous_batching`

  This is **BC** (backward compatible), because these tensors are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233`, and destroyed together with `ContinuousBatchProcessor`.

* Format changes by ruff

* Update `src/transformers/integrations/flash_paged.py`: remove an unused function arg in `PagedAttentionCache.update`

  Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Revert the `continuous_batching` signature, which is more meaningful

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
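The reason the rename matters can be illustrated with a minimal, hypothetical sketch (not the actual transformers code): kwargs collected into a TypedDict are forwarded with `**`, so they only reach the attention function if the field names match its parameter names exactly. The `flash_attention_forward` stand-in below is an assumption for illustration, not the real kernel.

```python
from typing import Optional, TypedDict


class FlashAttentionKwargs(TypedDict, total=False):
    # Field names must match the attention function's parameters so that
    # `**kwargs` forwarding works; the commit renames the old
    # `cumulative_seqlens_q/k` fields to `cu_seq_lens_q/k` for this reason.
    cu_seq_lens_q: Optional[list]
    cu_seq_lens_k: Optional[list]
    max_length_q: Optional[int]
    max_length_k: Optional[int]


def flash_attention_forward(query, key, cu_seq_lens_q=None, cu_seq_lens_k=None,
                            max_length_q=None, max_length_k=None):
    # Hypothetical stand-in for the real kernel call: it just reports
    # whether the variable-length (packed) arguments arrived.
    return {
        "varlen": cu_seq_lens_q is not None and cu_seq_lens_k is not None,
        "max_length_q": max_length_q,
    }


kwargs: FlashAttentionKwargs = {
    "cu_seq_lens_q": [0, 3, 7],  # cumulative sequence lengths of packed queries
    "cu_seq_lens_k": [0, 3, 7],  # cumulative sequence lengths of packed keys
    "max_length_q": 4,
    "max_length_k": 4,
}

result = flash_attention_forward("q", "k", **kwargs)
print(result["varlen"])  # the varlen path is taken only when the names line up
```

Had the TypedDict kept the old `cumulative_seqlens_q/k` names, the same `**kwargs` call would raise `TypeError: unexpected keyword argument`, which is exactly the mismatch the commit fixes.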