transformers
58cebc84 - flash_paged: s_aux may not exist (#40434)

Some attention implementations (e.g., https://huggingface.co/kernels-community/vllm-flash-attn3) support an `s_aux` argument for attention sinks, but others (e.g., https://huggingface.co/kernels-community/flash-attn) do not. If `s_aux` is present in the kwargs we forward it; otherwise we don't. Users will still get an error if they run a model like gpt-oss-20b with an implementation that does not support `s_aux`, but models that don't use attention sinks will no longer error out. For example, [this is currently failing](https://github.com/huggingface/transformers/blob/399cd5c04b11ba3f740b4f76e8067326786405cc/examples/pytorch/continuous_batching.py#L16) because we are sending `s_aux: None` in the dict.
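The conditional-forwarding pattern described above can be sketched roughly as follows. This is an illustrative example only, not the actual transformers code: the function and kernel names (`paged_attention_forward`, `attn_fn`) are hypothetical stand-ins.

```python
# Sketch: only forward `s_aux` to the attention kernel when the caller
# actually supplied a non-None value. Kernels without attention-sink
# support would otherwise fail on the unexpected keyword argument.
# All names here are illustrative, not the real transformers API.
def paged_attention_forward(q, k, v, attn_fn, **kwargs):
    extra = {}
    s_aux = kwargs.get("s_aux")
    if s_aux is not None:
        # Present and set: the kernel is expected to accept it.
        extra["s_aux"] = s_aux
    # Absent or None: call the kernel without the argument, so
    # implementations that lack attention-sink support still work.
    return attn_fn(q, k, v, **extra)
```

A model that never sets `s_aux` now works with either kernel, while a model that requires it will still fail loudly on a kernel that rejects the argument.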