flash_paged: s_aux may not exist (#40434)
Some implementations (e.g.,
https://huggingface.co/kernels-community/vllm-flash-attn3) support an
`s_aux` arg for attention sinks, but others
(https://huggingface.co/kernels-community/flash-attn) do not. If `s_aux`
is present in the kwargs we forward it; otherwise we don't.
The user will still get an error if they use a model like gpt-oss-20b
with an implementation that does not support `s_aux`, but models that
don't use it will no longer error out. For example, [this is currently
failing](https://github.com/huggingface/transformers/blob/399cd5c04b11ba3f740b4f76e8067326786405cc/examples/pytorch/continuous_batching.py#L16)
because we send `s_aux: None` in the kwargs dict, which implementations
without attention-sink support reject as an unexpected argument.
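A minimal sketch of the conditional forwarding described above (the helper name and kwargs are illustrative, not the actual call site in the paged flash-attention path):

```python
def build_attn_kwargs(s_aux=None, **attn_kwargs):
    """Assemble the kwargs forwarded to the flash-attention kernel.

    `s_aux` is added only when it is actually provided, so kernels that
    do not accept it (e.g. kernels-community/flash-attn) never see the
    key, while attention-sink kernels (e.g. vllm-flash-attn3) still
    receive it for models like gpt-oss-20b.
    """
    if s_aux is not None:
        attn_kwargs["s_aux"] = s_aux
    return attn_kwargs
```

With this, a model that never sets `s_aux` produces a kwargs dict without the key, instead of `{"s_aux": None}` which would crash implementations lacking the parameter.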