onnxruntime
56c984ff - [webgpu] Simplify the signature of CanApplyFlashAttention (#26926)

Commit

134 days ago

[webgpu] Simplify the signature of CanApplyFlashAttention (#26926)

This pull request simplifies the logic for handling present key/value tensors in the WebGPU Flash Attention implementation. The main change is that the responsibility for creating internal present key/value tensors is moved from the caller to the `ApplyFlashAttention` function itself. This reduces code duplication and makes the API easier to use. Additionally, the `CanApplyFlashAttention` function is simplified to remove unnecessary checks for present key/value tensors.
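The refactoring pattern described above can be illustrated with a minimal, self-contained sketch. It is not the onnxruntime code: `Tensor`, `AttentionParams`, `CanApplyFlashAttentionSketch`, and `ApplyFlashAttentionSketch` are hypothetical stand-ins, and the eligibility condition is a placeholder. The sketch only shows the shape of the change, where the eligibility check no longer takes the present key/value tensors and the apply function creates internal ones when the caller passes none.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tensor; not an onnxruntime type.
struct Tensor {
  std::vector<float> data;
};

// Hypothetical attention parameters, reduced to what the sketch needs.
struct AttentionParams {
  std::size_t total_sequence_length;
  std::size_t head_size;
};

// Eligibility depends only on the parameters. The present key/value
// tensors are no longer part of the signature. The condition below is a
// placeholder, not the real check.
bool CanApplyFlashAttentionSketch(const AttentionParams& p) {
  return p.total_sequence_length > 0 && p.head_size > 0;
}

// The callee creates internal present key/value tensors when the caller
// supplies none. Returns how many tensors were created internally.
int ApplyFlashAttentionSketch(const AttentionParams& p,
                              Tensor* present_key,
                              Tensor* present_value) {
  assert(CanApplyFlashAttentionSketch(p));

  Tensor internal_key;
  Tensor internal_value;
  int allocated = 0;
  if (present_key == nullptr) {
    present_key = &internal_key;
    ++allocated;
  }
  if (present_value == nullptr) {
    present_value = &internal_value;
    ++allocated;
  }

  // A real kernel would copy K/V into the present tensors and run
  // attention here; the sketch only sizes and zero-fills the buffers.
  const std::size_t n = p.total_sequence_length * p.head_size;
  present_key->data.assign(n, 0.0f);
  present_value->data.assign(n, 0.0f);
  return allocated;
}
```

With this shape, every call site can pass `nullptr` for outputs it does not need instead of repeating the allocation logic before each call.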

References

#26926 - [webgpu] Simplify the signature of CanApplyFlashAttention

Author

qjia7

Parents

4a858a82
