onnxruntime
25a6fdca - Specify attention-23 kernel and relax assertion in prepare qkv (#27217)

This pull request updates the attention kernel selection logic and clarifies support for unidirectional (causal) attention in the CUDA attention implementation. The main changes focus on improving documentation, removing outdated comments, and explicitly setting the kernel type for better maintainability and clarity.

Kernel selection and configuration improvements:
* Explicitly set the `kernel_type` field to `AttentionKernel_Unfused` in the `AttentionData` structure to clarify which kernel is being used and to improve future extensibility (see the sketch after this list).

Documentation and code clarity:
* Added comments clarifying that unidirectional (causal) attention is supported by several attention kernel implementations, and that the TRT fused runner is used only for non-unidirectional cases, as enforced elsewhere.
* Removed outdated TODO comments about parameter continuation and kernel selection, since these are now handled explicitly in the code.
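For context, here is a minimal sketch of the pattern the commit describes: recording the selected kernel explicitly in the attention data, defaulting to the unfused path, and only choosing a fused runner when its preconditions (here, non-unidirectional attention) hold. The enum values, struct fields, and `SelectKernel` helper below are simplified stand-ins for onnxruntime's internal `AttentionKernelType`, `AttentionData`, and kernel-selection code, not the actual definitions from the diff.

```cpp
// Illustrative sketch only (not the actual onnxruntime code): model the
// pattern of recording which attention kernel was selected, defaulting to
// the unfused path and switching to a fused kernel only when allowed.
#include <cstdio>

// Simplified stand-in for onnxruntime's AttentionKernelType enum.
enum AttentionKernelType {
  AttentionKernel_Unfused,
  AttentionKernel_TrtFusedAttention,
};

// Simplified stand-in for the AttentionData structure: only the fields
// needed to illustrate the selection logic.
struct AttentionData {
  bool is_unidirectional = false;       // causal (unidirectional) mask requested?
  bool fused_runner_available = false;  // hypothetical: fused runner usable here?
  AttentionKernelType kernel_type = AttentionKernel_Unfused;  // explicit default
};

// Choose the kernel and record the choice in kernel_type so later code
// (and any assertions) can see exactly which path was taken.
void SelectKernel(AttentionData& data) {
  // The fused runner is used only for non-unidirectional attention,
  // mirroring the constraint the commit message says is enforced elsewhere.
  if (data.fused_runner_available && !data.is_unidirectional) {
    data.kernel_type = AttentionKernel_TrtFusedAttention;
  } else {
    data.kernel_type = AttentionKernel_Unfused;
  }
}

int main() {
  AttentionData causal;
  causal.is_unidirectional = true;
  causal.fused_runner_available = true;
  SelectKernel(causal);
  std::printf("causal -> %s\n",
              causal.kernel_type == AttentionKernel_Unfused ? "unfused" : "fused");

  AttentionData bidirectional;
  bidirectional.fused_runner_available = true;
  SelectKernel(bidirectional);
  std::printf("bidirectional -> %s\n",
              bidirectional.kernel_type == AttentionKernel_Unfused ? "unfused" : "fused");
  return 0;
}
```

The benefit of assigning `kernel_type` explicitly, rather than relying on an implicit default, is that downstream code and future kernel variants can branch on a single authoritative field instead of re-deriving the selection conditions.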