Specify attention-23 kernel and relax assertion in prepare qkv (#27217)
This pull request updates the attention kernel selection logic and
clarifies support for unidirectional (causal) attention in the CUDA
attention implementation. The main changes improve documentation, remove
outdated comments, and explicitly set the kernel type for better
maintainability and clarity.
Kernel selection and configuration improvements:
* Explicitly set the `kernel_type` field to `AttentionKernel_Unfused` in
the `AttentionData` structure, making the selected kernel explicit and
easier to extend in the future (see the sketch below).
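A minimal sketch of what the explicit assignment might look like. Only `kernel_type`, `AttentionData`, and `AttentionKernel_Unfused` come from this description; the other enum values, fields, and the helper function are placeholders, not the actual onnxruntime definitions:

```cpp
// Sketch only: placeholder definitions, not the actual onnxruntime headers.
enum AttentionKernelType {
  AttentionKernel_Unfused = 0,
  AttentionKernel_TrtFusedAttention,  // assumed sibling value
  AttentionKernel_FlashAttention,     // assumed sibling value
};

template <typename T>
struct AttentionData {
  T* q = nullptr;  // placeholder workspace pointers
  T* k = nullptr;
  T* v = nullptr;
  AttentionKernelType kernel_type = AttentionKernel_Unfused;
};

// Setting the field explicitly at the point of configuration makes the
// chosen path visible, instead of relying on the enum's default value.
void ConfigureUnfusedPath(AttentionData<float>& data) {
  data.kernel_type = AttentionKernel_Unfused;
}
```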
Documentation and code clarity:
* Added comments clarifying that unidirectional (causal) attention is
supported by several attention kernel implementations, and that the TRT
fused runner is only used for non-unidirectional cases, as enforced
elsewhere in the selection logic (see the sketch after this list).
* Removed outdated TODO comments regarding parameter continuation and
kernel selection, as these are now handled more explicitly in the code.
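As a hedged illustration of the constraint those comments describe (the function and parameter names here are illustrative, not the actual onnxruntime internals), the selection guard amounts to:

```cpp
// Illustrative guard: the TRT fused runner path is only taken for
// bidirectional (non-causal) attention; causal masking is left to the
// other kernel implementations that support it.
bool CanUseTrtFusedRunner(bool fused_runner_available, bool is_unidirectional) {
  return fused_runner_available && !is_unidirectional;
}
```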