onnxruntime
[CUDA] PagedAttention: use exact max_query_len on FA path
#28409
Open


elwhyjay
elwhyjay force pushed from b80299a1 to 11596b3d 2 days ago
elwhyjay marked this pull request as draft 2 days ago
elwhyjay force pushed from 9e9ba43f to 8278551d 2 days ago
elwhyjay changed the title from "[CUDA] PagedAttention: use token_count for FA rotary grid" to "[CUDA] PagedAttention: use exact max_query_len on FA path" 2 days ago
elwhyjay closed this 2 days ago
elwhyjay reopened this 2 days ago
elwhyjay marked this pull request as ready for review 2 days ago
elwhyjay [CUDA] PagedAttention: use exact max_query_len on FA path
90c5702d
elwhyjay force pushed from 8278551d to 90c5702d 2 days ago
tianleiwu requested a review from copilot-pull-request-reviewer 2 days ago
copilot-pull-request-reviewer commented on 2026-05-08
elwhyjay Potential fix for pull request finding
37adb160


Assignees
No one assigned