[CUDA] PagedAttention: use exact max_query_len on FA path #28409
elwhyjay force pushed from b80299a1 to 11596b3d 2 days ago
elwhyjay marked this pull request as draft 2 days ago
elwhyjay force pushed from 9e9ba43f to 8278551d 2 days ago
elwhyjay changed the title from "[CUDA] PagedAttention: use token_count for FA rotary grid" to "[CUDA] PagedAttention: use exact max_query_len on FA path" 2 days ago
elwhyjay marked this pull request as ready for review 2 days ago
[CUDA] PagedAttention: use exact max_query_len on FA path (90c5702d)
elwhyjay force pushed from 8278551d to 90c5702d 2 days ago
Potential fix for pull request finding (37adb160)