vllm
[Perf] [Hybrid] Copy num_accepted_tokens in non-blocking way when not using prefix caching
#35442
Merged

tdoublep
mergify added v1
gemini-code-assist commented on 2026-02-26
heheda12345 commented on 2026-02-27
heheda12345 approved these changes on 2026-03-03
heheda12345 enabled auto-merge (squash) 5 days ago
github-actions added ready
tdoublep Copy num_accepted_tokens in non-blocking way when not using prefix ca…
5c244276
tdoublep Use existing self.num_accepted_tokens buffer instead of temporary tensor
a5eae630
tdoublep force pushed from fcb96b91 to a5eae630 5 days ago
tdoublep requested a review from njhill 5 days ago
vllm-bot merged ad9d09e2 into main 4 days ago
tdoublep deleted the faster-mtp-no-pc branch 4 days ago
