fix(ngram): match async ngram_gpu acceptance rate to CPU #44056
shiyangyang2001-lgtm
changed the title from "[ngram][async] Fix ngram_gpu acceptance rate to match CPU ngram" to "fix(ngram): match async ngram_gpu acceptance rate to CPU" 3 days ago
[spec decode] Pass ngram-trimmed invalid tokens to scheduler stats
9d284c3a
Assignees
No one assigned
Labels
speculative-decoding
v1