[Spec Decode] (1/2) Remove batch expansion #8839
draft
c4c5dabf
w/o cuda graph support
cb08091e
args and tests
8c10b115
disable mqa for ngram and format
44930fba
clean up and tests
e64c61b5
revert example
541b7674
minor
b6c1de3c
minor
5824b78f
LiuXiaoxuanPKU
changed the title from "[Spec Decode] (1/2) Remove batch expansion w/o cuda graph" to "[Spec Decode] (1/2) Remove batch expansion" 1 year ago
fix tests -- chunked prefill and hidden states in spec dec
07aebc07
fix
d6cb1cc6
minor
bcc1fe95
fix
b036d062
Merge branch 'main' into remove_batch_expansion
b93694d9
fix sampler for beam search
35750a60
revert num compute tokens
741068ae
disable mqa scorer when draft model and target model have different m…
71be3402
disable mqa for cuda graph
cff6b0fd
fix partial comments
f4fb00b9
fix comments
b3e86910
fix sampler and spec dec tests
238e5a06
remove backend
5063c95d
more test fix
70662b04
Merge branch 'main' into remove_batch_expansion
878d2da4
fix num_compute_token
0e32744b
clean up
7ee29986
comaniac
approved these changes
on 2024-10-01
more fix for num_compute_token
d39c8a93
change log condition
79ac29ce
add comments
6f3388b5
query len for multi-step, specify ci backend
14253322
fix ci
e5702a90
fix
8e276648
format
3f3c2228
context_len for multi-step and encoder decoder, fix decode_len
27074227
Assignees
No one assigned