[refactor] Serving into proper modules #44796
new serve file
d8e7c457
app
f2388673
model_manager done
be0291d7
update serve
e84d82ef
style
fb773052
poc done
d869d62a
Merge remote-tracking branch 'origin/main' into refactor-serving
5aadd1a2
renaming
bd734e88
fix
69d32644
new tests
f5afd6c4
update metrics and processor
fedad8e5
hardcode n_batch for now
9b904b1c
add response api + compile
0084b910
more tests
1d5d1cb2
add it for now but we will move it
3d64a8cd
Merge remote-tracking branch 'origin/main' into refactor-serving
74b35931
remove cache impl
552603c1
add back load_model
3643ece6
fix naming
12c0f558
add transcription
d4ffdf41
tool calls better !
68cd5bc5
SunMarc
marked this pull request as ready for review 54 days ago
vlm support for both response and chat endpoints
6da3f3c1
update bench
a92ebe29
fix vl test
76a5c836
first iteration of cb
31e59c35
cb tests
962d0391
typing + review
13945c1b
update test
4abb194f
better benchmark
16589814
better stream
720ecdbd
update bench
44246357
fix
7d0cd776
serve refactored
533233cc
merge
880e6e07
update
4aa7fec0
fix
3ab4e092
style
06bacbbf
simpler
ef106187
style
09d5fe17
update warmup
96b6b8bd
remove llamacpp integration for now
07ecd2a6
style
fad7c25c
style
feed4cbf
style again
abd40872
Merge branch 'main' into output-callback-cb
120e37b6
remove annotation
d550b9b9
Merge branch 'main' into refactor-serving
ca06e2bf
review !
ac0d6a1c
Merge remote-tracking branch 'origin/main' into output-callback-cb
66314b54
style
9d52002b
much cleaner
c48aec3a
renamed
b13dacc1
remove bench for now
7855606b
batch output
ef1c7101
style
caaab6e1
type
4c1cd01d
better tests
702ff749
update test
80b5c780
queue draining
a8461fc8
Merge branch 'main' into output-callback-cb
480828d2
Merge remote-tracking branch 'origin/main' into refactor-serving
cb83702a
add seed
9db52a0f
Merge branch 'main' into refactor-serving
9485f680
Merge remote-tracking branch 'origin/main' into refactor-serving
160b2f6c
some logs
40417ee3
Merge remote-tracking branch 'origin/main' into refactor-serving
3bd6a095
readd nathan feature + some minor fixes
ced96c2b
fix
ff02cd79
guard transcription
307498ee
better now
ffe4c64e
fix
06a78815
adding lock to see if this helps
052cbc78
SunMarc
force pushed from 24fb171a to 052cbc78 46 days ago
remove locks
67997271
lock again
3a07c867
update bench and remove lock for now
7a7abf20
Merge branch 'main' into refactor-serving
bbd5cb02
SunMarc
enabled auto-merge 46 days ago
SunMarc
merged 38593c2e into main 45 days ago
SunMarc
deleted the refactor-serving branch 45 days ago