Autoscaling inference endpoints #412
first draft of autoscale (19ab5159)
adding better management for restarts and resizes (b4c98948)
Merge branch 'main' into clem_inference_endpoint_autoscale (56111ca9)
upgraded autoscale (e69c321b)
should be working now! (18a88418)
added pause option (b59f8973)
clefourrier changed the title from "First draft of autoscale" to "Autoscaling inference endpoints"
restore endpoint name vs model name diff (cb6ea93e)
debug (3ea93e9d)
Merge branch 'main' into clem_inference_endpoint_autoscale (72a978f1)
added example (b39131f5)
Merge branch 'main' into clem_inference_endpoint_autoscale (aa0e9e5e)
fix to parallelism manager - no need for endpoint (99607333)
fix default batch size override (8b061043)
Merge branch 'main' into clem_inference_endpoint_autoscale (1193875a)
NathanHB approved these changes on 2024-12-04
Update examples/model_configs/endpoint_model_lite.yaml (2f61f1b1)