server : allow using LoRA adapters per-request #10994
slot.can_batch_with
2ba6efc5
lora per request
9d84127f
test: force disable cache prompt
9947b077
move can_batch_with check
b9b2b637
fix condition
076346db
Merge branch 'master' into xsn/lora_per_request
d67fefb9
add slow test with llama 8b
367f0ab1
update docs
bf7df957
move lora change task to queue
1dbd16ab
ngxson
marked this pull request as ready for review 1 year ago
ggerganov
approved these changes
on 2025-01-02
Apply suggestions from code review
a90e0642
lora_base
9274a6bc
remove redundant check
74e460d5
ngxson
merged
0da5d860
into master 1 year ago
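For readers skimming the timeline, here is a rough sketch of what "LoRA per request" and the `can_batch_with` check could look like from the client side. Nothing below is taken from the diff. The `lora` request field with `id` and `scale` entries, the helper names `build_completion_payload` and `can_batch_with`, and the rule that a scale of 0 counts as "adapter off" are all assumptions made for illustration.

```python
import json


def build_completion_payload(prompt, lora_scales):
    """Build a request body that selects LoRA adapters for one request.

    `lora_scales` maps an adapter id to its scale. The `lora` field name
    and its `id`/`scale` entries are assumed here, not copied from the PR.
    """
    return {
        "prompt": prompt,
        "lora": [
            {"id": adapter_id, "scale": scale}
            for adapter_id, scale in sorted(lora_scales.items())
        ],
    }


def can_batch_with(lora_a, lora_b):
    """Return True if two requests use the same effective adapter config.

    Hypothetical Python analogue of the slot-level check named in the
    commits. Assumption: an adapter with scale 0.0 is treated as disabled.
    """
    def effective(config):
        return {e["id"]: e["scale"] for e in config if e["scale"] != 0.0}

    return effective(lora_a) == effective(lora_b)


if __name__ == "__main__":
    first = build_completion_payload("Hello", {0: 1.0})
    second = build_completion_payload("Hello again", {0: 1.0, 1: 0.0})
    third = build_completion_payload("Bonjour", {1: 0.5})

    print(json.dumps(first))
    # Same effective adapters, so these two could share a batch.
    print(can_batch_with(first["lora"], second["lora"]))
    # Different adapters, so these would be processed separately.
    print(can_batch_with(first["lora"], third["lora"]))
```

The point of the comparison is that requests with different adapter settings cannot share a batch, which is presumably why the commits introduce a dedicated check before batching.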
Assignees
No one assigned
Labels
examples
python
server