transformers
38593c2e - [refactor] Serving into proper modules (#44796)

Commit
2 days ago
[refactor] Serving into proper modules (#44796)

* new serve file
* app
* model_manager done
* update serve
* style
* poc done
* renaming
* fix
* new tests
* update metrics and processor
* hardcode n_batch for now
* add response api + compile
* more tests
* add it for now but we will move it
* remove cache impl
* add back load_model
* fix naming
* add transcription
* tool calls better !
* vlm support for both response and chat endpoints
* update bench
* fix vl test
* first iteration of cb
* cb tests
* typing + review
* update test
* better benchmark
* better stream
* update bench
* fix
* serve refactored
* merge
* update
* fix
* style
* simpler
* style
* update warmup
* remove llamacpp integration for now
* style
* style
* style again
* remove annotation
* review !
* style
* much cleaner
* renamed
* remove bench for now
* batch output
* style
* type
* better tests
* update test
* queue draining
* some logs
* readd nathan feature + some minor fixes
* fix
* guard transcription
* better now
* fix
* adding lock to see if this helps
* remove locks
* lock again
* update bench and remove lock for now