[refactor] Serving into proper modules (#44796)
* new serve file
* app
* model_manager done
* update serve
* style
* poc done
* renaming
* fix
* new tests
* update metrics and processor
* hardcode n_batch for now
* add response api + compile
* more tests
* add it for now but we will move it
* remove cache impl
* add back load_model
* fix naming
* add transcription
* tool calls better !
* vlm support for both response and chat endpoints
* update bench
* fix vl test
* first iteration of cb
* cb tests
* typing + review
* update test
* better benchmark
* better stream
* update bench
* fix
* serve refactored
* merge
* update
* fix
* style
* simpler
* style
* update warmup
* remove llamacpp integration for now
* style
* style
* style again
* remove annotation
* review !
* style
* much cleaner
* renamed
* remove bench for now
* batch output
* style
* type
* better tests
* update test
* queue draining
* some logs
* re-add nathan feature + some minor fixes
* fix
* guard transcription
* better now
* fix
* adding lock to see if this helps
* remove locks
* lock again
* update bench and remove lock for now