[serve] Add speech to text (`/v1/audio/transcriptions`) (#39434)
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* use openai
* validate request, including detecting unused fields
* dict indexing
* dict var access
* tmp commit (tests failing)
* add slow
* use oai output type in completions
* (little rebase errors)
* working spec?
* guard type hint
* type hints. fix state (CB can now load different models)
* type hints; fn names; error type
* add docstrings
* responses + kv cache
* metadata support; fix kv cache; error event
* add output_index and content_index
* docstrings
* add test_build_response_event
* docs/comments
* gate test requirements; terminate cb manager on model switch
* nasty type hints
* more type hints
* disable validation by default; enable force models
* todo
* experiment: base model from typed dict
* audio working
* fix bad rebase
* load audio with librosa
* implement timed models
* almost working
* make fixup
* fix tests
* transcription request type
* tokenizer -> processor
* add example in docs
---------
Co-authored-by: Lysandre <hi@lysand.re>