Pull Requests huggingface/text-generation-inference

Expose the real-time internal state of the batcher through SSE

#3065 opened 2025-02-27 16:01 by mfuntowicz

Added model name label to metrics and added an optional argument --served-model-name (label: wontfix)

#3064 opened 2025-02-27 10:50 by yashaswipiplani

display available cached versions in TGI server error message of Neuron backend

#3063 opened 2025-02-26 23:49 by jimburtoft

Support xccl distributed backend

#3034 opened 2025-02-18 17:43 by dvrogozh

Fix CPU and memory affinity under external resource management

#3012 opened 2025-02-11 10:34 by askervin

Kvrouter that will increase the kv-cache hits in case of multiple routing strategy

#2965 opened 2025-01-29 11:43 by Narsil

Update Dockerfile to use devel image for compatibility

#2848 opened 2024-12-16 13:00 by YaserJaradeh

Enable qwen2vl video

#2756 opened 2024-11-18 17:59 by drbh

[WIP] Add gfx1100 support to AMD pytorch build

#2642 opened 2024-10-13 06:11 by cazlo

Add model_load_time metric

#2311 opened 2024-07-26 00:48 by Edwinhr716