huggingface/text-generation-inference
Pull Requests
fix: don't use kernel layernorm on Blackwell architecture to avoid "no kernel image" error
#3343 opened 2025-12-11 09:32 by AdamPalaxo
feat: prefer latest outlines core and compile grammar in router
#3340 opened 2025-11-14 03:38 by drbh
fix: bump flake and update grammar logit processor
#3338 opened 2025-11-06 00:57 by drbh
Remove `once_cell` dependency from multiple Cargo.toml files and update usage in `validation.rs` to use `std::sync::LazyLock` instead of `once_cell::sync::Lazy`.
#3334 opened 2025-09-28 13:37 by htiennv
feat: expose GPU energy consumption (mJ) in responses
#3315 opened 2025-08-28 13:49 by JulienDelavande
Add dedicated CPU-only Dockerfile and update documentation for CPU/…
#3310 opened 2025-08-07 11:25 by jakubgajski
support qwen3 on nvidia
#3302 opened 2025-07-23 08:04 by icyxp
Retrieve the correct cached model batch size in Neuron config checker for Neuron Backend
#3300 opened 2025-07-19 02:52 by jimburtoft
Attempt to fix CI errors
#3292 opened 2025-07-08 13:34 by danieldk
fix: enable defs references in tool calls
#3291 opened 2025-07-07 14:37 by drbh
Update quantization kernels
#3288 opened 2025-07-07 07:32 by danieldk
feat: allow json_schema in response format and add test
#3276 opened 2025-06-25 19:50 by drbh
Disable mamba in CPU platform
#3266 opened 2025-06-13 17:38 by casassg
feat: improve llava next pooling for granite vision
#3255 opened 2025-06-04 13:54 by drbh
Trtllm backend improvements
#3231 opened 2025-05-17 19:43 by leejuyuu
Fix typos
#3210 opened 2025-05-06 08:42 by omahs
feat: lock updated kernel versions
#3201 opened 2025-04-29 15:06 by drbh
Set `uv` UV_PYTHON_INSTALL_DIR explicitly
#3197 opened 2025-04-27 17:15 by sebastianliebscher
README: minimum Python version is 3.10
#3194 opened 2025-04-25 14:21 by Frenzie
feat: support logit bias in chat request
#3186 opened 2025-04-22 16:20 by drbh
Fix flashinfer plan call to use positional arguments for #3165
#3166 opened 2025-04-11 14:16 by ruckc
Update to flashinfer 0.2.5
#3164 opened 2025-04-11 10:25 by danieldk
Add chunked attn for L4
#3162 opened 2025-04-10 15:00 by mht-sharma
Update links Inferentia refer docs
#3154 opened 2025-04-09 07:34 by guspan-tanadi
feat: align function id with tool call response
#3111 opened 2025-03-13 19:31 by drbh
wip: comment out prepend full_text
#3079 opened 2025-03-07 00:54 by jrc2139
Expose the real-time internal state of the batcher through SSE
#3065 opened 2025-02-27 16:01 by mfuntowicz
Added model name label to metrics and added an optional argument --served-model-name
wontfix
#3064 opened 2025-02-27 10:50 by yashaswipiplani
display available cached versions in TGI server error message of Neuron backend
#3063 opened 2025-02-26 23:49 by jimburtoft
Support xccl distributed backend
#3034 opened 2025-02-18 17:43 by dvrogozh