huggingface/text-generation-inference
Pull Requests
Commits
remove_post_load_weights
feat: add OpenAssistant/oasst-sft-1-pythia-12b to the list of supported models (#122) | OlivierDehaene committed 2 years ago | Verified | 6860ce9c
v0.4.0 (#119) | OlivierDehaene committed 2 years ago | Verified | 411d6247
feat(python-client): add new parameters (#118) | OlivierDehaene committed 2 years ago | Verified | d8dc8f1b
feat(router): add best_of parameter (#117) | OlivierDehaene committed 2 years ago | Verified | 55bd4fed
feat(router): support left truncation (#115) | OlivierDehaene committed 2 years ago | Verified | e8bfe199
fix(server): do not warp prefill logits (#116) | OlivierDehaene committed 2 years ago | Verified | c0795de2
feat: support typical sampling (#114) | OlivierDehaene committed 2 years ago | Verified | 1a2d6825
fix(server): fix index out of range for watermarking (#110) | OlivierDehaene committed 2 years ago | Verified | 941cd42e
fix(python-client): stream not set on the sync client (#109) | OlivierDehaene committed 2 years ago | Verified | 2c5df5d2
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible (#108) | OlivierDehaene committed 2 years ago | Verified | 5fd2dcb5
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES (#107) | OlivierDehaene committed 2 years ago | Verified | 0ac38d33
fix(server): fix galactica batch (#106) | OlivierDehaene committed 2 years ago | Verified | b1485e18
feat(clients): Python client (#103) | OlivierDehaene committed 2 years ago | Verified | 3fef90d5
feat: add supported models (#102) | OlivierDehaene committed 2 years ago | Verified | 0e9ed1a8
feat: allow local models (#101) | OlivierDehaene committed 2 years ago | Verified | cd5961b5
fix(server): fix generate_stream by forcing tokens to be decoded correctly (#100) | OlivierDehaene committed 2 years ago | Verified | 9b205d33
v0.3.2 (#97) | OlivierDehaene committed 2 years ago | Verified | 1c19b093
feat(server): fix transformers commit (#96) | OlivierDehaene committed 2 years ago | Verified | 0b6807ca
fix(launcher): add router parameters to launcher (#95) | OlivierDehaene committed 2 years ago | Verified | 240c4187
feat(ci): improve CI speed (#94) | OlivierDehaene committed 2 years ago | Verified | e3ded361
feat(server): update to hf_transfer==0.1.2 (#93) | OlivierDehaene committed 2 years ago | Verified | 2d39f199
feat(server): add logits watermark (#90) | OlivierDehaene committed 2 years ago | Verified | 9b8ea6a6
feat(router): add api-inference headers (#91) | OlivierDehaene committed 2 years ago | Verified | f874c478
feat(router): ask hf.co for pipelinetag to decide on compat_return_full_text (#89) | OlivierDehaene committed 2 years ago | Verified | 4e685d90
feat(router): add legacy route for api-inference support (#88) | OlivierDehaene committed 2 years ago | Verified | 21340f24
fix(server): fix token_is_special (#87) | OlivierDehaene committed 2 years ago | Verified | 65e2f162
fix(docs): fix openapi schema (#86) | OlivierDehaene committed 2 years ago | Verified | 3b03c4ea
feat(server): add special token bool (#85) | OlivierDehaene committed 2 years ago | Verified | 0ac184ce
v0.3.1 (#84) | OlivierDehaene committed 2 years ago | Verified | 4b1c9720
feat(server): pre-allocate max attention mask (#75) | OlivierDehaene committed 2 years ago | Verified | 44ce098c