huggingface/text-generation-inference

Commits

fix: rerun black lint

drbh committed 1 year ago

130f9d16

fix: limit some tests to only run when cuda available

drbh committed 1 year ago

301a18c2

feat: add test for positional rotary embeddings

System administrator committed 1 year ago

040c5b59

CI (2599): Update ToolType input schema (#2601)

drbh committed 1 year ago

Verified 8ad20daf

nix: move back to the tgi-nix main branch (#2620)

danieldk committed 1 year ago

Verified 6db3bcb7

Add support for fused MoE Marlin for AWQ (#2616)

danieldk committed 1 year ago

Verified 64142489

Upgrade minor rust version (Fixes rust build compilation cache) (#2617)

Narsil committed 1 year ago

Verified 8b295aa4

enable mllama in intel platform (#2610)

sywangyi committed 1 year ago

Verified 57f9685d

Fix FP8 KV-cache condition (#2611)

flozi00 committed 1 year ago

Verified 0da4df4b

Add basic FP8 KV cache support (#2603)

danieldk committed 1 year ago

Verified 2358c2bb

nix: example of local package overrides during development (#2607)

danieldk committed 1 year ago

Verified 68103079

Revert "Unroll notify error into generate response" (#2605)

drbh committed 1 year ago

Verified 3011639f

New release 2.3.1 (#2604)

Narsil committed 1 year ago

Verified f6e2f05b

Unroll notify error into generate response (#2597)

drbh committed 1 year ago

Verified d22b0c1f

CI (2592): Allow LoRA adapter revision in server launcher (#2602)

drbh committed 1 year ago

Verified 23354595

Max token capacity metric (#2595)

Narsil committed 1 year ago

Verified 0204946d

Mllama flash version (#2585)

Narsil committed 1 year ago

Verified d18ed5cf

nix: experimental support for building a Docker container (#2470)

danieldk committed 1 year ago

Verified 584b4d7a

MoE Marlin: support `desc_act` for `groupsize != -1` (#2590)

danieldk committed 1 year ago

Verified 1c84a30f

Move flake back to tgi-nix `main` (#2586)

danieldk committed 1 year ago

Verified d1f257ac

feat: support phi3.5 moe (#2479)

drbh committed 1 year ago

Verified 93a7042d

Add support for GPTQ-quantized MoE models using MoE Marlin (#2557)

danieldk committed 1 year ago

Verified 90a1d04a

Update ROCM libs and improvements (#2579)

mht-sharma committed 1 year ago

Verified f9e561ec

Update architecture.md (#2577)

ulhaqi12 committed 1 year ago

Verified e790cfc0

Remove compute capability lazy cell (#2580)

danieldk committed 1 year ago

Verified afc7ded8

flashinfer: pass window size and dtype (#2574)

danieldk committed 1 year ago

Verified 1028996f

Improve support for GPUs with capability < 8 (#2575)

danieldk committed 1 year ago

Verified 5b6b74e2

Fix build with `--features google` (#2566)

alvarobartt committed 1 year ago

Verified 0aa66d69

Add LoRA adapters support for Gemma2 (#2567)

alvarobartt committed 1 year ago

Verified 0b7df771

remove LORA_ADAPTERS_PATH (#2563)

nbroad1881 committed 1 year ago

Verified 7efcb5e0
