text-generation-inference
Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu
#3113
Merged

Narsil merged 27 commits into huggingface:main from sywangyi:gaudi_backend_pa
sywangyi clean cuda/rocm code in hpu backend, enable flat_hpu
201dc629
sywangyi fix TP in pageattn
b7fea6fc
sywangyi adjust block table in hpu to improve performance
5d365394
sywangyi enable all the models. not tested yet
a07e7437
sywangyi use tensor cache in hpu graph to avoid replay issue
6bbe24d9
baptistecolle commented on 2025-03-17
baptistecolle commented on 2025-03-17
baptistecolle commented on 2025-03-17
sywangyi add moe support, fix qwen/mistral/mixtral crash
5cd1c93c
sywangyi fix phimoe issue
073f7939
sywangyi gpt_bigcode could also go pageattn
2cde30de
sywangyi enable dbrx remove some unused code
2074d051
sywangyi Merge branch 'main' into gaudi_backend_pa
d5b78ba1
sywangyi multi-modality initial PR
f95aa426
sywangyi adjust warmup and enable vlm
36b6612f
sywangyi fix incorrect output in qwen2 idefics if hpu graph is used
fdf0733f
sywangyi remove unused quantization code and enable awq/gptq int4
9914ffe1
sywangyi fix gptq issue
8d221b7b
sywangyi enable fp8
69773767
sywangyi warmup prefill
fd70ad70
sywangyi add warmup_decode
ba7a131e
sywangyi warmup decode
7900be5a
sywangyi remove block_tables and prefill_cache_indices which will lead to dyna…
1508ee8d
sywangyi Merge branch 'main' into gaudi_backend_pa
7914e980
sywangyi fix comment
787dbe98
sywangyi missing gptj change...
376e0507
sywangyi fix some issue
f0e5faec
sywangyi marked this pull request as ready for review 268 days ago
sywangyi remove torch.where to fix incorrect output in hpu graph model
c55a8cae
baptistecolle commented on 2025-04-02
baptistecolle requested a review from Narsil 263 days ago
baptistecolle requested a review from regisss 263 days ago
baptistecolle added the gaudi label
baptistecolle changed the title from "clean cuda/rocm code in hpu backend, enable flat_hpu" to "Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu" 263 days ago
sywangyi Merge branch 'main' into gaudi_backend_pa
610dd200
sywangyi match the latest vllm_extension ops
4cdc34ec
regisss approved these changes on 2025-04-14
Narsil merged d62c941c into main 251 days ago
sywangyi deleted the gaudi_backend_pa branch 207 days ago
