Warmup gaudi backend #3172

sywangyi
sywangyi clean cuda/rocm code in hpu backend, enable flat_hpu
201dc629
sywangyi fix TP in pageattn
b7fea6fc
sywangyi adjust block table in hpu to improve performance
5d365394
sywangyi enable all the models. not tested yet
a07e7437
sywangyi use tensor cache in hpu graph to avoid replay issue
6bbe24d9
sywangyi add moe support, fix qwen/mistral/mixtral crash
5cd1c93c
sywangyi fix phimoe issue
073f7939
sywangyi gpt_bigcode could also go pageattn
2cde30de
sywangyi enable dbrx remove some unused code
2074d051
sywangyi Merge branch 'main' into gaudi_backend_pa
d5b78ba1
sywangyi multi-modality initial PR
f95aa426
sywangyi adjust warmup and enable vlm
36b6612f
sywangyi fix incorrect output in qwen2 idefics if hpu graph is used
fdf0733f
sywangyi remove unused quantization code and enable awq/gptq int4
9914ffe1
sywangyi fix gptq issue
8d221b7b
sywangyi enable fp8
69773767
sywangyi warmup prefill
fd70ad70
sywangyi add warmup_decode
ba7a131e
sywangyi warmup decode
7900be5a
sywangyi remove block_tables and prefill_cache_indices which will lead to dyna…
1508ee8d
sywangyi Merge branch 'main' into gaudi_backend_pa
7914e980
sywangyi fix comment
787dbe98
sywangyi missing gptj change...
376e0507
sywangyi fix some issue
f0e5faec
sywangyi remove torch.where to fix incorrect output in hpu graph model
c55a8cae
sywangyi LLM warmup logic
9d85ac94
sywangyi multi-modality warmup
705cc0b6
sywangyi optimize code
a84da5b6
sywangyi refine log and fix some issue
85916875
sywangyi fix warmup issue for mllama
29703dbd
sywangyi pingpong optimization
cd900c3b
sywangyi Merge branch 'main' into gaudi_backend_pa
610dd200
sywangyi match the latest vllm_extension ops
4cdc34ec
sywangyi Merge branch 'gaudi_backend_pa' into warmup_gaudi_backend
4de8fb01
sywangyi work with the latest vllm extension ops
a83e9fe0
sywangyi remove block_scales which is not needed anymore
76cc1297
sywangyi improve performance
ba049c9d
sywangyi Merge branch 'main' into warmup_gaudi_backend
6b21985c
sywangyi prefill bypass graph
5ec7f15d
sywangyi pingpong optimization issue fix
bf3987e2
sywangyi Merge branch 'main' into warmup_gaudi_backend
01f17d52
regisss approved these changes on 2025-04-18
Narsil approved these changes on 2025-04-24
Narsil merged 37580294 into main 239 days ago
sywangyi deleted the warmup_gaudi_backend branch 205 days ago
