Warmup gaudi backend #3172
clean cuda/rocm code in hpu backend, enable flat_hpu
201dc629
fix TP in pageattn
b7fea6fc
adjust block table in hpu to improve performance
5d365394
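
A minimal sketch of the block-table adjustment idea from the commit above: pad every sequence's block list to a common width so the table keeps a static shape, which HPU graph capture and lazy-mode compilation handle much better than ragged inputs. The helper name and the bucket width are illustrative, not the backend's actual API.

```python
import torch

# Hypothetical helper: pad each sequence's KV-cache block list to the same
# length so the block table tensor has a fixed (static) shape.
def build_block_table(block_lists, max_blocks_per_seq, pad_block=0):
    table = torch.full(
        (len(block_lists), max_blocks_per_seq), pad_block, dtype=torch.int32
    )
    for i, blocks in enumerate(block_lists):
        table[i, : len(blocks)] = torch.tensor(blocks, dtype=torch.int32)
    return table

# Three sequences holding different numbers of KV-cache blocks.
print(build_block_table([[3, 7], [1], [4, 5, 6]], max_blocks_per_seq=4))
```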
enable all the models, not tested yet
a07e7437
use tensor cache in hpu graph to avoid replay issue
6bbe24d9
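
A hypothetical illustration of the tensor-cache pattern referenced above: a captured graph replays against the same tensor storage it was recorded with, so fresh inputs must be copied in place into cached buffers rather than rebound to brand-new tensors the replay would never see. `TensorCache` and `feed` are made-up names for this sketch.

```python
import torch

class TensorCache:
    """Keeps one persistent buffer per input name across graph replays."""

    def __init__(self):
        self._buffers = {}

    def feed(self, name, value):
        buf = self._buffers.get(name)
        if buf is None or buf.shape != value.shape:
            buf = torch.empty_like(value)
            self._buffers[name] = buf
        buf.copy_(value)  # in-place update keeps the captured storage valid
        return buf

cache = TensorCache()
ids = cache.feed("input_ids", torch.tensor([1, 2, 3]))
ids2 = cache.feed("input_ids", torch.tensor([4, 5, 6]))
assert ids.data_ptr() == ids2.data_ptr()  # same storage reused across replays
```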
add moe support, fix qwen/mistral/mixtral crash
5cd1c93c
fix phimoe issue
073f7939
gpt_bigcode can also use pageattn
2cde30de
enable dbrx, remove some unused code
2074d051
Merge branch 'main' into gaudi_backend_pa
d5b78ba1
multi-modality initial PR
f95aa426
adjust warmup and enable vlm
36b6612f
fix incorrect output in qwen2 idefics if hpu graph is used
fdf0733f
remove unused quantization code and enable awq/gptq int4
9914ffe1
fix gptq issue
8d221b7b
enable fp8
69773767
warmup prefill
fd70ad70
add warmup_decode
ba7a131e
warmup decode
7900be5a
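
The warmup commits above suggest running dummy prefill and decode steps over a grid of bucketed shapes so every shape seen at serving time has already been compiled or captured before real traffic arrives. A rough sketch, with purely illustrative bucket values and a stand-in for the real forward pass:

```python
import itertools
import torch

BATCH_BUCKETS = [1, 2, 4, 8]          # assumed bucket grid, not the real config
PREFILL_SEQ_BUCKETS = [128, 256, 512]

def warmup(model_fn):
    # Prefill-shaped calls: one per (batch, seq_len) bucket combination.
    for bs, seq in itertools.product(BATCH_BUCKETS, PREFILL_SEQ_BUCKETS):
        model_fn(torch.zeros((bs, seq), dtype=torch.int64))
    # Decode-shaped calls: one new token per sequence, so seq_len is 1.
    for bs in BATCH_BUCKETS:
        model_fn(torch.zeros((bs, 1), dtype=torch.int64))

warmup(lambda x: x.sum())  # stand-in for the model's forward pass
```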
remove block_tables and prefill_cache_indices which will lead to dyna…
1508ee8d
Merge branch 'main' into gaudi_backend_pa
7914e980
fix comment
787dbe98
missing gptj change...
376e0507
fix some issues

f0e5faec
remove torch.where to fix incorrect output in hpu graph model
c55a8cae
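
For the torch.where removal above, a small self-contained example of equivalent arithmetic masking, which is one way to avoid torch.where in graph-captured code; the backend's actual replacement may differ.

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0])
mask = x > 0

where_out = torch.where(mask, x, torch.zeros_like(x))
masked_out = x * mask.to(x.dtype)  # same result, no torch.where

assert torch.equal(where_out, masked_out)
```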
LLM warmup logic
9d85ac94
multi-modality warmup
705cc0b6
optimize code
a84da5b6
refine logging and fix some issues
85916875
fix warmup issue for mllama
29703dbd
pingpong optimization
cd900c3b
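
Assuming "pingpong" here means double-buffering (my reading of the commit message, not confirmed by the diff), a toy sketch: alternate between two preallocated buffers so staging the next step's inputs does not stall on the buffer the current step is still reading.

```python
import torch

# Two preallocated staging buffers, used alternately ("ping" and "pong").
buffers = [torch.empty(4, dtype=torch.int64), torch.empty(4, dtype=torch.int64)]

def run_steps(batches, step_fn):
    for i, batch in enumerate(batches):
        buf = buffers[i % 2]   # flip between the two buffers each step
        buf.copy_(batch)       # stage inputs into the currently idle buffer
        step_fn(buf)           # compute reads the freshly staged buffer

run_steps([torch.arange(4), torch.arange(4, 8)], lambda b: b.sum())
```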
Merge branch 'main' into gaudi_backend_pa
610dd200
match the latest vllm_extension ops
4cdc34ec
Merge branch 'gaudi_backend_pa' into warmup_gaudi_backend
4de8fb01
work with the latest vllm extension ops
a83e9fe0
remove block_scales which is not needed anymore
76cc1297
improve performance
ba049c9d
Merge branch 'main' into warmup_gaudi_backend
6b21985c
prefill bypass graph
5ec7f15d
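
A sketch of the "prefill bypass graph" dispatch above: prefill shapes vary widely, so they run eagerly, while decode, whose shape is fixed, goes through the captured-graph path. `graph_decode` stands in for whatever HPU-graph wrapper the backend actually uses.

```python
import torch

def forward(input_ids, is_prefill, eager_fn, graph_decode):
    if is_prefill:
        return eager_fn(input_ids)   # bypass the graph: plain eager execution
    return graph_decode(input_ids)   # decode replays the captured graph

out = forward(
    torch.zeros((2, 128), dtype=torch.int64),
    is_prefill=True,
    eager_fn=lambda x: x.sum(),
    graph_decode=lambda x: x.sum(),
)
```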
pingpong optimization issue fix
bf3987e2
Merge branch 'main' into warmup_gaudi_backend
01f17d52
regisss approved these changes on 2025-04-18
Narsil approved these changes on 2025-04-24
Narsil merged 37580294 into main 239 days ago
sywangyi deleted the warmup_gaudi_backend branch 205 days ago