vllm pull request #17331 (Merged)
[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger

rasmith: Add VLLM_ROCM_USE_FP8_SCALES flag (commit b9f9f81d)
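
The first revision gates the new behavior on an environment variable. A minimal sketch of how such a boolean env-var flag is typically read, with the parsing details assumed (the PR's actual diff may differ):

```python
import os

# Hypothetical sketch (not the PR's actual diff): an opt-in boolean flag
# read once from the environment, in the spirit of VLLM_ROCM_USE_FP8_SCALES.
VLLM_ROCM_USE_FP8_SCALES: bool = os.environ.get(
    "VLLM_ROCM_USE_FP8_SCALES", "0").lower() in ("1", "true")
```
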
rasmith: lint (commit 9048aa55)
rasmith: Merge branch 'vllm-project:main' into rasmith_add_vllm_use_rocm_fp8_s… (commit 98705adf)
mergify added the needs-rebase label
rasmith: Use vllm config instead of env variable for fp8 scales option (commit 2f31d6b1)
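
This commit moves the toggle off the environment and onto vLLM's configuration, so the setting travels with the engine config that already reaches the attention backend. A hypothetical sketch of that shape; the CacheConfig placement and the use_fp8_scales field name are assumptions, not the PR's actual API:

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    # Assumed field names, for illustration only.
    kv_cache_dtype: str = "auto"
    use_fp8_scales: bool = False  # explicit opt-in, replacing the env var

def wants_fp8_scales(cfg: CacheConfig) -> bool:
    # The flag is carried by the config object instead of a
    # process-global environment lookup.
    return cfg.use_fp8_scales
```
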
mergify removed the needs-rebase label
rasmith changed the title from "[AMD] [Quantization] Add VLLM_ROCM_USE_FP8_SCALES flag" to "[AMD] [Quantization] Add flag for using fp8 scales instead of using kv_cache_dtype trigger" (264 days ago)
ProExpertProg commented on 2025-05-09
rasmith: Merge branch 'vllm-project:main' into rasmith_add_vllm_use_rocm_fp8_s… (commit bf8166cd)
rasmith: use override instead (commit fdc428ba)
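
Per the review, the boolean "use fp8 scales" switch became an explicit override for the attention dtype, which is what the final PR title describes. A sketch of override semantics under that reading (function and parameter names are illustrative):

```python
from typing import Optional

def resolve_attention_dtype(kv_cache_dtype: str,
                            override: Optional[str] = None) -> str:
    # Illustrative only: an explicit override wins when set; otherwise the
    # old behavior stands, where kv_cache_dtype acts as the trigger.
    if override is not None:
        return override
    return kv_cache_dtype
```
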
rasmith: format (commit 44b18cec)
rasmith: remove was_raised from set_current_vllm_config (commit 1bc79b7e)
rasmith: remove was_raised (commit 5cec76f3)
rasmith changed the title from "[AMD] [Quantization] Add flag for using fp8 scales instead of using kv_cache_dtype trigger" to "[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger" (251 days ago)
ProExpertProg commented on 2025-05-22
rasmith: simplify and add warning (commit 2c5ffb08)
rasmith: set stacklevel for warning (commit e7400c16)
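
For context, stacklevel in the standard library's warnings.warn controls which stack frame the warning is attributed to; stacklevel=2 points the report at the helper's caller rather than at the helper itself. A minimal stdlib sketch with an illustrative message:

```python
import warnings

def warn_fp8_trigger() -> None:
    # stacklevel=2 attributes the warning to whoever called this helper,
    # so the log points at the offending call site. The message text is
    # illustrative, not the PR's actual wording.
    warnings.warn(
        "kv_cache_dtype no longer implies fp8 attention scales; "
        "set the override flag explicitly.",
        UserWarning,
        stacklevel=2,
    )
```
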
rasmith: fix typo (commit e135f78c)
ProExpertProg commented on 2025-05-30
rasmith: Merge branch 'vllm-project:main' into rasmith_add_vllm_use_rocm_fp8_s… (commit 4c6244bf)
rasmith: check if kv cache is fp8 (commit 7ad4a103)
rasmith: check if kv cache is fp8 (commit 85ccf7c6)
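
These two commits add a guard so the fp8-scales path only engages when the KV cache dtype is actually an fp8 variant. A sketch of such a check; vLLM's kv_cache_dtype is a string such as "auto", "fp8", "fp8_e4m3", or "fp8_e5m2", though the exact call site in the PR is an assumption:

```python
def kv_cache_is_fp8(kv_cache_dtype: str) -> bool:
    # Matches "fp8" as well as suffixed variants like "fp8_e4m3".
    return kv_cache_dtype.startswith("fp8")
```
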
ProExpertProg approved these changes on 2025-06-07
gshtras added the ready label
mergify added the rocm label
gshtras merged c7ea0b56 into main (231 days ago)
