[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger #17331
Commit b9f9f81d: Add VLLM_ROCM_USE_FP8_SCALES flag
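For readers following along, here is a minimal sketch of what an env-gated flag of this shape typically looks like; the variable name comes from the commit itself, but the default value and the helper are assumptions (vLLM centralizes flags like this in vllm/envs.py):

```python
import os

# Sketch only: read VLLM_ROCM_USE_FP8_SCALES as a boolean environment
# flag. The "enabled by default" choice here is an assumption, not
# taken from the commit.
VLLM_ROCM_USE_FP8_SCALES: bool = os.getenv("VLLM_ROCM_USE_FP8_SCALES", "1") == "1"

def use_fp8_scales() -> bool:
    # Hypothetical helper: the ROCm attention backend would consult this
    # instead of inferring the fp8 path from kv_cache_dtype.
    return VLLM_ROCM_USE_FP8_SCALES
```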
Commit 9048aa55: lint
Commit 98705adf: Merge branch 'vllm-project:main' into rasmith_add_vllm_use_rocm_fp8_s…
Commit 2f31d6b1: Use vllm config instead of env variable for fp8 scales option
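This commit moves the toggle off the process environment and onto the engine's config object. A hedged sketch of the shape of that change follows; the class and field names are assumptions, not the PR's actual code:

```python
from dataclasses import dataclass

@dataclass
class RocmAttentionConfig:  # hypothetical stand-in for the vLLM config object
    use_fp8_scales: bool = False  # assumed field name

def wants_fp8_scales(cfg: RocmAttentionConfig) -> bool:
    # Per-engine setting: two engines in one process can now disagree,
    # which a process-wide environment variable could not express.
    return cfg.use_fp8_scales
```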
rasmith changed the title from "[AMD] [Quantization] Add VLLM_ROCM_USE_FP8_SCALES flag" to "[AMD] [Quantization] Add flag for using fp8 scales instead of using kv_cache_dtype trigger" (264 days ago)
Commit bf8166cd: Merge branch 'vllm-project:main' into rasmith_add_vllm_use_rocm_fp8_s…
Commit fdc428ba: use override instead
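Here the flag changes shape again, from a boolean tied to fp8 scales to an explicit override of the attention dtype, matching the final PR title. A sketch under assumed names:

```python
from typing import Optional

def resolve_attention_dtype(
    kv_cache_dtype: str,
    override_attention_dtype: Optional[str],  # assumed parameter name
) -> str:
    # New behavior: an explicit override always wins.
    if override_attention_dtype is not None:
        return override_attention_dtype
    # Old behavior being replaced: an fp8 kv cache implicitly triggered
    # the fp8 attention path.
    return "fp8" if kv_cache_dtype.startswith("fp8") else "auto"
```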
Commit 44b18cec: format
Commit 1bc79b7e: remove was_raised from set_current_vllm_config
Commit 5cec76f3: remove was_raised
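On the set_current_vllm_config cleanup: a generic sketch of why a was_raised flag in a context manager is usually redundant. Exceptions propagate out of the with-block on their own, so a try/finally restores state without tracking whether the body failed. vLLM's actual function carries more logic than shown here:

```python
from contextlib import contextmanager

_current_config = None

@contextmanager
def set_current_vllm_config(config):
    # Sketch only. No was_raised bookkeeping: if the body raises, the
    # finally block still restores the previous config, and the
    # exception propagates to the caller unchanged.
    global _current_config
    previous, _current_config = _current_config, config
    try:
        yield
    finally:
        _current_config = previous
```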
rasmith changed the title from "[AMD] [Quantization] Add flag for using fp8 scales instead of using kv_cache_dtype trigger" to "[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger" (251 days ago)
Commit 2c5ffb08: simplify and add warning
Commit e7400c16: set stacklevel for warning
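stacklevel matters for warnings emitted from helper code: with the default stacklevel=1 the warning is attributed to the warn() call inside the helper, while stacklevel=2 points it at the caller, which is the line users can actually act on. The message text below is illustrative only:

```python
import warnings

def warn_attention_dtype_override(kv_cache_dtype: str) -> None:
    # Hypothetical helper name; the PR's actual message differs.
    warnings.warn(
        f"Overriding attention dtype while kv_cache_dtype={kv_cache_dtype!r}; "
        "this combination is not validated.",
        stacklevel=2,  # attribute the warning to our caller, not this helper
    )
```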
Commit e135f78c: fix typo
Commit 4c6244bf: Merge branch 'vllm-project:main' into rasmith_add_vllm_use_rocm_fp8_s…
Commit 7ad4a103: check if kv cache is fp8
Commit 85ccf7c6: check if kv cache is fp8
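The last two commits guard the new path on the kv cache actually being fp8. Since vLLM expresses kv_cache_dtype as a string ("auto", "fp8", "fp8_e4m3", "fp8_e5m2"), a prefix check covers every fp8 variant; where exactly the PR places this guard is an assumption:

```python
def kv_cache_is_fp8(kv_cache_dtype: str) -> bool:
    # "fp8", "fp8_e4m3", and "fp8_e5m2" all start with "fp8".
    return kv_cache_dtype.startswith("fp8")
```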
gshtras merged commit c7ea0b56 into main (231 days ago)