vllm #12501 (Merged)
[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing
Commits (9)
- Add ptpc-fp8 quantization (kliuae, committed 331 days ago)
- Enable torch._scaled_mm rowwise GEMM FP8 (tjtanaa, committed 331 days ago; see the sketch after this list)
- Update PyTorch version in Dockerfile.rocm_base; update the AMD GPU installation README to point to ROCm 6.5 (tjtanaa, committed 331 days ago)
- Add ptpc-fp8 unit tests (tjtanaa, committed 331 days ago)
- Fix test_fp8.py::test_kv_cache_model_load_and_run; remove an unnecessary code path; add a skip-test comment (tjtanaa, committed 327 days ago)
- Merge remote-tracking branch 'origin/main' into ptpc-fp8-rocm-2 (tjtanaa, committed 325 days ago)
- Format and lint code (tjtanaa, committed 325 days ago)
- Merge remote-tracking branch 'origin/main' into ptpc-fp8-rocm-2 (tjtanaa, committed 321 days ago)
- Introduce USE_ROWWISE_TORCH_SCALED_MM (tjtanaa, committed 321 days ago)