vllm #12501 (Merged)
[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing

Commits
  • Add ptpc-fp8 quantization
    kliuae committed 331 days ago
  • Enable torch._scaled_mm rowwise gemm fp8
    tjtanaa committed 331 days ago
  • Update PyTorch version in Dockerfile.rocm_base; Update AMD GPU installation readme to point to ROCm6.5
    tjtanaa committed 331 days ago
  • add ptpc fp8 unittests
    tjtanaa committed 331 days ago
  • fix test_fp8.py::test_kv_cache_model_load_and_run; remove unnecessary code path; add skip test comment
    tjtanaa committed 327 days ago
  • Merge remote-tracking branch 'origin/main' into ptpc-fp8-rocm-2
    tjtanaa committed 325 days ago
  • format lint code
    tjtanaa committed 325 days ago
  • Merge remote-tracking branch 'origin/main' into ptpc-fp8-rocm-2
    tjtanaa committed 321 days ago
  • introduce USE_ROWWISE_TORCH_SCALED_MM
    tjtanaa committed 321 days ago
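The commits above add PTPC-FP8: activations get one scale per token (row) and weights one scale per channel (column), with the GEMM done via torch._scaled_mm's rowwise-scaled path on ROCm. As a rough illustration of the scaling scheme only, here is a NumPy sketch (my own simplification, not the PR's code): it simulates the FP8 e4m3 dynamic range (max finite value 448) with a uniform rounding grid instead of real FP8 dtypes or a fused kernel, and applies the two scale vectors after an integer-grid matmul the way a rowwise scaled GEMM would.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in fp8 e4m3

def quantize_per_token(x):
    # Per-token (rowwise) scale: map each row's absmax onto the fp8 range.
    scale = np.abs(x).max(axis=1, keepdims=True) / FP8_E4M3_MAX  # shape (M, 1)
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def quantize_per_channel(w):
    # Per-channel (columnwise) scale: one scale per output channel of w.
    scale = np.abs(w).max(axis=0, keepdims=True) / FP8_E4M3_MAX  # shape (1, N)
    q = np.clip(np.round(w / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def ptpc_gemm(x, w):
    # Quantize, multiply, then rescale with the outer product of the
    # per-token and per-channel scales -- the dequant a rowwise
    # scaled-mm kernel fuses into the epilogue.
    qx, sx = quantize_per_token(x)
    qw, sw = quantize_per_channel(w)
    return (qx @ qw) * sx * sw

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
w = rng.standard_normal((16, 8)).astype(np.float32)
out = ptpc_gemm(x, w)
ref = x @ w
print(np.max(np.abs(out - ref)))  # small quantization error vs. fp32 matmul
```

Note the simplification: np.round on the scaled values gives a uniform grid, whereas real e4m3 spacing is non-uniform; the sketch only shows where the two scale vectors attach, which is the part the PR's per-token/per-channel scheme is about.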