[Core] Default to using per_token quantization for fp8 when cutlass is supported. #8651
Default to use per_token quantization for fp8 when cutlass is supported.
4ffde1b6
mgoin
approved these changes
on 2025-01-15
Update vllm/model_executor/layers/quantization/fp8.py
987c825d
Merge branch 'main' into per_token
d3390218
Format
c0f22834
mgoin
enabled auto-merge (squash) 338 days ago
mgoin
merged
fa0050db
into main 338 days ago
Assignees
No one assigned