vllm
[Core] Default to using per_token quantization for fp8 when cutlass is supported. #8651
Merged

mgoin merged 4 commits into vllm-project:main from elfiegg:per_token
Commits:
elfiegg: Default to use per_token quantization for fp8 when cutlass is supported. (4ffde1b6)
mgoin: Update vllm/model_executor/layers/quantization/fp8.py (987c825d)
mgoin: Merge branch 'main' into per_token (d3390218)
mgoin: Format (c0f22834)

mgoin approved these changes on 2025-01-15
mgoin added the ready label
mgoin enabled auto-merge (squash) 338 days ago
mgoin merged fa0050db into main 338 days ago
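
The change concerns how activation scales are chosen for fp8 GEMMs when the cutlass kernels are available: one scale per token (row) rather than one scale shared by the whole activation tensor. As a rough illustration only (this is not vLLM's actual implementation; the function names and example data are assumptions), the difference in scale computation can be sketched as:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8_e4m3

def per_tensor_scale(x: np.ndarray) -> np.ndarray:
    # One scale shared by every token: a single outlier token inflates
    # the scale (and thus the quantization step) for all rows.
    return np.abs(x).max() / FP8_E4M3_MAX

def per_token_scale(x: np.ndarray) -> np.ndarray:
    # One scale per row (token): each token uses the full fp8 range,
    # regardless of outliers elsewhere in the batch.
    return np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX

# Hypothetical activations: three "normal" tokens plus one outlier token.
rng = np.random.default_rng(0)
x = rng.normal(scale=1.0, size=(4, 8))
x[3, 0] = 400.0  # outlier activation

s_tensor = per_tensor_scale(x)  # scalar scale, driven by the 400.0 outlier
s_token = per_token_scale(x)    # shape (4, 1): only the last token's scale is large

print(s_tensor)
print(s_token.ravel())
```

With per-tensor scaling, the three normal tokens would be quantized with a step size sized for the outlier; with per-token scaling, each of their scales stays proportional to that row's own maximum.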
