vllm
[Core] Default to using per_token quantization for fp8 when cutlass is supported.
#8651

Merged

mgoin merged 4 commits into vllm-project:main from elfiegg:per_token

Default to use per_token quantization for fp8 when cutlass is supported.

4ffde1b6
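
For readers unfamiliar with the terminology, here is a minimal sketch of how per-token and per-tensor dynamic scaling differ for fp8 activations. It is not the code from this PR: the function names, including `choose_activation_scheme`, are hypothetical, and the only outside fact it relies on is that the largest finite value of the float8 e4m3fn format is 448.

```python
# Illustrative only: pure-Python scale computation, not vLLM's implementation.
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def per_tensor_scale(activations):
    """Return one scale shared by every token (row) in the batch."""
    amax = max(abs(v) for row in activations for v in row)
    return max(amax, 1e-12) / FP8_E4M3_MAX


def per_token_scales(activations):
    """Return one scale per token (row), so an outlier token only affects itself."""
    return [
        max(max(abs(v) for v in row), 1e-12) / FP8_E4M3_MAX
        for row in activations
    ]


def choose_activation_scheme(cutlass_fp8_supported):
    """Hypothetical helper mirroring the PR title: prefer per-token with CUTLASS."""
    return "per_token" if cutlass_fp8_supported else "per_tensor"


# Two tokens: the second contains a large outlier.
acts = [[0.5, -1.0, 0.25], [448.0, -2.0, 8.0]]
tensor_scale = per_tensor_scale(acts)
token_scales = per_token_scales(acts)
```

With a single per-tensor scale, the outlier in the second token sets the scale for the first token as well, which costs the small values in that row precision. Per-token scales avoid this at the price of a kernel that accepts one scale per row, which is presumably why the default is tied to CUTLASS support.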

mgoin approved these changes on 2025-01-15

Update vllm/model_executor/layers/quantization/fp8.py

987c825d

Merge branch 'main' into per_token

d3390218

Format

c0f22834

mgoin added the ready label

mgoin enabled auto-merge (squash) 338 days ago

mgoin merged fa0050db into main 338 days ago

Reviewers

mgoin

Labels

ready