vllm
fa0050db - [Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)

Commit
336 days ago
[Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)

Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: mgoin <michael@neuralmagic.com>